using rcu_read_lock() after calling dst_neigh_lookup

2017-01-10 Thread Hadar Hen Zion

Hi Dave,

Drivers that call dst_neigh_lookup() also take rcu_read_lock() before 
accessing the neigh pointer (to read its ll address data and its 
validity state).


You can find the same behavior in:

drivers/infiniband/core/addr.c, drivers/infiniband/hw/i40iw/i40iw_cm.c, 
drivers/infiniband/hw/nes/nes_cm.c, etc.


(the above locations are just examples).

While the documentation in neighbour.c says:

 "Neighbour entries are protected:
   - with reference count.
   - with rwlock neigh->lock
   Reference count prevents destruction.
   neigh->lock mainly serializes ll address data and its validity state."

So what is the right way to protect the neigh entry parameters? I 
couldn't find why rcu_read_lock() is needed here (dst_neigh_lookup() 
already takes a reference on the neigh).


Thank you,

Hadar




[PATCH iproute2 0/2] Fix the usage of TC tunnel key

2016-12-22 Thread Hadar Hen Zion
Add dest UDP port parameter to the usage of tc tunnel key action and
classification.


Hadar Hen Zion (2):
  tc/cls_flower: Add to the usage encapsulation dest UDP port
  tc/m_tunnel_key: Add to the usage encapsulation dest UDP port

 tc/f_flower.c | 5 +++--
 tc/m_tunnel_key.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

-- 
1.8.3.1



[PATCH iproute2 2/2] tc/m_tunnel_key: Add to the usage encapsulation dest UDP port

2016-12-22 Thread Hadar Hen Zion
The tunnel key set parameters also include the dest UDP port; add it to the
usage.

Fixes: 449c709c3868 ("tc/m_tunnel_key: Add dest UDP port to tunnel key action")
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reported-by: Simon Horman <simon.hor...@netronome.com>
---
 tc/m_tunnel_key.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
index 58a3042..3ceec1c 100644
--- a/tc/m_tunnel_key.c
+++ b/tc/m_tunnel_key.c
@@ -22,7 +22,7 @@
 static void explain(void)
 {
fprintf(stderr, "Usage: tunnel_key unset\n");
-   fprintf(stderr, "   tunnel_key set id TUNNELID src_ip IP dst_ip IP\n");
+   fprintf(stderr, "   tunnel_key set id TUNNELID src_ip IP dst_ip IP dst_port UDP_PORT\n");
 }
 
 static void usage(void)
-- 
1.8.3.1



[PATCH iproute2 1/2] tc/cls_flower: Add to the usage encapsulation dest UDP port

2016-12-22 Thread Hadar Hen Zion
Encapsulation dest UDP port is part of the classifier matching
parameters; add it to the usage.

Fixes: 41aa17ff4668 ("tc/cls_flower: Add dest UDP port to tunnel params")
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reported-by: Simon Horman <simon.hor...@netronome.com>
---
 tc/f_flower.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 653dfef..71e9515 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -53,10 +53,11 @@ static void explain(void)
"   dst_port PORT-NUMBER |\n"
"   src_port PORT-NUMBER |\n"
"   type ICMP-TYPE |\n"
-   "   code ICMP-CODE }\n"
+   "   code ICMP-CODE |\n"
"   enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
"   enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
-   "   enc_key_id [ KEY-ID ] }\n"
+   "   enc_key_id [ KEY-ID ] |\n"
+   "   enc_dst_port [ UDP-PORT ] }\n"
"   FILTERID := X:Y:Z\n"
"   ACTION-SPEC := ... look at individual actions\n"
"\n"
-- 
1.8.3.1



Re: [PATCH iproute2 2/2] tc/m_tunnel_key: Add dest UDP port to tunnel key action

2016-12-18 Thread Hadar Hen Zion



On 12/15/2016 3:53 PM, Simon Horman wrote:

On Thu, Dec 15, 2016 at 02:03:36PM +0100, Simon Horman wrote:

On Tue, Dec 13, 2016 at 10:07:47AM +0200, Hadar Hen Zion wrote:

Enhance tunnel key action parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>

Hi,

this looks good to me but could you also update tc/m_tunnel_key.c:usage(); ?

It seems that I was a bit hasty here as I now see that Stephen has
indicated that he has applied this series. I also notice that
patch 1/2 of this series also misses updating usage(). Let me know
if sending some follow-up patches is the best way forwards.

Yes, you are right, I'll send follow-up patches.
Thanks,
Hadar



[PATCH iproute2 0/2] Add dest UDP port to IP tunnel parameters

2016-12-13 Thread Hadar Hen Zion
Enhance IP tunnel key classification and action parameters by adding
destination UDP port.

Thanks,
Hadar

Hadar Hen Zion (2):
  tc/cls_flower: Add dest UDP port to tunnel params
  tc/m_tunnel_key: Add dest UDP port to tunnel key action

 man/man8/tc-flower.8     |  8 +++-
 man/man8/tc-tunnel_key.8 |  6 ++
 tc/f_flower.c            | 25 +
 tc/m_tunnel_key.c        | 32 
 4 files changed, 70 insertions(+), 1 deletion(-)

-- 
1.8.3.1



[PATCH iproute2 2/2] tc/m_tunnel_key: Add dest UDP port to tunnel key action

2016-12-13 Thread Hadar Hen Zion
Enhance tunnel key action parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---
 man/man8/tc-tunnel_key.8 |  6 ++
 tc/m_tunnel_key.c        | 32 
 2 files changed, 38 insertions(+)

diff --git a/man/man8/tc-tunnel_key.8 b/man/man8/tc-tunnel_key.8
index 17b15b9..2e56973 100644
--- a/man/man8/tc-tunnel_key.8
+++ b/man/man8/tc-tunnel_key.8
@@ -15,6 +15,7 @@ tunnel_key - Tunnel metadata manipulation
 .BR dst_ip
 .IR ADDRESS
 .BI id " KEY_ID"
+.BI dst_port " UDP_PORT"
 
 .SH DESCRIPTION
 The
@@ -61,6 +62,8 @@ Set tunnel metadata to be used by the IP tunnel device. Requires
 and
 .B dst_ip
 options.
+.B dst_port
+is optional.
 .RS
 .TP
 .B id
@@ -71,6 +74,9 @@ Outer header source IP address (IPv4 or IPv6)
 .TP
 .B dst_ip
 Outer header destination IP address (IPv4 or IPv6)
+.TP
+.B dst_port
+Outer header destination UDP port
 .RE
 .SH EXAMPLES
 The following example encapsulates incoming ICMP packets on eth0 into a vxlan
diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
index f4a20e2..58a3042 100644
--- a/tc/m_tunnel_key.c
+++ b/tc/m_tunnel_key.c
@@ -60,6 +60,20 @@ static int tunnel_key_parse_key_id(const char *str, int type,
return ret;
 }
 
+static int tunnel_key_parse_dst_port(char *str, int type, struct nlmsghdr *n)
+{
+   int ret;
+   __be16 dst_port;
+
+   ret = get_be16(&dst_port, str, 10);
+   if (ret)
+   return -1;
+
+   addattr16(n, MAX_MSG, type, dst_port);
+
+   return 0;
+}
+
 static int parse_tunnel_key(struct action_util *a, int *argc_p, char ***argv_p,
int tca_id, struct nlmsghdr *n)
 {
@@ -128,6 +142,14 @@ static int parse_tunnel_key(struct action_util *a, int *argc_p, char ***argv_p,
return -1;
}
has_key_id = 1;
+   } else if (matches(*argv, "dst_port") == 0) {
+   NEXT_ARG();
+   ret = tunnel_key_parse_dst_port(*argv,
+   TCA_TUNNEL_KEY_ENC_DST_PORT, n);
+   if (ret < 0) {
+   fprintf(stderr, "Illegal \"dst port\"\n");
+   return -1;
+   }
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -197,6 +219,14 @@ static void tunnel_key_print_key_id(FILE *f, const char *name,
fprintf(f, "\n\t%s %d", name, rta_getattr_be32(attr));
 }
 
+static void tunnel_key_print_dst_port(FILE *f, char *name,
+ struct rtattr *attr)
+{
+   if (!attr)
+   return;
+   fprintf(f, "\n\t%s %d", name, rta_getattr_be16(attr));
+}
+
 static int print_tunnel_key(struct action_util *au, FILE *f, struct rtattr *arg)
 {
struct rtattr *tb[TCA_TUNNEL_KEY_MAX + 1];
@@ -231,6 +261,8 @@ static int print_tunnel_key(struct action_util *au, FILE *f, struct rtattr *arg)
 tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
tunnel_key_print_key_id(f, "key_id",
tb[TCA_TUNNEL_KEY_ENC_KEY_ID]);
+   tunnel_key_print_dst_port(f, "dst_port",
+ tb[TCA_TUNNEL_KEY_ENC_DST_PORT]);
break;
}
fprintf(f, " %s", action_n2a(parm->action));
-- 
1.8.3.1
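
A note on the dst_port parsing in the patch above: the new helper reads a decimal string, rejects malformed input, and stores the value in network byte order before it is added as a 16-bit attribute. A minimal userspace sketch of that step follows. parse_udp_port_be16() is a hypothetical stand-in written for illustration, not the iproute2 get_be16() helper, and its exact validation rules are an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <arpa/inet.h>

/*
 * Hypothetical stand-in for the parsing step (not iproute2 code):
 * parse a decimal UDP port and store it in network byte order.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_udp_port_be16(uint16_t *out, const char *str)
{
	char *end;
	unsigned long val;

	/* Require a leading digit so "", "-1", "+1" and " 1" are rejected. */
	if (!str || *str < '0' || *str > '9')
		return -1;

	errno = 0;
	val = strtoul(str, &end, 10);
	if (errno || *end != '\0' || val > 0xFFFF)
		return -1;

	*out = htons((uint16_t)val);	/* host order -> network order */
	return 0;
}
```

With this sketch, "4789" parses successfully and ntohs() of the stored value gives back 4789, while "65536" and "abc" are rejected, which corresponds to the "Illegal" error path in the patch.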



[PATCH iproute2 1/2] tc/cls_flower: Add dest UDP port to tunnel params

2016-12-13 Thread Hadar Hen Zion
Enhance IP tunnel parameters by adding destination UDP port.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Roi Dayan <r...@mellanox.com>
---
 man/man8/tc-flower.8 |  8 +++-
 tc/f_flower.c        | 25 +
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 90fdfba..88df833 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -39,6 +39,8 @@ flower \- flow based traffic control filter
 .IR KEY-ID " | {"
 .BR enc_dst_ip " | " enc_src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | "
+.B enc_dst_port
+.IR UDP-PORT " | "
 .SH DESCRIPTION
 The
 .B flower
@@ -129,11 +131,15 @@ which have to be specified in beforehand.
 .BI enc_dst_ip " ADDRESS"
 .TQ
 .BI enc_src_ip " ADDRESS"
+.TQ
+.BI enc_dst_port " NUMBER"
 Match on IP tunnel metadata. Key id
 .I NUMBER
 is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
 .I ADDRESS
-must be a valid IPv4 or IPv6 address.
+must be a valid IPv4 or IPv6 address. Dst port
+.I NUMBER
+is a 16 bit UDP dst port.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 5dac427..653dfef 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -275,6 +275,20 @@ static int flower_parse_key_id(const char *str, int type, struct nlmsghdr *n)
return ret;
 }
 
+static int flower_parse_enc_port(char *str, int type, struct nlmsghdr *n)
+{
+   int ret;
+   __be16 port;
+
+   ret = get_be16(&port, str, 10);
+   if (ret)
+   return -1;
+
+   addattr16(n, MAX_MSG, type, port);
+
+   return 0;
+}
+
 static int flower_parse_opt(struct filter_util *qu, char *handle,
int argc, char **argv, struct nlmsghdr *n)
 {
@@ -482,6 +496,14 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
fprintf(stderr, "Illegal \"enc_key_id\"\n");
return -1;
}
+   } else if (matches(*argv, "enc_dst_port") == 0) {
+   NEXT_ARG();
+   ret = flower_parse_enc_port(*argv,
+   TCA_FLOWER_KEY_ENC_UDP_DST_PORT, n);
+   if (ret < 0) {
+   fprintf(stderr, "Illegal \"enc_dst_port\"\n");
+   return -1;
+   }
} else if (matches(*argv, "action") == 0) {
NEXT_ARG();
ret = parse_action(&argc, &argv, TCA_FLOWER_ACT, n);
@@ -754,6 +776,9 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
flower_print_key_id(f, "enc_key_id",
tb[TCA_FLOWER_KEY_ENC_KEY_ID]);
 
+   flower_print_port(f, "enc_dst_port",
+ tb[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]);
+
if (tb[TCA_FLOWER_FLAGS]) {
__u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]);
 
-- 
1.8.3.1



Re: [Patch net-next] act_mirred: fix a typo in get_dev

2016-12-04 Thread Hadar Hen Zion



On 12/3/2016 8:36 PM, Cong Wang wrote:

Cc: Hadar Hen Zion <had...@mellanox.com>
Cc: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
---
  net/sched/act_mirred.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index bb09ba3..2d9fa6e 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -321,7 +321,7 @@ static int tcf_mirred_device(const struct tc_action *a, struct net *net,
int ifindex = tcf_mirred_ifindex(a);
  
  	*mirred_dev = __dev_get_by_index(net, ifindex);

-   if (!mirred_dev)
+   if (!*mirred_dev)
return -EINVAL;
return 0;
  }

Thank you for this fix! good catch.
I know it's already applied.

Hadar
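
For readers of the archive, the difference between the two checks can be reproduced in userspace. The sketch below is only an illustration: struct fake_dev, lookup_dev() and both get_dev_*() functions are hypothetical names, not kernel code. The out-parameter pointer itself is never NULL when the caller passes the address of a local variable, so testing it cannot detect a failed lookup; the value it points to has to be tested.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device table standing in for the netdev lookup. */
struct fake_dev {
	int ifindex;
};

static struct fake_dev devs[] = { { 1 }, { 2 } };

static struct fake_dev *lookup_dev(int ifindex)
{
	for (size_t i = 0; i < sizeof(devs) / sizeof(devs[0]); i++)
		if (devs[i].ifindex == ifindex)
			return &devs[i];
	return NULL;
}

/* Buggy variant: checks the address of the caller's variable. */
static int get_dev_buggy(int ifindex, struct fake_dev **out)
{
	*out = lookup_dev(ifindex);
	if (!out)		/* never true for a valid out-parameter */
		return -EINVAL;
	return 0;
}

/* Fixed variant: checks the looked-up device pointer. */
static int get_dev_fixed(int ifindex, struct fake_dev **out)
{
	*out = lookup_dev(ifindex);
	if (!*out)
		return -EINVAL;
	return 0;
}
```

Calling get_dev_buggy(99, &d) reports success while leaving d set to NULL, whereas get_dev_fixed(99, &d) returns -EINVAL.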



[PATCH net-next] net/sched: cls_flower: Set the filter Hardware device for all use-cases

2016-12-04 Thread Hadar Hen Zion
Check if the returned device from tcf_exts_get_dev function supports tc
offload and in case the rule can't be offloaded, set the filter hw_dev
parameter to the original device given by the user.

The filter hw_device parameter should always be set by fl_hw_replace_filter
function, since this pointer is used by dump stats and destroy
filter for each flower rule (offloaded or not).

Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress Hardware device')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reported-by: Simon Horman <ho...@verge.net.au>
---
 net/sched/cls_flower.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index c5cea78..29a9e6d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -236,8 +236,11 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
int err;
 
if (!tc_can_offload(dev, tp)) {
-   if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev))
+   if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
+   (f->hw_dev && !tc_can_offload(f->hw_dev, tp))) {
+   f->hw_dev = dev;
return tc_skip_sw(f->flags) ? -EINVAL : 0;
+   }
dev = f->hw_dev;
tc->egress_dev = true;
} else {
-- 
1.8.3.1
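
The device selection described in this patch can be summarized with a small userspace sketch. It is a simplified model under stated assumptions: struct sk_dev and pick_hw_dev() are hypothetical names, the "offload" field stands in for tc_can_offload(), and a NULL egress pointer stands in for a failed tcf_exts_get_dev() call. The case where the ingress device itself can offload is assumed to record the ingress device, which this hunk does not show.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a net device: only its offload capability. */
struct sk_dev {
	bool offload;
};

/*
 * Returns the device that would be recorded as the filter's hw_dev.
 * *use_egress is set when the offload call would go to the egress
 * (mirred) device instead of the ingress device.
 */
static const struct sk_dev *pick_hw_dev(const struct sk_dev *ingress,
					const struct sk_dev *egress,
					bool *use_egress)
{
	*use_egress = false;

	if (ingress->offload)
		return ingress;

	/* No usable egress device: fall back to the user's device. */
	if (!egress || !egress->offload)
		return ingress;

	*use_egress = true;
	return egress;
}
```

The point of the fix is the fallback branch: hw_dev is always set to some device, so the later stats and destroy paths never see an unset pointer.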



[PATCH net-next V2 6/8] net/mlx5e: Bring back representor's ndos that were accidentally removed

2016-12-01 Thread Hadar Hen Zion
The VF Representor udp tunnel ndo entries were removed by mistake;
restore them.

Fixes: 370bad0f9a52 ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5e33f6b..9b1e351 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -384,6 +384,8 @@ int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
.ndo_setup_tc= mlx5e_rep_ndo_setup_tc,
.ndo_get_stats64 = mlx5e_rep_get_stats,
+   .ndo_udp_tunnel_add  = mlx5e_add_vxlan_port,
+   .ndo_udp_tunnel_del  = mlx5e_del_vxlan_port,
.ndo_has_offload_stats   = mlx5e_has_offload_stats,
.ndo_get_offload_stats   = mlx5e_get_offload_stats,
 };
-- 
1.8.3.1



[PATCH net-next V2 5/8] net/sched: cls_flower: Add offload support using egress Hardware device

2016-12-01 Thread Hadar Hen Zion
In order to support hardware offloading when the device given by the tc
rule is different from the underlying Hardware device, extract the mirred
(egress) device from the tc action when a filter is added, using the new
tc_action_ops, get_dev().

Flower caches the information about the mirred device and uses it for
calling ndo_setup_tc in filter change, update stats and delete.

Calling ndo_setup_tc of the mirred (egress) device instead of the
ingress device will allow a resolution between the software ingress
device and the underlying hardware device.

The resolution will take place inside the offloading driver using
'egress_device' flag added to tc_to_netdev struct which is provided to
the offloading driver.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h     |  2 ++
 net/sched/cls_api.c       | 24 
 net/sched/cls_flower.c    | 41 -
 4 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3755317..1ff5ea6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -802,6 +802,7 @@ struct tc_to_netdev {
struct tc_cls_matchall_offload *cls_mall;
struct tc_cls_bpf_offload *cls_bpf;
};
+   bool egress_dev;
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 45ad9aa..f0a0514 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -171,6 +171,8 @@ void tcf_exts_change(struct tcf_proto *tp, struct tcf_exts *dst,
 struct tcf_exts *src);
 int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts);
 int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts);
+int tcf_exts_get_dev(struct net_device *dev, struct tcf_exts *exts,
+struct net_device **hw_dev);
 
 /**
  * struct tcf_pkt_info - packet information
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b05d4a2..3fbba79 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -682,6 +682,30 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_dump_stats);
 
+int tcf_exts_get_dev(struct net_device *dev, struct tcf_exts *exts,
+struct net_device **hw_dev)
+{
+#ifdef CONFIG_NET_CLS_ACT
+   const struct tc_action *a;
+   LIST_HEAD(actions);
+
+   if (tc_no_actions(exts))
+   return -EINVAL;
+
+   tcf_exts_to_list(exts, &actions);
+   list_for_each_entry(a, &actions, list) {
+   if (a->ops->get_dev) {
+   a->ops->get_dev(a, dev_net(dev), hw_dev);
+   break;
+   }
+   }
+   if (*hw_dev)
+   return 0;
+#endif
+   return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL(tcf_exts_get_dev);
+
 static int __init tc_filter_init(void)
 {
rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, NULL);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 13b349f..1cacfa5 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -78,6 +78,8 @@ struct cls_fl_filter {
u32 handle;
u32 flags;
struct rcu_head rcu;
+   struct tc_to_netdev tc;
+   struct net_device *hw_dev;
 };
 
 static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
@@ -203,9 +205,9 @@ static void fl_destroy_filter(struct rcu_head *head)
 
 static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
-   struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
-   struct tc_to_netdev tc;
+   struct net_device *dev = f->hw_dev;
+   struct tc_to_netdev *tc = &f->tc;
 
if (!tc_can_offload(dev, tp))
return;
@@ -213,10 +215,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
offload.command = TC_CLSFLOWER_DESTROY;
offload.cookie = (unsigned long)f;
 
-   tc.type = TC_SETUP_CLSFLOWER;
-   tc.cls_flower = &offload;
+   tc->type = TC_SETUP_CLSFLOWER;
+   tc->cls_flower = &offload;
 
-   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, &tc);
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, tc);
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
@@ -226,11 +228,17 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
-   struct tc_to_netdev tc;
+   struct tc_to_netdev *tc = &f->tc;
int err;
 
-   if (!tc_can_offload(dev, tp))
-   return tc_skip_sw(f->flags) ? -EINVAL : 0;
+ 

[PATCH net-next V2 2/8] net/sched: cls_flower: Try to offload only if skip_hw flag isn't set

2016-12-01 Thread Hadar Hen Zion
Check skip_hw flag isn't set before calling
fl_hw_{replace/destroy}_filter and fl_hw_update_stats functions.

Replace the call to tc_should_offload with tc_can_offload.
tc_can_offload only checks if the device supports offloading, the check for
skip_hw flag is done earlier in the flow.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..5e70f65 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -207,7 +207,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, unsigned long cookie)
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, tp, 0))
+   if (!tc_can_offload(dev, tp))
return;
 
offload.command = TC_CLSFLOWER_DESTROY;
@@ -231,7 +231,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
struct tc_to_netdev tc;
int err;
 
-   if (!tc_should_offload(dev, tp, flags))
+   if (!tc_can_offload(dev, tp))
return tc_skip_sw(flags) ? -EINVAL : 0;
 
offload.command = TC_CLSFLOWER_REPLACE;
@@ -259,7 +259,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, tp, 0))
+   if (!tc_can_offload(dev, tp))
return;
 
offload.command = TC_CLSFLOWER_STATS;
@@ -275,7 +275,8 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
 static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
list_del_rcu(&f->list);
-   fl_hw_destroy_filter(tp, (unsigned long)f);
+   if (!tc_skip_hw(f->flags))
+   fl_hw_destroy_filter(tp, (unsigned long)f);
tcf_unbind_filter(tp, &f->res);
call_rcu(&f->rcu, fl_destroy_filter);
 }
@@ -743,20 +744,23 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
goto errout;
}
 
-   err = fl_hw_replace_filter(tp,
-  &head->dissector,
-  &mask.key,
-  &fnew->key,
-  &fnew->exts,
-  (unsigned long)fnew,
-  fnew->flags);
-   if (err)
-   goto errout;
+   if (!tc_skip_hw(fnew->flags)) {
+   err = fl_hw_replace_filter(tp,
+  &head->dissector,
+  &mask.key,
+  &fnew->key,
+  &fnew->exts,
+  (unsigned long)fnew,
+  fnew->flags);
+   if (err)
+   goto errout;
+   }
 
if (fold) {
rhashtable_remove_fast(&head->ht, &fold->ht_node,
   head->ht_params);
-   fl_hw_destroy_filter(tp, (unsigned long)fold);
+   if (!tc_skip_hw(fold->flags))
+   fl_hw_destroy_filter(tp, (unsigned long)fold);
}
 
*arg = (unsigned long) fnew;
@@ -879,7 +883,8 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
goto nla_put_failure;
}
 
-   fl_hw_update_stats(tp, f);
+   if (!tc_skip_hw(f->flags))
+   fl_hw_update_stats(tp, f);
 
if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
-- 
1.8.3.1
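
The return-value rules that result from this patch (skip_hw checked by the caller, skip_sw deciding whether a hardware failure is fatal) can be modelled in a few lines of userspace C. This is a sketch for illustration: replace_filter() and its parameters are hypothetical, and the two flag values are assumed to mirror TCA_CLS_FLAGS_SKIP_HW and TCA_CLS_FLAGS_SKIP_SW from the uapi header.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the uapi TCA_CLS_FLAGS_SKIP_HW / _SKIP_SW bits. */
#define SKETCH_FLAG_SKIP_HW (1U << 0)
#define SKETCH_FLAG_SKIP_SW (1U << 1)

static bool skip_hw(uint32_t flags)
{
	return flags & SKETCH_FLAG_SKIP_HW;
}

static bool skip_sw(uint32_t flags)
{
	return flags & SKETCH_FLAG_SKIP_SW;
}

/*
 * can_offload: the device supports tc offload.
 * hw_result:   what the driver would return (0 or a negative errno).
 * Returns the error the filter change would see.
 */
static int replace_filter(uint32_t flags, bool can_offload, int hw_result)
{
	if (skip_hw(flags))
		return 0;	/* caller never attempts the offload */

	if (!can_offload)
		return skip_sw(flags) ? -EINVAL : 0;

	/* A hardware failure only matters for hardware-only rules. */
	return skip_sw(flags) ? hw_result : 0;
}
```

In this model a skip_sw rule on a device that cannot offload fails with -EINVAL, while a rule with neither flag set is accepted even if the driver rejects it, because it still runs in software.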



[PATCH net-next V2 8/8] net/mlx5e: Support adding ingress tc rule when egress device flag is set

2016-12-01 Thread Hadar Hen Zion
When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.

In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.

Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 0868677..8503788 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -289,6 +289,14 @@ static int mlx5e_rep_ndo_setup_tc(struct net_device *dev, u32 handle,
if (TC_H_MAJ(handle) != TC_H_MAJ(TC_H_INGRESS))
return -EOPNOTSUPP;
 
+   if (tc->egress_dev) {
+   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   struct net_device *uplink_dev = mlx5_eswitch_get_uplink_netdev(esw);
+
+   return uplink_dev->netdev_ops->ndo_setup_tc(uplink_dev, handle,
+   proto, tc);
+   }
+
switch (tc->type) {
case TC_SETUP_CLSFLOWER:
switch (tc->cls_flower->command) {
-- 
1.8.3.1



[PATCH net-next V2 3/8] net/sched: cls_flower: Provide a filter to replace/destroy hardware filter functions

2016-12-01 Thread Hadar Hen Zion
Instead of providing many arguments to fl_hw_{replace/destroy}_filter
functions, just provide cls_fl_filter struct that includes all the relevant
args.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 5e70f65..13b349f 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -201,7 +201,7 @@ static void fl_destroy_filter(struct rcu_head *head)
kfree(f);
 }
 
-static void fl_hw_destroy_filter(struct tcf_proto *tp, unsigned long cookie)
+static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
@@ -211,7 +211,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, unsigned long cookie)
return;
 
offload.command = TC_CLSFLOWER_DESTROY;
-   offload.cookie = cookie;
+   offload.cookie = (unsigned long)f;
 
tc.type = TC_SETUP_CLSFLOWER;
tc.cls_flower = &offload;
@@ -222,9 +222,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, unsigned long cookie)
 static int fl_hw_replace_filter(struct tcf_proto *tp,
struct flow_dissector *dissector,
struct fl_flow_key *mask,
-   struct fl_flow_key *key,
-   struct tcf_exts *actions,
-   unsigned long cookie, u32 flags)
+   struct cls_fl_filter *f)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
@@ -232,14 +230,14 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
int err;
 
if (!tc_can_offload(dev, tp))
-   return tc_skip_sw(flags) ? -EINVAL : 0;
+   return tc_skip_sw(f->flags) ? -EINVAL : 0;
 
offload.command = TC_CLSFLOWER_REPLACE;
-   offload.cookie = cookie;
+   offload.cookie = (unsigned long)f;
offload.dissector = dissector;
offload.mask = mask;
-   offload.key = key;
-   offload.exts = actions;
+   offload.key = &f->key;
+   offload.exts = &f->exts;
 
tc.type = TC_SETUP_CLSFLOWER;
tc.cls_flower = &offload;
@@ -247,7 +245,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
&tc);
 
-   if (tc_skip_sw(flags))
+   if (tc_skip_sw(f->flags))
return err;
 
return 0;
@@ -276,7 +274,7 @@ static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
list_del_rcu(&f->list);
if (!tc_skip_hw(f->flags))
-   fl_hw_destroy_filter(tp, (unsigned long)f);
+   fl_hw_destroy_filter(tp, f);
tcf_unbind_filter(tp, &f->res);
call_rcu(&f->rcu, fl_destroy_filter);
 }
@@ -748,10 +746,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
err = fl_hw_replace_filter(tp,
   &head->dissector,
   &mask.key,
-  &fnew->key,
-  &fnew->exts,
-  (unsigned long)fnew,
-  fnew->flags);
+  fnew);
if (err)
goto errout;
}
@@ -760,7 +755,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
rhashtable_remove_fast(&head->ht, &fold->ht_node,
   head->ht_params);
if (!tc_skip_hw(fold->flags))
-   fl_hw_destroy_filter(tp, (unsigned long)fold);
+   fl_hw_destroy_filter(tp, fold);
}
 
*arg = (unsigned long) fnew;
-- 
1.8.3.1



[PATCH net-next V2 4/8] net/sched: act_mirred: Add new tc_action_ops get_dev()

2016-12-01 Thread Hadar Hen Zion
Add support for a new tc_action_ops.
get_dev is a general operation that returns the underlying
device when trying to offload a tc rule.

In case of mirred action the returned device is the mirred (egress)
device.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/act_api.h  |  2 ++
 net/sched/act_mirred.c | 12 
 2 files changed, 14 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index d8eae87..9dddf77 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -119,6 +119,8 @@ struct tc_action_ops {
int (*walk)(struct net *, struct sk_buff *,
struct netlink_callback *, int, const struct tc_action_ops *);
void(*stats_update)(struct tc_action *, u64, u32, u64);
+   int (*get_dev)(const struct tc_action *a, struct net *net,
+  struct net_device **mirred_dev);
 };
 
 struct tc_action_net {
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 1af7baa..bb09ba3 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -315,6 +315,17 @@ static int mirred_device_event(struct notifier_block *unused,
.notifier_call = mirred_device_event,
 };
 
+static int tcf_mirred_device(const struct tc_action *a, struct net *net,
+struct net_device **mirred_dev)
+{
+   int ifindex = tcf_mirred_ifindex(a);
+
+   *mirred_dev = __dev_get_by_index(net, ifindex);
+   if (!mirred_dev)
+   return -EINVAL;
+   return 0;
+}
+
 static struct tc_action_ops act_mirred_ops = {
.kind   =   "mirred",
.type   =   TCA_ACT_MIRRED,
@@ -327,6 +338,7 @@ static int mirred_device_event(struct notifier_block *unused,
.walk   =   tcf_mirred_walker,
.lookup =   tcf_mirred_search,
.size   =   sizeof(struct tcf_mirred),
+   .get_dev=   tcf_mirred_device,
 };
 
 static __net_init int mirred_init_net(struct net *net)
-- 
1.8.3.1



[PATCH net-next V2 0/8] Offloading tc rules using underline Hardware device

2016-12-01 Thread Hadar Hen Zion
This series adds flower classifier support for offloading tc rules when the
Software ingress device is different from the Hardware ingress device,
such as when dealing with IP tunnels.

The first two patches are small fixes to flower, checking that the skip_hw flag
isn't set before calling the Hardware offloading functions which will try to
offload the rule.

The next two patches are infrastructure patches, a preparation for the fifth
patch which adds support in flower for offloading rules when the ingress
device is not a Hardware device and therefore can't offload.
In this case ndo_setup_tc is called with the mirred (egress) device.

The last three patches add mlx5e support for offloading rules using the new
"egress_device" flag.

Thanks,
Hadar

Changes from v0:
- check if CONFIG_NET_CLS_ACT is defined before calling tc_action_ops get_dev()

Hadar Hen Zion (8):
  net/sched: Add separate check for skip_hw flag
  net/sched: cls_flower: Try to offload only if skip_hw flag isn't set
  net/sched: cls_flower: Provide a filter to replace/destroy hardware
filter functions
  net/sched: act_mirred: Add new tc_action_ops get_dev()
  net/sched: cls_flower: Add offload support using egress Hardware
device
  net/mlx5e: Bring back representor's ndos that were accidentally
removed
  net/mlx5e: Save the represntor netdevice as part of the representor
  net/mlx5e: Support adding ingress tc rule when egress device flag is
set

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 25 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 12 ++-
 include/linux/netdevice.h                          |  1 +
 include/net/act_api.h                              |  2 +
 include/net/pkt_cls.h                              | 21 +-
 net/sched/act_mirred.c                             | 12 +++
 net/sched/cls_api.c                                | 24 ++
 net/sched/cls_flower.c                             | 87 --
 10 files changed, 135 insertions(+), 54 deletions(-)

-- 
1.8.3.1



[PATCH net-next V2 7/8] net/mlx5e: Save the represntor netdevice as part of the representor

2016-12-01 Thread Hadar Hen Zion
Replace the representor private data with a net_device pointer holding the
representor netdevice, instead of a void pointer holding mlx5e_priv.

It will be used by a new eswitch service function, returning the uplink
representor netdevice.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 15 ---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  3 ++-
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c| 12 +++-
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6b492ca..37c0d84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3796,7 +3796,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
rep.load = mlx5e_nic_rep_load;
rep.unload = mlx5e_nic_rep_unload;
rep.vport = FDB_UPLINK_VPORT;
-   rep.priv_data = priv;
+   rep.netdev = netdev;
mlx5_eswitch_register_vport_rep(esw, 0, &rep);
}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 9b1e351..0868677 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -208,7 +208,8 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
 
 int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5e_priv *priv = rep->priv_data;
+   struct net_device *netdev = rep->netdev;
+   struct mlx5e_priv *priv = netdev_priv(netdev);
 
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
return mlx5e_add_sqs_fwd_rules(priv);
@@ -226,7 +227,8 @@ void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
 void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
  struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5e_priv *priv = rep->priv_data;
+   struct net_device *netdev = rep->netdev;
+   struct mlx5e_priv *priv = netdev_priv(netdev);
 
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_remove_sqs_fwd_rules(priv);
@@ -555,7 +557,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
return -EINVAL;
}
 
-   rep->priv_data = netdev_priv(netdev);
+   rep->netdev = netdev;
 
err = mlx5e_attach_netdev(esw->dev, netdev);
if (err) {
@@ -577,7 +579,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
mlx5e_detach_netdev(esw->dev, netdev);
 
 err_destroy_netdev:
-   mlx5e_destroy_netdev(esw->dev, rep->priv_data);
+   mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 
return err;
 
@@ -586,10 +588,9 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5e_priv *priv = rep->priv_data;
-   struct net_device *netdev = priv->netdev;
+   struct net_device *netdev = rep->netdev;
 
unregister_netdev(netdev);
mlx5e_detach_netdev(esw->dev, netdev);
-   mlx5e_destroy_netdev(esw->dev, priv);
+   mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index cf1aa56..8661dd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -186,7 +186,7 @@ struct mlx5_eswitch_rep {
 struct mlx5_eswitch_rep *rep);
u16vport;
u8 hw_id[ETH_ALEN];
-   void  *priv_data;
+   struct net_device  *netdev;
 
struct mlx5_flow_handle *vport_rx_rule;
struct list_head   vport_sqs_list;
@@ -318,6 +318,7 @@ void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch 
*esw,
 struct mlx5_eswitch_rep *rep);
 void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
   int vport_index);
+struct net_device *mlx5_eswitch_get_uplink_netdev(struct mlx5_eswitch *esw);
 
 int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 struct mlx5_esw_flow_attr *attr);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 5c01550..466e161 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -970,7 +970,7 @@ void mlx5_eswitch_register_vport_r

[PATCH net-next V2 1/8] net/sched: Add separate check for skip_hw flag

2016-12-01 Thread Hadar Hen Zion
Distinguish between two possible cases:
1. Not offloading the tc rule because the user set the 'skip_hw' flag.
2. Not offloading the tc rule because the device doesn't support offloading.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/pkt_cls.h | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 767b03a..45ad9aa 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -425,16 +425,14 @@ struct tc_cls_u32_offload {
};
 };
 
-static inline bool tc_should_offload(const struct net_device *dev,
-const struct tcf_proto *tp, u32 flags)
+static inline bool tc_can_offload(const struct net_device *dev,
+ const struct tcf_proto *tp)
 {
const struct Qdisc *sch = tp->q;
const struct Qdisc_class_ops *cops = sch->ops->cl_ops;
 
if (!(dev->features & NETIF_F_HW_TC))
return false;
-   if (flags & TCA_CLS_FLAGS_SKIP_HW)
-   return false;
if (!dev->netdev_ops->ndo_setup_tc)
return false;
if (cops && cops->tcf_cl_offload)
@@ -443,6 +441,19 @@ static inline bool tc_should_offload(const struct 
net_device *dev,
return true;
 }
 
+static inline bool tc_skip_hw(u32 flags)
+{
+   return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false;
+}
+
+static inline bool tc_should_offload(const struct net_device *dev,
+const struct tcf_proto *tp, u32 flags)
+{
+   if (tc_skip_hw(flags))
+   return false;
+   return tc_can_offload(dev, tp);
+}
+
 static inline bool tc_skip_sw(u32 flags)
 {
return (flags & TCA_CLS_FLAGS_SKIP_SW) ? true : false;
-- 
1.8.3.1
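
The split described in this patch can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: the sketch_* names, the flag values and the two-field device model are assumptions made for illustration, and the qdisc class check (tcf_cl_offload) from the real tc_can_offload() is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCA_CLS_FLAGS_SKIP_HW / TCA_CLS_FLAGS_SKIP_SW. */
#define SKETCH_FLAG_SKIP_HW (1u << 0)
#define SKETCH_FLAG_SKIP_SW (1u << 1)

/* Hypothetical device model; not struct net_device. */
struct sketch_dev {
	bool hw_tc_feature; /* models dev->features & NETIF_F_HW_TC */
	bool has_setup_tc;  /* models dev->netdev_ops->ndo_setup_tc != NULL */
};

/* Device capability only: can this device offload at all? */
static bool sketch_can_offload(const struct sketch_dev *dev)
{
	return dev->hw_tc_feature && dev->has_setup_tc;
}

/* User policy only: did the user ask to skip hardware? */
static bool sketch_skip_hw(uint32_t flags)
{
	return (flags & SKETCH_FLAG_SKIP_HW) != 0;
}

/* Combined check: same answer as the single function before the split. */
static bool sketch_should_offload(const struct sketch_dev *dev, uint32_t flags)
{
	if (sketch_skip_hw(flags))
		return false;
	return sketch_can_offload(dev);
}
```

Keeping the two questions in separate helpers lets a caller tell "the user declined" apart from "the device cannot", which the combined check alone does not allow.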



[PATCH net-next 5/8] net/sched: cls_flower: Add offload support using egress Hardware device

2016-11-30 Thread Hadar Hen Zion
In order to support hardware offloading when the device given by the tc
rule is different from the underlying hardware device, extract the mirred
(egress) device from the tc action when a filter is added, using the new
tc_action_ops, get_dev().

Flower caches the information about the mirred device and uses it for
calling ndo_setup_tc on filter change, stats update and delete.

Calling ndo_setup_tc of the mirred (egress) device instead of the
ingress device allows a resolution between the software ingress
device and the underlying hardware device.

The resolution takes place inside the offloading driver, using the
'egress_dev' flag added to the tc_to_netdev struct which is provided to
the offloading driver.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 22 ++
 net/sched/cls_flower.c| 41 -
 4 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ffcd87..00c351c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -802,6 +802,7 @@ struct tc_to_netdev {
struct tc_cls_matchall_offload *cls_mall;
struct tc_cls_bpf_offload *cls_bpf;
};
+   bool egress_dev;
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 45ad9aa..f0a0514 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -171,6 +171,8 @@ void tcf_exts_change(struct tcf_proto *tp, struct tcf_exts 
*dst,
 struct tcf_exts *src);
 int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts);
 int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts);
+int tcf_exts_get_dev(struct net_device *dev, struct tcf_exts *exts,
+struct net_device **hw_dev);
 
 /**
  * struct tcf_pkt_info - packet information
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b05d4a2..e7aeab9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -682,6 +682,28 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct 
tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_dump_stats);
 
+int tcf_exts_get_dev(struct net_device *dev, struct tcf_exts *exts,
+struct net_device **hw_dev)
+{
+   const struct tc_action *a;
+   LIST_HEAD(actions);
+
+   if (tc_no_actions(exts))
+   return -EINVAL;
+
+   tcf_exts_to_list(exts, &actions);
+   list_for_each_entry(a, &actions, list) {
+   if (a->ops->get_dev) {
+   a->ops->get_dev(a, dev_net(dev), hw_dev);
+   break;
+   }
+   }
+   if (*hw_dev)
+   return 0;
+   return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL(tcf_exts_get_dev);
+
 static int __init tc_filter_init(void)
 {
rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, NULL);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 13b349f..1cacfa5 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -78,6 +78,8 @@ struct cls_fl_filter {
u32 handle;
u32 flags;
struct rcu_head rcu;
+   struct tc_to_netdev tc;
+   struct net_device *hw_dev;
 };
 
 static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
@@ -203,9 +205,9 @@ static void fl_destroy_filter(struct rcu_head *head)
 
 static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
-   struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
-   struct tc_to_netdev tc;
+   struct net_device *dev = f->hw_dev;
+   struct tc_to_netdev *tc = &f->tc;
 
if (!tc_can_offload(dev, tp))
return;
@@ -213,10 +215,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
struct cls_fl_filter *f)
offload.command = TC_CLSFLOWER_DESTROY;
offload.cookie = (unsigned long)f;
 
-   tc.type = TC_SETUP_CLSFLOWER;
-   tc.cls_flower = &offload;
+   tc->type = TC_SETUP_CLSFLOWER;
+   tc->cls_flower = &offload;
 
-   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, &tc);
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol, tc);
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
@@ -226,11 +228,17 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
-   struct tc_to_netdev tc;
+   struct tc_to_netdev *tc = &f->tc;
int err;
 
-   if (!tc_can_offload(dev, tp))
-   return tc_skip_sw(f->flags) ? -EINVAL : 0;
+   if (!tc_can_offload(dev, tp)) {
+   if (tcf_exts_g

[PATCH net-next 8/8] net/mlx5e: Support adding ingress tc rule when egress device flag is set

2016-11-30 Thread Hadar Hen Zion
When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.

In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.

Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 0868677..8503788 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -289,6 +289,14 @@ static int mlx5e_rep_ndo_setup_tc(struct net_device *dev, 
u32 handle,
if (TC_H_MAJ(handle) != TC_H_MAJ(TC_H_INGRESS))
return -EOPNOTSUPP;
 
+   if (tc->egress_dev) {
+   struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   struct net_device *uplink_dev = 
mlx5_eswitch_get_uplink_netdev(esw);
+
+   return uplink_dev->netdev_ops->ndo_setup_tc(uplink_dev, handle,
+   proto, tc);
+   }
+
switch (tc->type) {
case TC_SETUP_CLSFLOWER:
switch (tc->cls_flower->command) {
-- 
1.8.3.1
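
The redirect described in this patch can be sketched in userspace as follows. This is an illustration only: the sketch_* types are hypothetical, the uplink pointer stands in for what mlx5_eswitch_get_uplink_netdev() returns, and the NULL checks are an addition that is not part of the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the tc_to_netdev argument. */
struct sketch_tc {
	bool egress_dev; /* set when the call arrived via the mirred device */
	int type;
};

struct sketch_dev;
typedef int (*sketch_setup_tc_fn)(struct sketch_dev *dev, struct sketch_tc *tc);

/* Hypothetical device model. */
struct sketch_dev {
	const char *name;
	sketch_setup_tc_fn setup_tc;
	struct sketch_dev *uplink; /* models the uplink on the same eswitch */
	int rules_installed;
};

/* Uplink handler: this is where the decap rule would be installed. */
static int sketch_uplink_setup_tc(struct sketch_dev *dev, struct sketch_tc *tc)
{
	(void)tc;
	dev->rules_installed++;
	return 0;
}

/* Representor handler: forward to the uplink when egress_dev is set. */
static int sketch_rep_setup_tc(struct sketch_dev *dev, struct sketch_tc *tc)
{
	if (tc->egress_dev) {
		struct sketch_dev *uplink = dev->uplink;

		/* Defensive check added in this sketch only. */
		if (!uplink || !uplink->setup_tc)
			return -EOPNOTSUPP;
		return uplink->setup_tc(uplink, tc);
	}

	dev->rules_installed++;
	return 0;
}
```

With egress_dev set, the rule ends up on the uplink rather than on the representor that received the call.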



[PATCH net-next 6/8] net/mlx5e: Bring back representor's ndos that were accidentally removed

2016-11-30 Thread Hadar Hen Zion
The VF representor udp tunnel ndo entries were removed by mistake;
restore them.

Fixes: 370bad0f9a52 ('net/mlx5e: Support HW (offloaded) and SW counters for 
SRIOV switchdev mode')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5e33f6b..9b1e351 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -384,6 +384,8 @@ int mlx5e_get_offload_stats(int attr_id, const struct 
net_device *dev,
.ndo_get_phys_port_name  = mlx5e_rep_get_phys_port_name,
.ndo_setup_tc= mlx5e_rep_ndo_setup_tc,
.ndo_get_stats64 = mlx5e_rep_get_stats,
+   .ndo_udp_tunnel_add  = mlx5e_add_vxlan_port,
+   .ndo_udp_tunnel_del  = mlx5e_del_vxlan_port,
.ndo_has_offload_stats   = mlx5e_has_offload_stats,
.ndo_get_offload_stats   = mlx5e_get_offload_stats,
 };
-- 
1.8.3.1



[PATCH net-next 1/8] net/sched: Add separate check for skip_hw flag

2016-11-30 Thread Hadar Hen Zion
Distinguish between two possible cases:
1. Not offloading the tc rule because the user set the 'skip_hw' flag.
2. Not offloading the tc rule because the device doesn't support offloading.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/pkt_cls.h | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 767b03a..45ad9aa 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -425,16 +425,14 @@ struct tc_cls_u32_offload {
};
 };
 
-static inline bool tc_should_offload(const struct net_device *dev,
-const struct tcf_proto *tp, u32 flags)
+static inline bool tc_can_offload(const struct net_device *dev,
+ const struct tcf_proto *tp)
 {
const struct Qdisc *sch = tp->q;
const struct Qdisc_class_ops *cops = sch->ops->cl_ops;
 
if (!(dev->features & NETIF_F_HW_TC))
return false;
-   if (flags & TCA_CLS_FLAGS_SKIP_HW)
-   return false;
if (!dev->netdev_ops->ndo_setup_tc)
return false;
if (cops && cops->tcf_cl_offload)
@@ -443,6 +441,19 @@ static inline bool tc_should_offload(const struct 
net_device *dev,
return true;
 }
 
+static inline bool tc_skip_hw(u32 flags)
+{
+   return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false;
+}
+
+static inline bool tc_should_offload(const struct net_device *dev,
+const struct tcf_proto *tp, u32 flags)
+{
+   if (tc_skip_hw(flags))
+   return false;
+   return tc_can_offload(dev, tp);
+}
+
 static inline bool tc_skip_sw(u32 flags)
 {
return (flags & TCA_CLS_FLAGS_SKIP_SW) ? true : false;
-- 
1.8.3.1



[PATCH net-next 2/8] net/sched: cls_flower: Try to offload only if skip_hw flag isn't set

2016-11-30 Thread Hadar Hen Zion
Check that the skip_hw flag isn't set before calling the
fl_hw_{replace/destroy}_filter and fl_hw_update_stats functions.

Replace the call to tc_should_offload with tc_can_offload.
tc_can_offload only checks whether the device supports offloading; the
check for the skip_hw flag is done earlier in the flow.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..5e70f65 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -207,7 +207,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
unsigned long cookie)
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, tp, 0))
+   if (!tc_can_offload(dev, tp))
return;
 
offload.command = TC_CLSFLOWER_DESTROY;
@@ -231,7 +231,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
struct tc_to_netdev tc;
int err;
 
-   if (!tc_should_offload(dev, tp, flags))
+   if (!tc_can_offload(dev, tp))
return tc_skip_sw(flags) ? -EINVAL : 0;
 
offload.command = TC_CLSFLOWER_REPLACE;
@@ -259,7 +259,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct 
cls_fl_filter *f)
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, tp, 0))
+   if (!tc_can_offload(dev, tp))
return;
 
offload.command = TC_CLSFLOWER_STATS;
@@ -275,7 +275,8 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct 
cls_fl_filter *f)
 static void __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
list_del_rcu(&f->list);
-   fl_hw_destroy_filter(tp, (unsigned long)f);
+   if (!tc_skip_hw(f->flags))
+   fl_hw_destroy_filter(tp, (unsigned long)f);
tcf_unbind_filter(tp, &f->res);
call_rcu(&f->rcu, fl_destroy_filter);
 }
@@ -743,20 +744,23 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
goto errout;
}
 
-   err = fl_hw_replace_filter(tp,
-  >dissector,
-  ,
-  >key,
-  >exts,
-  (unsigned long)fnew,
-  fnew->flags);
-   if (err)
-   goto errout;
+   if (!tc_skip_hw(fnew->flags)) {
+   err = fl_hw_replace_filter(tp,
+  >dissector,
+  ,
+  >key,
+  >exts,
+  (unsigned long)fnew,
+  fnew->flags);
+   if (err)
+   goto errout;
+   }
 
if (fold) {
rhashtable_remove_fast(&head->ht, &fold->ht_node,
   head->ht_params);
-   fl_hw_destroy_filter(tp, (unsigned long)fold);
+   if (!tc_skip_hw(fold->flags))
+   fl_hw_destroy_filter(tp, (unsigned long)fold);
}
 
*arg = (unsigned long) fnew;
@@ -879,7 +883,8 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
unsigned long fh,
goto nla_put_failure;
}
 
-   fl_hw_update_stats(tp, f);
+   if (!tc_skip_hw(f->flags))
+   fl_hw_update_stats(tp, f);
 
if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
-- 
1.8.3.1
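
The resulting control flow for a filter change can be summarised with a small userspace sketch. It is an assumption-laden model rather than the kernel code: the sketch_* names and flag values are hypothetical, and the device capability and the driver's return code are passed in as plain parameters.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCA_CLS_FLAGS_SKIP_HW / TCA_CLS_FLAGS_SKIP_SW. */
#define SKETCH_SKIP_HW (1u << 0)
#define SKETCH_SKIP_SW (1u << 1)

/*
 * Models the hardware replace step after this patch: it only asks whether
 * the device can offload, and reports a failure only when the user demanded
 * hardware-only operation (skip_sw).
 */
static int sketch_hw_replace(bool dev_can_offload, int driver_err,
			     uint32_t flags)
{
	bool skip_sw = (flags & SKETCH_SKIP_SW) != 0;

	if (!dev_can_offload)
		return skip_sw ? -EINVAL : 0;

	/* A driver error is fatal only if there is no software fallback. */
	return skip_sw ? driver_err : 0;
}

/* Models the caller: the skip_hw test now happens before the hw call. */
static int sketch_change(bool dev_can_offload, int driver_err, uint32_t flags)
{
	if (!(flags & SKETCH_SKIP_HW))
		return sketch_hw_replace(dev_can_offload, driver_err, flags);
	return 0;
}
```

In this model a rule with skip_hw never reaches the hardware path, and a rule with neither flag set falls back to software when offloading fails.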



[PATCH net-next 7/8] net/mlx5e: Save the representor netdevice as part of the representor

2016-11-30 Thread Hadar Hen Zion
Replace the representor's private data with a net_device pointer holding the
representor netdevice, instead of a void pointer holding mlx5e_priv.

It will be used by a new eswitch service function that returns the uplink
representor netdevice.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 15 ---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  3 ++-
 .../net/ethernet/mellanox/mlx5/core/eswitch_offloads.c| 12 +++-
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6b492ca..37c0d84 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3796,7 +3796,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
rep.load = mlx5e_nic_rep_load;
rep.unload = mlx5e_nic_rep_unload;
rep.vport = FDB_UPLINK_VPORT;
-   rep.priv_data = priv;
+   rep.netdev = netdev;
mlx5_eswitch_register_vport_rep(esw, 0, &rep);
}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 9b1e351..0868677 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -208,7 +208,8 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
 
 int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5e_priv *priv = rep->priv_data;
+   struct net_device *netdev = rep->netdev;
+   struct mlx5e_priv *priv = netdev_priv(netdev);
 
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
return mlx5e_add_sqs_fwd_rules(priv);
@@ -226,7 +227,8 @@ void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
 void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
  struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5e_priv *priv = rep->priv_data;
+   struct net_device *netdev = rep->netdev;
+   struct mlx5e_priv *priv = netdev_priv(netdev);
 
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_remove_sqs_fwd_rules(priv);
@@ -555,7 +557,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
return -EINVAL;
}
 
-   rep->priv_data = netdev_priv(netdev);
+   rep->netdev = netdev;
 
err = mlx5e_attach_netdev(esw->dev, netdev);
if (err) {
@@ -577,7 +579,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
mlx5e_detach_netdev(esw->dev, netdev);
 
 err_destroy_netdev:
-   mlx5e_destroy_netdev(esw->dev, rep->priv_data);
+   mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 
return err;
 
@@ -586,10 +588,9 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5e_priv *priv = rep->priv_data;
-   struct net_device *netdev = priv->netdev;
+   struct net_device *netdev = rep->netdev;
 
unregister_netdev(netdev);
mlx5e_detach_netdev(esw->dev, netdev);
-   mlx5e_destroy_netdev(esw->dev, priv);
+   mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index cf1aa56..8661dd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -186,7 +186,7 @@ struct mlx5_eswitch_rep {
 struct mlx5_eswitch_rep *rep);
u16vport;
u8 hw_id[ETH_ALEN];
-   void  *priv_data;
+   struct net_device  *netdev;
 
struct mlx5_flow_handle *vport_rx_rule;
struct list_head   vport_sqs_list;
@@ -318,6 +318,7 @@ void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch 
*esw,
 struct mlx5_eswitch_rep *rep);
 void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
   int vport_index);
+struct net_device *mlx5_eswitch_get_uplink_netdev(struct mlx5_eswitch *esw);
 
 int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
 struct mlx5_esw_flow_attr *attr);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 5c01550..466e161 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -970,7 +970,7 @@ void mlx5_eswitch_register_vport_r

[PATCH net-next 4/8] net/sched: act_mirred: Add new tc_action_ops get_dev()

2016-11-30 Thread Hadar Hen Zion
Add support for a new tc_action_ops callback, get_dev().
get_dev() is a general operation that returns the underlying
device when trying to offload a tc rule.

In the case of the mirred action, the returned device is the mirred
(egress) device.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/act_api.h  |  2 ++
 net/sched/act_mirred.c | 12 
 2 files changed, 14 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index d8eae87..9dddf77 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -119,6 +119,8 @@ struct tc_action_ops {
int (*walk)(struct net *, struct sk_buff *,
struct netlink_callback *, int, const struct 
tc_action_ops *);
void(*stats_update)(struct tc_action *, u64, u32, u64);
+   int (*get_dev)(const struct tc_action *a, struct net *net,
+  struct net_device **mirred_dev);
 };
 
 struct tc_action_net {
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 1af7baa..bb09ba3 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -315,6 +315,17 @@ static int mirred_device_event(struct notifier_block 
*unused,
.notifier_call = mirred_device_event,
 };
 
+static int tcf_mirred_device(const struct tc_action *a, struct net *net,
+struct net_device **mirred_dev)
+{
+   int ifindex = tcf_mirred_ifindex(a);
+
+   *mirred_dev = __dev_get_by_index(net, ifindex);
+   if (!*mirred_dev)
+   return -EINVAL;
+   return 0;
+}
+
 static struct tc_action_ops act_mirred_ops = {
.kind   =   "mirred",
.type   =   TCA_ACT_MIRRED,
@@ -327,6 +338,7 @@ static int mirred_device_event(struct notifier_block 
*unused,
.walk   =   tcf_mirred_walker,
.lookup =   tcf_mirred_search,
.size   =   sizeof(struct tcf_mirred),
+   .get_dev=   tcf_mirred_device,
 };
 
 static __net_init int mirred_init_net(struct net *net)
-- 
1.8.3.1
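
A userspace sketch of how such a callback might be used is given below. Everything in it is hypothetical (the sketch_* types, the static device table, the caller that walks the actions); it only models the idea that an action may expose a device through an optional get_dev() hook, and that the lookup result has to be checked through the out pointer (*out), not the pointer to it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model; not struct net_device. */
struct sketch_netdev {
	int ifindex;
	const char *name;
};

struct sketch_action;

/* Hypothetical ops table with the optional get_dev() hook. */
struct sketch_action_ops {
	int (*get_dev)(const struct sketch_action *a,
		       struct sketch_netdev **out);
};

struct sketch_action {
	const struct sketch_action_ops *ops;
	int ifindex; /* target device of a mirred-like action */
};

static struct sketch_netdev sketch_devs[] = {
	{ 1, "lo" }, { 2, "eth0" }, { 3, "vxlan0" },
};

/* Stand-in for an ifindex lookup; returns NULL when nothing matches. */
static struct sketch_netdev *sketch_dev_by_index(int ifindex)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_devs) / sizeof(sketch_devs[0]); i++)
		if (sketch_devs[i].ifindex == ifindex)
			return &sketch_devs[i];
	return NULL;
}

/* get_dev() for a mirred-like action: resolve the egress device. */
static int sketch_mirred_get_dev(const struct sketch_action *a,
				 struct sketch_netdev **out)
{
	*out = sketch_dev_by_index(a->ifindex);
	if (!*out) /* check the lookup result, not the out pointer itself */
		return -EINVAL;
	return 0;
}

static const struct sketch_action_ops sketch_mirred_ops = {
	.get_dev = sketch_mirred_get_dev,
};

/* An action type that does not implement get_dev(). */
static const struct sketch_action_ops sketch_other_ops = {
	.get_dev = NULL,
};

/* Use the first action that implements get_dev(). */
static int sketch_exts_get_dev(const struct sketch_action *acts, size_t n,
			       struct sketch_netdev **hw_dev)
{
	size_t i;

	*hw_dev = NULL;
	if (n == 0)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		if (acts[i].ops->get_dev) {
			acts[i].ops->get_dev(&acts[i], hw_dev);
			break;
		}
	}
	return *hw_dev ? 0 : -EOPNOTSUPP;
}
```

Making the hook optional keeps action types that have no notion of a target device unchanged; only actions that can name a device need to fill it in.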



[PATCH net-next 3/8] net/sched: cls_flower: Provide a filter to replace/destroy hardware filter functions

2016-11-30 Thread Hadar Hen Zion
Instead of passing many arguments to the fl_hw_{replace/destroy}_filter
functions, just pass the cls_fl_filter struct, which includes all the
relevant args.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 5e70f65..13b349f 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -201,7 +201,7 @@ static void fl_destroy_filter(struct rcu_head *head)
kfree(f);
 }
 
-static void fl_hw_destroy_filter(struct tcf_proto *tp, unsigned long cookie)
+static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
@@ -211,7 +211,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
unsigned long cookie)
return;
 
offload.command = TC_CLSFLOWER_DESTROY;
-   offload.cookie = cookie;
+   offload.cookie = (unsigned long)f;
 
tc.type = TC_SETUP_CLSFLOWER;
tc.cls_flower = &offload;
@@ -222,9 +222,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
unsigned long cookie)
 static int fl_hw_replace_filter(struct tcf_proto *tp,
struct flow_dissector *dissector,
struct fl_flow_key *mask,
-   struct fl_flow_key *key,
-   struct tcf_exts *actions,
-   unsigned long cookie, u32 flags)
+   struct cls_fl_filter *f)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_flower_offload offload = {0};
@@ -232,14 +230,14 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
int err;
 
if (!tc_can_offload(dev, tp))
-   return tc_skip_sw(flags) ? -EINVAL : 0;
+   return tc_skip_sw(f->flags) ? -EINVAL : 0;
 
offload.command = TC_CLSFLOWER_REPLACE;
-   offload.cookie = cookie;
+   offload.cookie = (unsigned long)f;
offload.dissector = dissector;
offload.mask = mask;
-   offload.key = key;
-   offload.exts = actions;
+   offload.key = &f->key;
+   offload.exts = &f->exts;
 
tc.type = TC_SETUP_CLSFLOWER;
tc.cls_flower = &offload;
@@ -247,7 +245,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
&tc);
 
-   if (tc_skip_sw(flags))
+   if (tc_skip_sw(f->flags))
return err;
 
return 0;
@@ -276,7 +274,7 @@ static void __fl_delete(struct tcf_proto *tp, struct 
cls_fl_filter *f)
 {
list_del_rcu(&f->list);
if (!tc_skip_hw(f->flags))
-   fl_hw_destroy_filter(tp, (unsigned long)f);
+   fl_hw_destroy_filter(tp, f);
tcf_unbind_filter(tp, &f->res);
call_rcu(&f->rcu, fl_destroy_filter);
 }
@@ -748,10 +746,7 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
err = fl_hw_replace_filter(tp,
   >dissector,
   ,
-  >key,
-  >exts,
-  (unsigned long)fnew,
-  fnew->flags);
+  fnew);
if (err)
goto errout;
}
@@ -760,7 +755,7 @@ static int fl_change(struct net *net, struct sk_buff 
*in_skb,
rhashtable_remove_fast(&head->ht, &fold->ht_node,
   head->ht_params);
if (!tc_skip_hw(fold->flags))
-   fl_hw_destroy_filter(tp, (unsigned long)fold);
+   fl_hw_destroy_filter(tp, fold);
}
 
*arg = (unsigned long) fnew;
-- 
1.8.3.1



[PATCH net-next 0/8] Offloading tc rules using underlying hardware device

2016-11-30 Thread Hadar Hen Zion
This series adds flower classifier support for offloading tc rules when the
software ingress device is different from the hardware ingress device,
such as when dealing with IP tunnels.

The first two patches are small fixes to flower: they check that the skip_hw
flag isn't set before calling the hardware offloading functions, which would
otherwise try to offload the rule.

The next two patches are infrastructure patches, a preparation for the fifth
patch, which adds support in flower for offloading rules when the ingress
device is not a hardware device and therefore can't offload.
In this case ndo_setup_tc is called with the mirred (egress) device.

The last three patches add mlx5e support for offloading rules using the new
"egress_dev" flag.

Thanks,
Hadar

Hadar Hen Zion (8):
  net/sched: Add separate check for skip_hw flag
  net/sched: cls_flower: Try to offload only if skip_hw flag isn't set
  net/sched: cls_flower: Provide a filter to replace/destroy hardware
filter functions
  net/sched: act_mirred: Add new tc_action_ops get_dev()
  net/sched: cls_flower: Add offload support using egress Hardware
device
  net/mlx5e: Bring back representor's ndos that were accidentally
removed
  net/mlx5e: Save the representor netdevice as part of the representor
  net/mlx5e: Support adding ingress tc rule when egress device flag is
set

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 25 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 12 ++-
 include/linux/netdevice.h  |  1 +
 include/net/act_api.h  |  2 +
 include/net/pkt_cls.h  | 21 +-
 net/sched/act_mirred.c | 12 +++
 net/sched/cls_api.c| 22 ++
 net/sched/cls_flower.c | 87 --
 10 files changed, 133 insertions(+), 54 deletions(-)

-- 
1.8.3.1
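
The device selection that the series describes (try the ingress device first, otherwise fall back to the mirred egress device and mark the call with the egress flag) can be sketched as below. This is a guess at the shape of the logic based on the cover letter only, not a copy of the flower code: the sketch_* types and the can_offload field are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model. */
struct sketch_dev {
	const char *name;
	bool can_offload; /* models the tc_can_offload() answer */
};

/* What gets cached for later stats and delete calls. */
struct sketch_choice {
	struct sketch_dev *hw_dev; /* device whose ndo_setup_tc is called */
	bool egress_dev;           /* true when hw_dev is the mirred device */
};

/*
 * Pick the device to offload to. mirred may be NULL when the rule has no
 * action exposing a device. Returns false when nothing can offload.
 */
static bool sketch_pick_hw_dev(struct sketch_dev *ingress,
			       struct sketch_dev *mirred,
			       struct sketch_choice *out)
{
	if (ingress->can_offload) {
		out->hw_dev = ingress;
		out->egress_dev = false;
		return true;
	}
	if (mirred && mirred->can_offload) {
		out->hw_dev = mirred;
		out->egress_dev = true;
		return true;
	}
	out->hw_dev = NULL;
	out->egress_dev = false;
	return false;
}
```

A software tunnel device as ingress with a hardware representor as the mirred target is the case where the second branch applies.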



Re: [PATCH net-next V2] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-27 Thread Hadar Hen Zion
On Tue, Sep 27, 2016 at 11:09 AM, Hadar Hen Zion <had...@mellanox.com> wrote:
> Currently the created tc actions list is reversed against the order
> set by the user.
> Change the actions list order to be the same as was set by the user.
>
> This patch doesn't affect dump actions behavior.
> For dumping, action->order parameter is used so the list order doesn't
> matter.
>
> Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
> Acked-by: Jamal Hadi Salim <j...@mojatatu.com>


Changes from V1:
- Add a comment to the change log


> ---
>  include/net/pkt_cls.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 5ccaa4b..767b03a 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -123,7 +123,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts 
> *exts,
> for (i = 0; i < exts->nr_actions; i++) {
> struct tc_action *a = exts->actions[i];
>
> -   list_add(&a->list, actions);
> +   list_add_tail(&a->list, actions);
> }
>  #endif
>  }
> --
> 1.8.3.1
>


[PATCH net-next] net/sched: cls_flower: Use a proper mask value for enc key id parameter

2016-09-27 Thread Hadar Hen Zion
The current code uses the encapsulation key id value as the mask of that
parameter, which is wrong. Fix that by using a full mask.

Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 net/sched/cls_flower.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2af09c8..f6f40fb 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -481,7 +481,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
}
 
fl_set_key_val(tb, &key->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
-  &mask->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
+  &mask->enc_key_id.keyid, TCA_FLOWER_UNSPEC,
   sizeof(key->enc_key_id.keyid));
 
return 0;
@@ -919,7 +919,7 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
unsigned long fh,
goto nla_put_failure;
 
if (fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
-   &mask->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+   &mask->enc_key_id, TCA_FLOWER_UNSPEC,
sizeof(key->enc_key_id)))
goto nla_put_failure;
 
-- 
1.8.3.1
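
Why passing the "unspecified" attribute type as the mask type gives a full mask can be shown with a userspace model of a set-key helper. The sketch assumes, without quoting the kernel source, that the helper fills the mask with all ones when no mask attribute is named or present; the sketch_* names, the attribute table and the attribute numbers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ATTR_UNSPEC 0 /* hypothetical "no attribute" type */
#define SKETCH_ATTR_MAX    8

/*
 * attrs[type] points at the attribute payload, or is NULL when the user did
 * not supply that attribute. The value is copied from val_type. The mask is
 * copied from mask_type if one is named and present, otherwise it is set to
 * all ones (exact match).
 */
static void sketch_set_key_val(const void *const *attrs,
			       void *val, int val_type,
			       void *mask, int mask_type, size_t len)
{
	if (!attrs[val_type])
		return;

	memcpy(val, attrs[val_type], len);

	if (mask_type == SKETCH_ATTR_UNSPEC || !attrs[mask_type])
		memset(mask, 0xff, len);
	else
		memcpy(mask, attrs[mask_type], len);
}
```

Passing the value's own attribute type as the mask type, as in the code before the fix, makes the mask equal to the key id itself, so only the bits that happen to be set in the id would be compared.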



[PATCH net-next V2] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-27 Thread Hadar Hen Zion
Currently the created tc actions list is reversed against the order
set by the user.
Change the actions list order to be the same as was set by the user.

This patch doesn't affect dump actions behavior.
For dumping, action->order parameter is used so the list order doesn't
matter.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
---
 include/net/pkt_cls.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 5ccaa4b..767b03a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -123,7 +123,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts 
*exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];
 
-   list_add(&a->list, actions);
+   list_add_tail(&a->list, actions);
}
 #endif
 }
-- 
1.8.3.1
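
The effect of switching from head insertion to tail insertion can be reproduced with a tiny userspace list. The sketch below re-implements a circular doubly linked list in the style of the kernel's list helpers; the sketch_* names are hypothetical and nothing here is kernel code.

```c
#include <stddef.h>

/* Minimal circular doubly linked list. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Link n between two known neighbours. */
static void sketch_insert(struct sketch_list *n, struct sketch_list *prev,
			  struct sketch_list *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Head insertion: the newest entry comes first (stack order). */
static void sketch_list_add(struct sketch_list *n, struct sketch_list *head)
{
	sketch_insert(n, head, head->next);
}

/* Tail insertion: entries keep the order in which they were added. */
static void sketch_list_add_tail(struct sketch_list *n,
				 struct sketch_list *head)
{
	sketch_insert(n, head->prev, head);
}

/* Hypothetical action carrying the position the user gave it. */
struct sketch_action {
	int order;
	struct sketch_list list;
};

/* Walk the list from the head and record each action's order field. */
static size_t sketch_walk(const struct sketch_list *head, int *out,
			  size_t max)
{
	const struct sketch_list *p;
	size_t n = 0;

	for (p = head->next; p != head && n < max; p = p->next) {
		const struct sketch_action *a = (const struct sketch_action *)
			((const char *)p - offsetof(struct sketch_action, list));
		out[n++] = a->order;
	}
	return n;
}
```

Adding actions 1, 2, 3 with sketch_list_add() and walking the list yields 3, 2, 1; with sketch_list_add_tail() the walk yields 1, 2, 3, which is the order an offloading driver walking the list would want to see.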



Re: [PATCH net-next] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-27 Thread Hadar Hen Zion
On Mon, Sep 26, 2016 at 11:34 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Sun, Sep 25, 2016 at 11:02 PM, Hadar Hen Zion
> <had...@dev.mellanox.co.il> wrote:
>> On Mon, Sep 26, 2016 at 7:31 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>> On Sun, Sep 25, 2016 at 7:39 AM, Jamal Hadi Salim <j...@mojatatu.com> wrote:
>>>> On 16-09-25 10:08 AM, Hadar Hen Zion wrote:
>>>>>
>>>>> Currently the created tc actions list is reversed against the order
>>>>> set by the user.
>>>>> Change the actions list order to be the same as was set by the user.
>>>>>
>>>>
>>>>
>>>> Did something break? It seems to matter most for dumping. But even that
>>>> didnt breaking. Looking at the latest net tree, i tried:
>>>>
>>>
>>> The reason is we use action->order as an nested attribute, so
>>> the order in the list doesn't matter, only action->order itself matters.
>>
>> The order in the list matters for offload drivers who use the
>> "tcf_exts_to_list" function and action->order parameter isn't usable
>> for them.
>> Why not keeping the actions in the same order as the user? isn't it
>> more elegant?
>
> I don't object this patch since it affects offloading, I just explained
> why it doesn't affect dumping.
>
> Please add this to your changelog, to make it obvious.

Sure, I'll add it.

Hadar

>
> Thanks!


Re: [PATCH net-next] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-26 Thread Hadar Hen Zion
On Mon, Sep 26, 2016 at 7:31 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Sun, Sep 25, 2016 at 7:39 AM, Jamal Hadi Salim <j...@mojatatu.com> wrote:
>> On 16-09-25 10:08 AM, Hadar Hen Zion wrote:
>>>
>>> Currently the created tc actions list is reversed against the order
>>> set by the user.
>>> Change the actions list order to be the same as was set by the user.
>>>
>>
>>
>> Did something break? It seems to matter most for dumping. But even that
>> didnt breaking. Looking at the latest net tree, i tried:
>>
>
> The reason is we use action->order as an nested attribute, so
> the order in the list doesn't matter, only action->order itself matters.

The order in the list matters for offload drivers that use the
"tcf_exts_to_list" function; the action->order parameter isn't usable
for them.
Why not keep the actions in the same order as the user set them? Isn't
it more elegant?

Hadar


>
> See tcf_action_dump():
>
>nest = nla_nest_start(skb, a->order);


[PATCH net-next] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-25 Thread Hadar Hen Zion
Currently the list of created tc actions is built in the reverse of the
order set by the user.
Change the actions list order to match the order the user set.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/pkt_cls.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 5ccaa4b..767b03a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -123,7 +123,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts *exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];
 
-   list_add(&a->list, actions);
+   list_add_tail(&a->list, actions);
}
 #endif
 }
-- 
1.8.3.1



[PATCH net-next] net/sched: act_tunnel_key: Remove rcu_read_lock protection

2016-09-12 Thread Hadar Hen Zion
Remove the rcu_read_lock protection from tunnel_key_dump and use
rtnl_dereference instead; the dump operation is protected by the rtnl lock.

Also, remove rcu_read_lock from tunnel_key_release and use
rcu_dereference_protected.

Both operations run exclusively, so a writer cannot modify
t->params while these functions execute.

Fixes: 54d94fd89d90 ('net/sched: Introduce act_tunnel_key')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 net/sched/act_tunnel_key.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index dceff74..af47bdf 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -194,15 +194,12 @@ static void tunnel_key_release(struct tc_action *a, int 
bind)
struct tcf_tunnel_key *t = to_tunnel_key(a);
struct tcf_tunnel_key_params *params;
 
-   rcu_read_lock();
-   params = rcu_dereference(t->params);
+   params = rcu_dereference_protected(t->params, 1);
 
if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
dst_release(&params->tcft_enc_metadata->dst);
 
kfree_rcu(params, rcu);
-
-   rcu_read_unlock();
 }
 
 static int tunnel_key_dump_addresses(struct sk_buff *skb,
@@ -245,10 +242,8 @@ static int tunnel_key_dump(struct sk_buff *skb, struct 
tc_action *a,
.bindcnt  = t->tcf_bindcnt - bind,
};
struct tcf_t tm;
-   int ret = -1;
 
-   rcu_read_lock();
-   params = rcu_dereference(t->params);
+   params = rtnl_dereference(t->params);
 
opt.t_action = params->tcft_action;
opt.action = params->action;
@@ -272,15 +267,11 @@ static int tunnel_key_dump(struct sk_buff *skb, struct 
tc_action *a,
  &tm, TCA_TUNNEL_KEY_PAD))
goto nla_put_failure;
 
-   ret = skb->len;
-   goto out;
+   return skb->len;
 
 nla_put_failure:
nlmsg_trim(skb, b);
-out:
-   rcu_read_unlock();
-
-   return ret;
+   return -1;
 }
 
 static int tunnel_key_walker(struct net *net, struct sk_buff *skb,
-- 
1.8.3.1



[PATCH net-next V7 3/4] net/sched: cls_flower: Classify packet in ip tunnels

2016-09-08 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Introduce classification by metadata extracted by the tunnel device.
The outer header fields (source/dest ip and tunnel id) are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress Qdisc of the shared
vxlan device named 'vxlan0'. It matches packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11, and forwards them to the
tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0

The tunnel_key action will be introduced in the next patch in this
series.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  11 +
 net/sched/cls_flower.c   | 100 ++-
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cf9ad5b..b084b2a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+   struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_keyid enc_key_id;
+   union {
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_ipv6_addrs enc_ipv6;
+   };
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(>ht.nelems))
return -1;
 
fl_clear_masked_range(_key, >mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   switch (ip_tunnel_info_af(info)) {
+   case AF_INET:
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   break;
+   case AF_INET6:
+   skb_key.enc_ipv6.src = key->u.ipv6.src;
+   skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+   break;
+   }
+
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },

Re: [PATCH net-next V6 4/4] net/sched: Introduce act_tunnel_key

2016-09-08 Thread Hadar Hen Zion
On Wed, Sep 7, 2016 at 7:27 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Wed, Sep 7, 2016 at 1:08 AM, Hadar Hen Zion <had...@mellanox.com> wrote:
>> +struct tcf_tunnel_key_params {
>> +   struct rcu_head rcu;
>> +   int tcft_action;
>> +   int action;
>> +   struct metadata_dst *tcft_enc_metadata;
>> +};
>> +
>> +struct tcf_tunnel_key {
>> +   struct tc_action  common;
>> +   struct tcf_tunnel_key_params __rcu *params;
>> +};
>> +
> ...
>
> This is unnecessary if we make the tc action API aware of RCU.
>
>> +
>> +static void tunnel_key_release(struct tc_action *a, int bind)
>> +{
>> +   struct tcf_tunnel_key *t = to_tunnel_key(a);
>> +   struct tcf_tunnel_key_params *params;
>> +
>> +   rcu_read_lock();
>> +   params = rcu_dereference(t->params);
>> +
>> +   if (params->tcft_action == TCA_TUNNEL_KEY_ACT_SET)
>> +   dst_release(&params->tcft_enc_metadata->dst);
>> +
>> +   rcu_read_unlock();
>> +
>
> So you allocate memory for t->params in ->init() but not
> release it here?

Right, I'll fix it in the next version.

>
> Also, ->cleanup() should be called with RTNL, no need to
> take read lock here.

RTNL lock isn't taken when cleanup is called.

>
> BTW, again you do NOT need to make it RCU, the whole
> tc action API should be, as my patchset does, I will take care
> of this as a part of my patchset. Eric is wasting your time on
> this, with no benefits, the code will be replaced soon.


[PATCH net-next V7 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-09-08 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Add utility functions to convert a 32-bit key into a 64-bit tunnel ID
and vice versa.
These functions will be used instead of duplicating code in GRE and VXLAN,
and in tc act_tunnel_key, which will be introduced in a following patch in
this patchset.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
Acked-by: Jiri Benc <jb...@redhat.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 199dec0..4bfeb97 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1291,7 +1291,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1945,7 +1945,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)key;
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)tun_id;
+#else
+   return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, );
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, stru
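
The two helpers introduced in include/net/ip_tunnels.h above can be exercised outside the kernel with a small sketch. This is an illustration under stated assumptions, not the kernel code: plain uint32_t/uint64_t stand in for __be32/__be64, endianness is probed at run time instead of via __BIG_ENDIAN, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl() */

/* Run-time replacement for the kernel's compile-time __BIG_ENDIAN test. */
static int demo_host_is_big_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/* Place a big-endian 32-bit key in the low-order bytes of a big-endian 64-bit ID. */
static uint64_t demo_key32_to_tunnel_id(uint32_t key_be)
{
	if (demo_host_is_big_endian())
		return (uint64_t)key_be;
	return (uint64_t)key_be << 32;
}

/* Extract the least-significant 32 bits of a big-endian 64-bit ID. */
static uint32_t demo_tunnel_id_to_key32(uint64_t tun_id_be)
{
	if (demo_host_is_big_endian())
		return (uint32_t)tun_id_be;
	return (uint32_t)(tun_id_be >> 32);
}

/* Portable host-to-big-endian conversion for 64-bit values. */
static uint64_t demo_cpu_to_be64(uint64_t v)
{
	uint8_t bytes[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		bytes[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, bytes, sizeof(out));
	return out;
}

/* Self-check: widening a key and narrowing it again returns the key. */
static void demo_tunnel_id_selfcheck(uint32_t host_key)
{
	uint32_t key_be = htonl(host_key);

	assert(demo_tunnel_id_to_key32(demo_key32_to_tunnel_id(key_be)) == key_be);
}
```

On a little-endian host the big-endian key's bytes sit at the low addresses of the 32-bit word, so shifting the value left by 32 moves them to the high addresses of the 64-bit word, which is where the low-order bytes of a big-endian 64-bit number live. That is why the shift direction looks inverted at first glance.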

[PATCH net-next V7 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-08 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
---
 include/net/dst_metadata.h | 52 ++
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..6965c8f 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,13 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return >u.tun_info;
 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *__ip_tun_set_dst(__be32 saddr,
+   __be32 daddr,
+   __u8 tos, __u8 ttl,
+   __be16 flags,
+   __be64 tunnel_id,
+   int md_size)
 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
 
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +126,30 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,
return NULL;
 
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-  iph->saddr, iph->daddr, iph->tos, iph->ttl,
+  saddr, daddr, tos, ttl,
   0, 0, 0, tunnel_id, flags);
return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+   flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *__ipv6_tun_set_dst(const struct in6_addr 
*saddr,
+ const struct in6_addr 
*daddr,
+ __u8 tos, __u8 ttl,
+ __be32 label,
+ __be16 flags,
+ __be64 tunnel_id,
+ int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
 
@@ -150,14 +164,26 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
 
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.u.ipv6.src = *saddr;
+   info->key.u.ipv6.dst = *daddr;
 
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   info->key.label = ip6_flowlabel(ip6h);
+   info->key.tos = tos;
+   info->key.ttl = ttl;
+   info->key.label = label;
 
return tun_dst;
 }
 
+static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+  __be16 flags,
+  __be64 tunnel_id,
+  int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+ ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+ ip6_flowlabel(ip6h), flags, tunnel_id,
+ md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1
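
The shape of this refactoring, a core helper that takes explicit field values plus a thin receive-path wrapper that reads them from the packet, can be sketched in userspace as follows. All types and demo_* names here are hypothetical simplifications; they are not the kernel's struct sk_buff, struct metadata_dst or the functions in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct demo_iphdr {
	uint32_t saddr, daddr;
	uint8_t tos, ttl;
};

struct demo_skb {
	struct demo_iphdr iph;
};

struct demo_tun_key {
	uint32_t src, dst;
	uint8_t tos, ttl;
	uint16_t flags;
	uint64_t tun_id;
};

/*
 * Core helper: builds the tunnel key from explicit values, so a caller
 * with no packet at hand (for example a configuration path) can use it.
 */
static struct demo_tun_key *demo_tun_set_dst(uint32_t saddr, uint32_t daddr,
					     uint8_t tos, uint8_t ttl,
					     uint16_t flags, uint64_t tun_id)
{
	struct demo_tun_key *key = calloc(1, sizeof(*key));

	if (!key)
		return NULL;

	key->src = saddr;
	key->dst = daddr;
	key->tos = tos;
	key->ttl = ttl;
	key->flags = flags;
	key->tun_id = tun_id;
	return key;
}

/* Receive-path wrapper: reads the fields from the packet, then delegates. */
static struct demo_tun_key *demo_tun_rx_dst(const struct demo_skb *skb,
					    uint16_t flags, uint64_t tun_id)
{
	const struct demo_iphdr *iph;

	assert(skb != NULL);
	iph = &skb->iph;

	return demo_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
				flags, tun_id);
}
```

Keeping the packet-based function as a wrapper means existing receive-path callers are unchanged, while new callers can pass user-supplied addresses directly.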



[PATCH net-next V7 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-09-08 Thread Hadar Hen Zion
Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action, act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent ffff: \
flower \
 ip_proto 1 \
action tunnel_key set \
 src_ip 11.11.0.1 \
 dst_ip 11.11.0.2 \
 id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
 enc_src_ip 11.11.0.2 \
 enc_dst_ip 11.11.0.1 \
 enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V6:
- Add kfree_rcu to tunnel_key_release function
- Use reverse Christmas tree order in tunnel_key_init function

Changes from V5:
- Add __rcu notation to struct tcf_tunnel_key_params in struct tcf_tunnel_key
- Fix indentation in include/net/dst_metadata.h
- Fix syntax error in commit message

Changes from V4:
- Fix tunnel_key_init function error flow.
- Add 'action' variable to struct tcf_tunnel_key_params and use it instead of
  tcf_action variable which is not protected by rcu lock.

Changes from V3:
- Use percpu stats
- No spinlock on datapath - protecting parameters with rcu
- Fix buggy handling of set/release dst
- Use nla_get_in_addr and nla_put_in_addr
- Fix change logs
- Pass in6_addr by pointer
- Rename utility functions to start with double underscore

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use
  pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c   |   4 +-
 include/net/dst_metadata.h|  52 +++--
 include/net/ip_tunnels.h  |  19 ++
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/net/vxlan.h   |  18 --
 include/uapi/linux/pkt_cls.h  |  11 +
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/ipv4/ip_gre.c |  23 +-
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 351 ++
 net/sched/cls_flower.c| 100 -
 12 files changed, 608 insertions(+), 55 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1



[PATCH net-next V7 4/4] net/sched: Introduce act_tunnel_key

2016-09-08 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for the encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
flower \
  ip_proto 1 \
  dst_ip 11.11.11.2 \
action tunnel_key set \
  src_ip 11.11.0.1 \
  dst_ip 11.11.0.2 \
  id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
---
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 351 ++
 5 files changed, 436 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h 
b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..6fd2255
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include 
+
+struct tcf_tunnel_key_params {
+   struct rcu_head rcu;
+   int tcft_action;
+   int action;
+   struct metadata_dst *tcft_enc_metadata;
+};
+
+struct tcf_tunnel_key {
+   struct tc_action  common;
+   struct tcf_tunnel_key_params __rcu *params;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h 
b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include 
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+enum {
+   TCA_TUNNEL_KEY_UNSPEC,
+   TCA_TUNNEL_KEY_TM,
+   TCA_TUNNEL_KEY_PARMS,
+   TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_KEY_ID,  /* be64 */
+   TCA_TUNNEL_KEY_PAD,
+   __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
  To compile this code as a module, choose M here: the
  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+tristate "IP tunnel metadata manipulation"
+depends on NET_CLS_ACT
+---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= ac

Re: [PATCH net-next V6 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-09-08 Thread Hadar Hen Zion
On Thu, Sep 8, 2016 at 3:32 AM, David Miller <da...@davemloft.net> wrote:
> From: Hadar Hen Zion <had...@mellanox.com>
> Date: Wed,  7 Sep 2016 11:08:02 +0300
>
>> This patchset introduces ip tunnel manipulation support using the TC 
>> subsystem.
>
> Please address the feedback given by Eric Dumazet for patch #4,
> thank you.

Sure, I'll fix it and re-send.

Thanks,
Hadar


[PATCH net-next V6 4/4] net/sched: Introduce act_tunnel_key

2016-09-07 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for the encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
flower \
  ip_proto 1 \
  dst_ip 11.11.11.2 \
action tunnel_key set \
  src_ip 11.11.0.1 \
  dst_ip 11.11.0.2 \
  id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
---
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 348 ++
 5 files changed, 433 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h 
b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..6fd2255
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include 
+
+struct tcf_tunnel_key_params {
+   struct rcu_head rcu;
+   int tcft_action;
+   int action;
+   struct metadata_dst *tcft_enc_metadata;
+};
+
+struct tcf_tunnel_key {
+   struct tc_action  common;
+   struct tcf_tunnel_key_params __rcu *params;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h 
b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include 
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+enum {
+   TCA_TUNNEL_KEY_UNSPEC,
+   TCA_TUNNEL_KEY_TM,
+   TCA_TUNNEL_KEY_PARMS,
+   TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_KEY_ID,  /* be64 */
+   TCA_TUNNEL_KEY_PAD,
+   __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
  To compile this code as a module, choose M here: the
  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+tristate "IP tunnel metadata manipulation"
+depends on NET_CLS_ACT
+---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= act_connmark.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o

[PATCH net-next V6 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-07 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
---
 include/net/dst_metadata.h | 52 ++
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..6965c8f 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,13 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return >u.tun_info;
 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *__ip_tun_set_dst(__be32 saddr,
+   __be32 daddr,
+   __u8 tos, __u8 ttl,
+   __be16 flags,
+   __be64 tunnel_id,
+   int md_size)
 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
 
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +126,30 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,
return NULL;
 
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-  iph->saddr, iph->daddr, iph->tos, iph->ttl,
+  saddr, daddr, tos, ttl,
   0, 0, 0, tunnel_id, flags);
return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+   flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *__ipv6_tun_set_dst(const struct in6_addr 
*saddr,
+ const struct in6_addr 
*daddr,
+ __u8 tos, __u8 ttl,
+ __be32 label,
+ __be16 flags,
+ __be64 tunnel_id,
+ int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
 
@@ -150,14 +164,26 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
 
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.u.ipv6.src = *saddr;
+   info->key.u.ipv6.dst = *daddr;
 
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   info->key.label = ip6_flowlabel(ip6h);
+   info->key.tos = tos;
+   info->key.ttl = ttl;
+   info->key.label = label;
 
return tun_dst;
 }
 
+static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+  __be16 flags,
+  __be64 tunnel_id,
+  int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+ ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+ ip6_flowlabel(ip6h), flags, tunnel_id,
+ md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1



[PATCH net-next V6 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-09-07 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and
vice versa.
These functions will be used instead of cloning code in GRE and VXLAN,
and in tc act_tunnel_key which will be introduced in a following patch in
this patchset.
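
To make the byte layout concrete, here is a minimal userspace sketch of the same conversion. It is not the kernel code: the *_sketch names are hypothetical, plain uint32_t/uint64_t stand in for __be32/__be64, and memcpy() replaces the endian-specific casts and shifts. The assumption it illustrates is that the 32-bit key occupies the last four bytes of the network-order 64-bit tunnel ID.

```c
#include <stdint.h>
#include <string.h>

/* Network-order 32-bit key -> network-order 64-bit tunnel ID.
 * The key lands in the last four bytes; the first four are zero.
 */
static uint64_t key32_to_tunnel_id_sketch(uint32_t key_be)
{
	unsigned char buf[8] = { 0 };
	uint64_t id_be;

	memcpy(buf + 4, &key_be, sizeof(key_be));
	memcpy(&id_be, buf, sizeof(id_be));
	return id_be;
}

/* Network-order 64-bit tunnel ID -> its last four bytes as a 32-bit key. */
static uint32_t tunnel_id_to_key32_sketch(uint64_t id_be)
{
	unsigned char buf[8];
	uint32_t key_be;

	memcpy(buf, &id_be, sizeof(buf));
	memcpy(&key_be, buf + 4, sizeof(key_be));
	return key_be;
}
```

Because the sketch works on bytes, it needs no #ifdef __BIG_ENDIAN; the kernel helpers reach the same layout with a cast on big-endian hosts and a 32-bit shift on little-endian hosts.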

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
Acked-by: Jiri Benc <jb...@redhat.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 199dec0..4bfeb97 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1291,7 +1291,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1945,7 +1945,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)key;
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)tun_id;
+#else
+   return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, &tpi);
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, stru

[PATCH net-next V6 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-09-07 Thread Hadar Hen Zion
Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action, act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent : \
flower \
 ip_proto 1 \
action tunnel_key set \
 src_ip 11.11.0.1 \
 dst_ip 11.11.0.2 \
 id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent : \
flower \
 enc_src_ip 11.11.0.2 \
 enc_dst_ip 11.11.0.1 \
 enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0
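
What the two tunnel_key modes do to a packet can be modelled with a short userspace sketch. It is an illustration under assumptions, not the act_tunnel_key implementation: md_sketch, pkt_sketch and tunnel_key_act_sketch() are hypothetical names, and a plain integer stands in for the dst reference count. Only the numeric values 1 (set) and 2 (release) are taken from the uapi header in this series.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted tunnel metadata object. */
struct md_sketch {
	int refcnt;
	unsigned int src_ip;
	unsigned int dst_ip;
	unsigned int tun_id;
};

/* Hypothetical stand-in for a packet carrying optional tunnel metadata. */
struct pkt_sketch {
	struct md_sketch *md;
};

enum {
	ACT_SET_SKETCH = 1,	/* same value as TCA_TUNNEL_KEY_ACT_SET */
	ACT_RELEASE_SKETCH = 2,	/* same value as TCA_TUNNEL_KEY_ACT_RELEASE */
};

/* Drop whatever metadata the packet currently references. */
static void pkt_md_drop(struct pkt_sketch *pkt)
{
	if (pkt->md) {
		pkt->md->refcnt--;
		pkt->md = NULL;
	}
}

/* Release: detach the metadata (decap flow).
 * Set: replace it with the action's preconfigured metadata (encap flow).
 * Returns 0 on success, -1 on an unknown mode or a missing metadata object.
 */
static int tunnel_key_act_sketch(struct pkt_sketch *pkt, int tunnel_action,
				 struct md_sketch *enc_md)
{
	switch (tunnel_action) {
	case ACT_RELEASE_SKETCH:
		pkt_md_drop(pkt);
		return 0;
	case ACT_SET_SKETCH:
		if (!enc_md)
			return -1;
		pkt_md_drop(pkt);
		enc_md->refcnt++;	/* the packet takes its own reference */
		pkt->md = enc_md;
		return 0;
	default:
		return -1;
	}
}
```

In the encap example above, the "set" step corresponds to attaching src_ip/dst_ip/id to the packet before mirred redirects it to vxlan0; in the decap example, "release" corresponds to detaching the metadata before the packet is redirected to vnet0.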

Amir & Hadar

Changes from V5:
- Add __rcu notation to struct tcf_tunnel_key_params in struct tcf_tunnel_key
- Fix indentation in include/net/dst_metadata.h
- Fix syntax error in commit message

Changes from V4:
- Fix tunnel_key_init function error flow.
- Add 'action' variable to struct tcf_tunnel_key_params and use it instead of
  tcf_action variable which is not protected by rcu lock.

Changes from V3:
- Use percpu stats
- No spinlock on datapath - protecting parameters with rcu
- Fix buggy handling of set/release dst
- Use nla_get_in_addr and nla_put_in_addr
- Fix change logs
- Pass in6_addr by pointer
- Rename utility functions to start with double underscore

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use
  pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c   |   4 +-
 include/net/dst_metadata.h|  52 +++--
 include/net/ip_tunnels.h  |  19 ++
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/net/vxlan.h   |  18 --
 include/uapi/linux/pkt_cls.h  |  11 +
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/ipv4/ip_gre.c |  23 +-
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 348 ++
 net/sched/cls_flower.c| 100 -
 12 files changed, 605 insertions(+), 55 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1



[PATCH net-next V6 3/4] net/sched: cls_flower: Classify packet in ip tunnels

2016-09-07 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress qdisc of a shared
vxlan device named 'vxlan0'. The filter matches packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11, and forwards them to the
tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0

The tunnel_key action will be introduced in the next patch in this
series.
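
The matching on outer-header fields described above can be illustrated with a simplified userspace sketch. It is not how cls_flower is implemented (flower looks up the masked key in a hash table); enc_key_sketch and enc_key_match() are hypothetical names that only show the mask-then-compare idea applied to the enc_src_ip, enc_dst_ip and enc_key_id fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the encapsulation part of the flow key. */
struct enc_key_sketch {
	uint32_t enc_src;	/* outer source IPv4 address */
	uint32_t enc_dst;	/* outer destination IPv4 address */
	uint32_t enc_key_id;	/* 32-bit tunnel key */
};

/* A packet matches when every field agrees with the filter on the
 * bits selected by the mask; bits outside the mask are ignored.
 */
static bool enc_key_match(const struct enc_key_sketch *pkt,
			  const struct enc_key_sketch *filter,
			  const struct enc_key_sketch *mask)
{
	return ((pkt->enc_src ^ filter->enc_src) & mask->enc_src) == 0 &&
	       ((pkt->enc_dst ^ filter->enc_dst) & mask->enc_dst) == 0 &&
	       ((pkt->enc_key_id ^ filter->enc_key_id) & mask->enc_key_id) == 0;
}
```

A filter that specifies only enc_key_id would, in this model, use an all-zero mask for the two address fields, so any outer addresses match.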

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  11 +
 net/sched/cls_flower.c   | 100 ++-
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cf9ad5b..b084b2a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+   struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_keyid enc_key_id;
+   union {
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_ipv6_addrs enc_ipv6;
+   };
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(>ht.nelems))
return -1;
 
fl_clear_masked_range(_key, >mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   switch (ip_tunnel_info_af(info)) {
+   case AF_INET:
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   break;
+   case AF_INET6:
+   skb_key.enc_ipv6.src = key->u.ipv6.src;
+   skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+   break;
+   }
+
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },

Re: [PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-06 Thread Hadar Hen Zion
On Tue, Sep 6, 2016 at 5:11 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Sun, 2016-09-04 at 13:55 +0300, Hadar Hen Zion wrote:
>> From: Amir Vadai <a...@vadai.me>
>
> ...
>
>> +struct tcf_tunnel_key_params {
>> + struct rcu_head rcu;
>> + int tcft_action;
>> + int action;
>> + struct metadata_dst *tcft_enc_metadata;
>> +};
>> +
>> +struct tcf_tunnel_key {
>> + struct tc_action  common;
>> + struct tcf_tunnel_key_params *params;
>
> In order to please sparse you must add __rcu qualifier, as in :
>
> struct tcf_tunnel_key_params __rcu *params;

Thanks!
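
For readers unfamiliar with the pattern behind the __rcu annotation: the pointer is published by a writer and dereferenced by readers, which then use one consistent snapshot of the parameters. The following userspace sketch imitates that with C11 atomics. It is an analogy under assumptions, not kernel RCU: params_sketch, act_sketch and the act_*() helpers are hypothetical names, and grace periods (deferring the free until no reader can still hold the old pointer) are not modelled.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Both values live in one object so a reader sees a consistent pair. */
struct params_sketch {
	int tunnel_action;
	int action;
};

struct act_sketch {
	_Atomic(struct params_sketch *) params;
};

static void act_init(struct act_sketch *a)
{
	atomic_init(&a->params, NULL);
}

/* Writer: build a complete new object, then publish it in one step.
 * The previous object is handed back through *old; real RCU code would
 * free it only after a grace period.
 */
static int act_update(struct act_sketch *a, int tunnel_action, int action,
		      struct params_sketch **old)
{
	struct params_sketch *new_p = malloc(sizeof(*new_p));

	if (!new_p)
		return -1;

	new_p->tunnel_action = tunnel_action;
	new_p->action = action;
	*old = atomic_exchange_explicit(&a->params, new_p,
					memory_order_acq_rel);
	return 0;
}

/* Reader: load the pointer once and read every field through it. */
static int act_read(struct act_sketch *a, struct params_sketch *out)
{
	struct params_sketch *p =
		atomic_load_explicit(&a->params, memory_order_acquire);

	if (!p)
		return -1;

	*out = *p;
	return 0;
}
```

In the kernel, the load corresponds to rcu_dereference() inside an rcu_read_lock() section, and sparse uses the __rcu qualifier to check that the pointer is only accessed through such helpers.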


>
>> +};
>> +
>
> Thanks.
>
>
>


Re: [PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-06 Thread Hadar Hen Zion
On Tue, Sep 6, 2016 at 1:49 PM, Jamal Hadi Salim <j...@mojatatu.com> wrote:
> On 16-09-04 06:55 AM, Hadar Hen Zion wrote:
>>
>> From: Amir Vadai <a...@vadai.me>
>>
>> This action could be used before redirecting packets to a shared tunnel
>> device, or when redirecting packets arriving from a such a device.
>>
>> The action will release the metadata created by the tunnel device
>> (decap), or set the metadata with the specified values for encap
>> operation.
>>
>> For example, the following flower filter will forward all ICMP packets
>> destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
>> redirecting, a metadata for the vxlan tunnel is created using the
>> tunnel_key action and it's arguments:
>>
>> $ filter add dev net0 protocol ip parent : \
>> flower \
>>   ip_proto 1 \
>>   dst_ip 11.11.11.2 \
>> action tunnel_key set \
>>   src_ip 11.11.0.1 \
>>   dst_ip 11.11.0.2 \
>>   id 11 \
>> action mirred egress redirect dev vxlan0
>>
>
>
> Syntax error above. Regardless:

ack, will be fixed.

> Please verify by running a test and send a packet or two
> and verify that stats are incremented (I know it may sound silly to
> ask but it is important).

Already tested that tc filter stats are working and incremented as expected :-)
>
>
>> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
>> + struct tcf_result *res)
>> +{
>> +   struct tcf_tunnel_key *t = to_tunnel_key(a);
>> +   struct tcf_tunnel_key_params *params;
>> +   int action;
>> +
>> +   rcu_read_lock();
>> +
>> +   params = rcu_dereference(t->params);
>> +
>> +   tcf_lastuse_update(&t->tcf_tm);
>> +   bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
>> +   action = params->action;
>> +
>> +   switch (params->tcft_action) {
>> +   case TCA_TUNNEL_KEY_ACT_RELEASE:
>> +   skb_dst_drop(skb);
>> +   break;
>> +   case TCA_TUNNEL_KEY_ACT_SET:
>> +   skb_dst_drop(skb);
>> +   skb_dst_set(skb,
>> dst_clone(&params->tcft_enc_metadata->dst));
>> +   break;
>> +   default:
>> +   WARN_ONCE(1, "Bad tunnel_key action.\n");
>> +   break;
>
>
>
> slow path (_init()) is already checking for a bad tcft_act so it seems
> unnecessary to have the default.
> If you have to keep default would be useful to print the value as well.

ack.

>
> Other than that looks good.
> Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
>
> cheers,
> jamal


Re: [PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-05 Thread Hadar Hen Zion
On Sun, Sep 4, 2016 at 9:19 PM, Rosen, Rami  wrote:
> Hi, Hadar,
>
>>For example, the following flower filter will forward all ICMP packets
>>destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
>>redirecting, a metadata for the vxlan tunnel is created using the tunnel_key
>>action and it's arguments:
>
> Shouldn't it be "tc filter add dev ..."?

Yes, I'll fix it in the next version.

Thanks,
Hadar

>
>>$ filter add dev net0 protocol ip parent : \
>>flower \
>>  ip_proto 1 \
>>  dst_ip 11.11.11.2 \
>>action tunnel_key set \
>>  src_ip 11.11.0.1 \
>>  dst_ip 11.11.0.2 \
>>  id 11 \
>>action mirred egress redirect dev vxlan0
>
> Regards,
> Rami Rosen
> Intel Corporation
>


Re: [PATCH net-next V5 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-04 Thread Hadar Hen Zion
On Sun, Sep 4, 2016 at 2:14 PM, Sergei Shtylyov
<sergei.shtyl...@cogentembedded.com> wrote:
> Hello.
>
>
> On 9/4/2016 1:55 PM, Hadar Hen Zion wrote:
>
>> From: Amir Vadai <a...@vadai.me>
>>
>> Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
>> ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
>> skb.
>>
>> Signed-off-by: Amir Vadai <a...@vadai.me>
>> Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
>> Acked-by: Jiri Pirko <j...@mellanox.com>
>> Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
>> ---
>>  include/net/dst_metadata.h | 45
>> -
>>  1 file changed, 32 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>> index 5db9f59..49e8847 100644
>> --- a/include/net/dst_metadata.h
>> +++ b/include/net/dst_metadata.h
>> @@ -112,12 +112,10 @@ static inline struct ip_tunnel_info
>> *skb_tunnel_info_unclone(struct sk_buff *skb
>> return >u.tun_info;
>>  }
>>
>> -static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
>> -__be16 flags,
>> -__be64 tunnel_id,
>> -int md_size)
>> +static inline struct metadata_dst *
>> +__ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
>> +__be16 flags, __be64 tunnel_id, int md_size)
>
>
>The continuation lines should start under the 1st '__be32' on the broken
> up line. See how it was before your patch.

ack

>
>>  {
>> -   const struct iphdr *iph = ip_hdr(skb);
>> struct metadata_dst *tun_dst;
>>
>> tun_dst = tun_rx_dst(md_size);
>> @@ -125,17 +123,27 @@ static inline struct metadata_dst
>> *ip_tun_rx_dst(struct sk_buff *skb,
>
> [...]
>>
>> -static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
>> +static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
>>  __be16 flags,
>>  __be64 tunnel_id,
>>  int md_size)
>>  {
>> -   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
>> +   const struct iphdr *iph = ip_hdr(skb);
>> +
>> +   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos,
>> iph->ttl,
>> +   flags, tunnel_id, md_size);
>> +}
>> +
>> +static inline struct metadata_dst *
>> +__ipv6_tun_set_dst(const struct in6_addr *saddr, const struct in6_addr
>> *daddr,
>> +  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
>> +  __be64 tunnel_id, int md_size)
>
>
>The continuation lines should start under the 1st *const* on the broken
> up line.

ack

>
>> +{
>> struct metadata_dst *tun_dst;
>> struct ip_tunnel_info *info;
>>
>> @@ -150,14 +158,25 @@ static inline struct metadata_dst
>> *ipv6_tun_rx_dst(struct sk_buff *skb,
>
> [...]
>>
>> +static inline struct metadata_dst *
>> +ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
>> +   int md_size)
>> +{
>> +   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
>> +
>> +   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
>> +   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
>> +   ip6_flowlabel(ip6h), flags, tunnel_id,
>> +   md_size);
>
>
>The continuation lines should start exactly under the 1st & on the broken
> up line.
>That's DaveM's preference, I don't remember if checkpatch.pl reports that
> for the networking code...

checkpatch doesn't report it :(
I'll fix it in the next version.

>
> [...]
>
> MBR, Sergei
>


[PATCH net-next V5 3/4] net/sched: cls_flower: Classify packet in ip tunnels

2016-09-04 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress qdisc of a shared
vxlan device named 'vxlan0'. The filter matches packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11, and forwards them to the
tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0

The tunnel_key action will be introduced in the next patch in this
series.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  11 +
 net/sched/cls_flower.c   | 100 ++-
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cf9ad5b..b084b2a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+   struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_keyid enc_key_id;
+   union {
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_ipv6_addrs enc_ipv6;
+   };
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(>ht.nelems))
return -1;
 
fl_clear_masked_range(_key, >mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   switch (ip_tunnel_info_af(info)) {
+   case AF_INET:
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   break;
+   case AF_INET6:
+   skb_key.enc_ipv6.src = key->u.ipv6.src;
+   skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+   break;
+   }
+
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },

[PATCH net-next V5 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-09-04 Thread Hadar Hen Zion
Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action, act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent : \
flower \
 ip_proto 1 \
action tunnel_key set \
 src_ip 11.11.0.1 \
 dst_ip 11.11.0.2 \
 id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent : \
flower \
 enc_src_ip 11.11.0.2 \
 enc_dst_ip 11.11.0.1 \
 enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V4:
- Fix tunnel_key_init function error flow.
- Add 'action' variable to struct tcf_tunnel_key_params and use it instead of
  tcf_action variable which is not protected by rcu lock.

Changes from V3:
- Use percpu stats
- No spinlock on datapath - protecting parameters with rcu
- Fix buggy handling of set/release dst
- Use nla_get_in_addr and nla_put_in_addr
- Fix change logs
- Pass in6_addr by pointer
- Rename utility functions to start with double underscore

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use
  pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c   |   4 +-
 include/net/dst_metadata.h|  45 ++--
 include/net/ip_tunnels.h  |  19 ++
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/net/vxlan.h   |  18 --
 include/uapi/linux/pkt_cls.h  |  11 +
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/ipv4/ip_gre.c |  23 +-
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 348 ++
 net/sched/cls_flower.c| 100 -
 12 files changed, 598 insertions(+), 55 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1



[PATCH net-next V5 4/4] net/sched: Introduce act_tunnel_key

2016-09-04 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ filter add dev net0 protocol ip parent : \
flower \
  ip_proto 1 \
  dst_ip 11.11.11.2 \
action tunnel_key set \
  src_ip 11.11.0.1 \
  dst_ip 11.11.0.2 \
  id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/tc_act/tc_tunnel_key.h|  31 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 348 ++
 5 files changed, 433 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h 
b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..7c34652
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include 
+
+struct tcf_tunnel_key_params {
+   struct rcu_head rcu;
+   int tcft_action;
+   int action;
+   struct metadata_dst *tcft_enc_metadata;
+};
+
+struct tcf_tunnel_key {
+   struct tc_action  common;
+   struct tcf_tunnel_key_params *params;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h 
b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include 
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+enum {
+   TCA_TUNNEL_KEY_UNSPEC,
+   TCA_TUNNEL_KEY_TM,
+   TCA_TUNNEL_KEY_PARMS,
+   TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_KEY_ID,  /* be64 */
+   TCA_TUNNEL_KEY_PAD,
+   __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
  To compile this code as a module, choose M here: the
  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+tristate "IP tunnel metadata manipulation"
+depends on NET_CLS_ACT
+---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= act_connmark.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)  += act_meta_mark.o
 obj-$(C

[PATCH net-next V5 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-09-04 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.
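
The shape of this refactoring, shown as a userspace sketch: the core
helper takes plain field values, and the header-based variant becomes a
thin wrapper around it. The struct and function names below are
hypothetical and only mirror the split between __ip_tun_set_dst() and
ip_tun_rx_dst(); they are not the kernel definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields read from an IPv4 header. */
struct hdr_sketch {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t tos;
	uint8_t ttl;
};

/* Hypothetical stand-in for the tunnel key being filled in. */
struct tun_key_sketch {
	uint32_t src;
	uint32_t dst;
	uint8_t tos;
	uint8_t ttl;
	uint64_t tun_id;
};

/* Core helper: needs no packet, only the values themselves. */
static void tun_key_set(struct tun_key_sketch *key, uint32_t saddr,
			uint32_t daddr, uint8_t tos, uint8_t ttl,
			uint64_t tun_id)
{
	memset(key, 0, sizeof(*key));
	key->src = saddr;
	key->dst = daddr;
	key->tos = tos;
	key->ttl = ttl;
	key->tun_id = tun_id;
}

/* Receive-path wrapper: pulls the values out of a header and delegates. */
static void tun_key_from_hdr(struct tun_key_sketch *key,
			     const struct hdr_sketch *hdr, uint64_t tun_id)
{
	tun_key_set(key, hdr->saddr, hdr->daddr, hdr->tos, hdr->ttl, tun_id);
}
```

A caller that has no packet at hand, such as a control-path
configuration routine, can then call the core helper directly.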

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
---
 include/net/dst_metadata.h | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..49e8847 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,10 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return >u.tun_info;

 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *
+__ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
+__be16 flags, __be64 tunnel_id, int md_size)
 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
 
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +123,27 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,
return NULL;
 
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-  iph->saddr, iph->daddr, iph->tos, iph->ttl,
+  saddr, daddr, tos, ttl,
   0, 0, 0, tunnel_id, flags);
return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+   flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *
+__ipv6_tun_set_dst(const struct in6_addr *saddr, const struct in6_addr *daddr,
+  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
+  __be64 tunnel_id, int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
 
@@ -150,14 +158,25 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
 
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.u.ipv6.src = *saddr;
+   info->key.u.ipv6.dst = *daddr;
 
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   info->key.label = ip6_flowlabel(ip6h);
+   info->key.tos = tos;
+   info->key.ttl = ttl;
+   info->key.label = label;
 
return tun_dst;
 }
 
+static inline struct metadata_dst *
+ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
+   int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+   ip6_flowlabel(ip6h), flags, tunnel_id,
+   md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1



[PATCH net-next V5 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-09-04 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Add utility functions to convert a 32-bit key into a 64-bit tunnel ID
and vice versa.
These functions will be used instead of duplicated code in GRE and VXLAN,
and in the tc act_tunnel_key action, which will be introduced in a
following patch in
this patchset.
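
A userspace sketch of the same conversion may help when reading the
diff below. It assumes plain uint32_t/uint64_t typedefs in place of the
kernel's __be32/__be64 annotations and a runtime endianness probe in
place of the __BIG_ENDIAN preprocessor test; be64_to_host() is a
hypothetical helper added only to inspect the result.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/* Userspace stand-ins for values stored in network byte order. */
typedef uint32_t be32;
typedef uint64_t be64;

static int host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 0;
}

/* Place a big-endian 32-bit key in the low 32 bits of a big-endian 64-bit ID. */
static be64 key32_to_tunnel_id(be32 key)
{
	if (host_is_big_endian())
		return (be64)key;
	return (be64)key << 32;
}

/* Return the least-significant 32 bits of a big-endian 64-bit ID. */
static be32 tunnel_id_to_key32(be64 tun_id)
{
	if (host_is_big_endian())
		return (be32)tun_id;
	return (be32)(tun_id >> 32);
}

/* Decode a big-endian 64-bit value byte by byte, whatever the host order. */
static uint64_t be64_to_host(be64 v)
{
	unsigned char bytes[8];
	uint64_t out = 0;
	int i;

	memcpy(bytes, &v, sizeof(bytes));
	for (i = 0; i < 8; i++)
		out = (out << 8) | bytes[i];
	return out;
}
```

On a little-endian host the shift by 32 moves the four key bytes to the
highest memory addresses, which is where the low-order half of a
big-endian 64-bit value lives.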

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
Acked-by: Jiri Benc <jb...@redhat.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f605a36..dc1a412 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1291,7 +1291,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1945,7 +1945,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)key;
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)tun_id;
+#else
+   return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, );
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, stru

Re: [PATCH net-next V4 4/4] net/sched: Introduce act_tunnel_key

2016-09-01 Thread Hadar Hen Zion
On Thu, Sep 1, 2016 at 4:16 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-09-01 at 12:28 +0300, Hadar Hen Zion wrote:
>
>>
>> As you suggested above, I can do it by adding "int action" to struct
>> tcf_tunnel_key_params.
>> But, it means that act_tunnel_key would have a different behavior than
>> all the other actions and even though
>> "struct tc_action" has a designated parameters to store this action we
>> won't use it.
>> So it won't be completely clean...
>>
>> Do you think we have a cleaner way to protect it?
>
> Fact that the act_ modules had a spinlock made them all share the same
> structure.
>
> Now we want RCU protection, here is the thing.
>
> Say you want to access 3 different fields, A, B and C.
>
> If you put A and B in the rcu protected pointer, but leave C in the
> 'control part, protected by spinlock'
>
> Then your fast path wont be able to have a consistent view of 3
> variables A, B C.
>
> It might read an old value of A & B, and the recently updated C,
>
> Or it might read an old C, and the updated values of A & B

Yes, agree.

I'll add 'action' to struct tcf_tunnel_key_params.

Thanks,
Hadar
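
For readers following the thread, here is a userspace sketch of the
point about a consistent view. It is not kernel code: C11 atomics stand
in for rcu_dereference() and rcu_assign_pointer(), all names are
hypothetical, and the old snapshot is simply handed back to the caller
instead of being freed after a grace period.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* All fields the fast path needs live behind one pointer. */
struct params_sketch {
	int tunnel_action;	/* "A": set or release */
	int metadata_id;	/* "B": stand-in for the metadata pointer */
	int action;		/* "C": verdict returned to the caller */
};

struct tunnel_key_sketch {
	_Atomic(struct params_sketch *) params;
};

/*
 * Reader: a single pointer load selects one complete snapshot, so A, B
 * and C always come from the same update. Assumes the snapshot is not
 * freed while it is being copied.
 */
static struct params_sketch read_snapshot(struct tunnel_key_sketch *t)
{
	struct params_sketch *p;

	p = atomic_load_explicit(&t->params, memory_order_acquire);
	return *p;
}

/*
 * Writer: build the new snapshot completely, then publish it with one
 * exchange. Returns 0 and stores the previous snapshot in *old, or
 * returns -1 if allocation fails and nothing was changed.
 */
static int publish(struct tunnel_key_sketch *t, struct params_sketch vals,
		   struct params_sketch **old)
{
	struct params_sketch *new_p = malloc(sizeof(*new_p));

	if (!new_p)
		return -1;
	*new_p = vals;
	*old = atomic_exchange_explicit(&t->params, new_p,
					memory_order_acq_rel);
	return 0;
}
```

If "action" stayed outside the swapped structure, a reader could pair a
new verdict with old tunnel parameters, which is the mix-up described
in the quoted text.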


>
> As Cong very kindly pointed to us/me, if we want to be 'clean', we want
> to make sure we read a consistent 3-tuple.
>
> I will send updates when I have time to act_mirred.c
>
>


Re: [PATCH net-next V4 4/4] net/sched: Introduce act_tunnel_key

2016-09-01 Thread Hadar Hen Zion
On Wed, Aug 31, 2016 at 8:44 PM, Shmulik Ladkani
<shmulik.ladk...@gmail.com> wrote:
> Hi,
>
> On Wed, 31 Aug 2016 15:46:24 +0300 Hadar Hen Zion <had...@mellanox.com> wrote:
>> +static int tunnel_key_init(struct net *net, struct nlattr *nla,
>> +struct nlattr *est, struct tc_action **a,
>> +int ovr, int bind)
>> +{
>> + struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
>> + struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1];
>> + struct metadata_dst *metadata = NULL;
>> + struct tc_tunnel_key *parm;
>> + struct tcf_tunnel_key *t;
>> + struct tcf_tunnel_key_params *params_old;
>> + struct tcf_tunnel_key_params *params_new;
>> + __be64 key_id;
>> + bool exists = false;
>> + int ret = 0;
>> + int err;
>> +
>> + if (!nla)
>> + return -EINVAL;
>> +
>> + err = nla_parse_nested(tb, TCA_TUNNEL_KEY_MAX, nla, tunnel_key_policy);
>> + if (err < 0)
>> + return err;
>> +
>> + if (!tb[TCA_TUNNEL_KEY_PARMS])
>> + return -EINVAL;
>> +
>> + parm = nla_data(tb[TCA_TUNNEL_KEY_PARMS]);
>> + exists = tcf_hash_check(tn, parm->index, a, bind);
>> + if (exists && bind)
>> + return 0;
>> +
>> + switch (parm->t_action) {
>> + case TCA_TUNNEL_KEY_ACT_RELEASE:
>> + break;
>> + case TCA_TUNNEL_KEY_ACT_SET:
>> + if (!tb[TCA_TUNNEL_KEY_ENC_KEY_ID]) {
>> + ret = -EINVAL;
>> + goto err_out;
>> + }
>> +
>> + key_id = 
>> key32_to_tunnel_id(nla_get_be32(tb[TCA_TUNNEL_KEY_ENC_KEY_ID]));
>> +
>> + if (tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC] &&
>> + tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]) {
>> + __be32 saddr;
>> + __be32 daddr;
>> +
>> + saddr = 
>> nla_get_in_addr(tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
>> + daddr = 
>> nla_get_in_addr(tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);
>> +
>> + metadata = __ip_tun_set_dst(saddr, daddr, 0, 0,
>> + TUNNEL_KEY, key_id, 0);
>> + } else if (tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC] &&
>> +tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]) {
>> + struct in6_addr saddr;
>> + struct in6_addr daddr;
>> +
>> + saddr = 
>> nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
>> + daddr = 
>> nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
>> +
>> + metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, 0,
>> +   TUNNEL_KEY, key_id, 0);
>> + }
>> +
>> + if (!metadata) {
>> + ret = -EINVAL;
>> + goto err_out;
>> + }
>> +
>> + metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
>> + break;
>> + default:
>> + goto err_out;
>> + }
>> +
>> + if (!exists) {
>> + ret = tcf_hash_create(tn, parm->index, est, a,
>> +   _tunnel_key_ops, bind, true);
>> + if (ret)
>> + return ret;
>> +
>> + ret = ACT_P_CREATED;
>> + } else {
>> + tcf_hash_release(*a, bind);
>> + if (!ovr)
>> + return -EEXIST;
>> + }
>> +
>> + t = to_tunnel_key(*a);
>> +
>> + ASSERT_RTNL();
>> + params_new = kzalloc(sizeof(*params_new),
>> +  GFP_KERNEL);
>
> nit: Fits oneline. Fix if patch needs other amendments.

Sure, will do.
>
>> + if (unlikely(!params_new)) {
>> + if (ovr)
>> + tcf_hash_release(*a, bind);
>> + return -ENOMEM;
>
> Seems we need to call tcf_hash_release regardless 'ovr':
> In case (!exist), we've created a new hash few lines above.
> Therefore in failure, don't we need a tcf_hash_release()?
> Am I missing something?

You are right, the "if (ovr)" line should be removed.
>
>> + }
>> +
>> + params_old = rtnl_dereference(t->params);
>> +
>> + t->tcf_action = parm->action;
>> + params_n

Re: [PATCH net-next V4 4/4] net/sched: Introduce act_tunnel_key

2016-09-01 Thread Hadar Hen Zion
On Wed, Aug 31, 2016 at 9:39 PM, Eric Dumazet <eduma...@google.com> wrote:
> On Wed, Aug 31, 2016 at 5:46 AM, Hadar Hen Zion <had...@mellanox.com> wrote:
>>
>> From: Amir Vadai <a...@vadai.me>
>>
>> This action could be used before redirecting packets to a shared tunnel
>> device, or when redirecting packets arriving from such a device.
>>
>>
>> +
>> +struct tcf_tunnel_key_params {
>> +   struct rcu_head rcu;
>> +   int tcft_action;
>
> Also add " int action;"
>
> (see why later)
>
>> +   struct metadata_dst *tcft_enc_metadata;
>> +};
>> +
>
>
>
>> +
>> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
>> + struct tcf_result *res)
>> +{
>> +   struct tcf_tunnel_key *t = to_tunnel_key(a);
>> +   struct tcf_tunnel_key_params *params;
>> +   int action;
>> +
>> +   rcu_read_lock();
>> +
>> +   params = rcu_dereference(t->params);
>> +
>> +   tcf_lastuse_update(&t->tcf_tm);
>> +   bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
>> +   action = t->tcf_action;
>
> Ideally, you should read param->action instead of t->tcf_action to be
> completely clean.

As you suggested above, I can do it by adding "int action" to struct
tcf_tunnel_key_params.
But that means act_tunnel_key would behave differently from all the
other actions: even though "struct tc_action" has a designated field
for storing this action, we won't use it.
So it won't be completely clean...

Do you think we have a cleaner way to protect it?

>
>> +
>> +   switch (params->tcft_action) {
>> +   case TCA_TUNNEL_KEY_ACT_RELEASE:
>> +   skb_dst_drop(skb);
>> +   break;
>> +   case TCA_TUNNEL_KEY_ACT_SET:
>> +   skb_dst_drop(skb);
>> +   skb_dst_set(skb, dst_clone(&params->tcft_enc_metadata->dst));
>> +   break;
>> +   default:
>> +   WARN_ONCE(1, "Bad tunnel_key action.\n");
>> +   break;
>> +   }
>> +
>> +   rcu_read_unlock();
>> +
>> +   return action;
>> +}
>>


[PATCH iproute2 V2 0/2] tc: flower, m_vlan: Introduce vlan tag support

2016-09-01 Thread Hadar Hen Zion
Hi,

This patchset introduces vlan tag support to the tc flower classifier
and adds a vlan priority option to the vlan push action.

- The first patch adds classification by vlan id and vlan priority to
  the flower classifier.
- The second patch adds support for vlan priority to the current vlan
  push action.

Changes from v1:
- Remove VLAN_PRIO_MASK and VLAN_VID_MASK defines from tc_vlan.h file


Hadar Hen Zion (2):
  tc: flower: Introduce vlan support
  tc: m_vlan: Add priority option to push vlan action

 man/man8/tc-flower.8 | 25 -
 man/man8/tc-vlan.8   |  5 
 tc/f_flower.c| 78 ++--
 tc/m_vlan.c  | 22 ++-
 4 files changed, 125 insertions(+), 5 deletions(-)

-- 
1.8.3.1



[PATCH iproute2 V2 2/2] tc: m_vlan: Add priority option to push vlan action

2016-09-01 Thread Hadar Hen Zion
The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and priority:

tc filter add dev veth0 protocol ip parent : \
flower \
indev veth0 \
action vlan push id 100 priority 5
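
The range check applied to the new option can be sketched on its own.
The function below is a hypothetical standalone equivalent of the
"get_u8() then mask with ~0x7" test in the diff; it uses strtoul()
instead of iproute2's helpers so that it compiles outside the tc tree.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal VLAN priority. Returns 0 and stores the value in
 * *prio when the argument is a number in the range 0-7, otherwise
 * returns -1 and leaves *prio untouched.
 */
static int parse_vlan_prio(const char *arg, uint8_t *prio)
{
	char *end;
	unsigned long val;

	if (!arg || *arg == '\0')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno || *end != '\0' || val > 0xff)
		return -1;

	/* The priority field is 3 bits wide, so any higher bit is an error. */
	if (val & ~0x7UL)
		return -1;

	*prio = (uint8_t)val;
	return 0;
}
```

With this check "priority 5" from the example above is accepted, while
"priority 8" is rejected.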

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 man/man8/tc-vlan.8 |  5 +
 tc/m_vlan.c| 22 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/man/man8/tc-vlan.8 b/man/man8/tc-vlan.8
index 4bfd72b..4d0c5c8 100644
--- a/man/man8/tc-vlan.8
+++ b/man/man8/tc-vlan.8
@@ -12,6 +12,8 @@ vlan - vlan manipulation module
 .IR PUSH " := "
 .BR push " [ " protocol
 .IR VLANPROTO " ]"
+.BR " [ " priority
+.IR VLANPRIO " ] "
 .BI id " VLANID"
 
 .ti -8
@@ -55,6 +57,9 @@ for hexadecimal interpretation, etc.).
 Choose the VLAN protocol to use. At the time of writing, the kernel accepts 
only
 .BR 802.1Q " or " 802.1ad .
 .TP
+.BI priority " VLANPRIO"
+Choose the VLAN priority to use. Decimal number in range of 0-7.
+.TP
 .I CONTROL
 How to continue after executing this action.
 .RS
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index ac63d9e..05a63b4 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -22,7 +22,7 @@
 static void explain(void)
 {
fprintf(stderr, "Usage: vlan pop\n");
-   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID 
[CONTROL]\n");
+   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID [ 
priority VLANPRIO ] [CONTROL]\n");
fprintf(stderr, "   VLANPROTO is one of 802.1Q or 802.1AD\n");
fprintf(stderr, "with default: 802.1Q\n");
fprintf(stderr, "   CONTROL := reclassify | pipe | drop | continue 
| pass\n");
@@ -45,6 +45,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
int id_set = 0;
__u16 proto;
int proto_set = 0;
+   __u8 prio;
+   int prio_set = 0;
struct tc_vlan parm = { 0 };
 
if (matches(*argv, "vlan") != 0)
@@ -91,6 +93,17 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
if (ll_proto_a2n(&proto, *argv))
invarg("protocol is invalid", *argv);
proto_set = 1;
+   } else if (matches(*argv, "priority") == 0) {
+   if (action != TCA_VLAN_ACT_PUSH) {
+   fprintf(stderr, "\"%s\" is only valid for 
push\n",
+   *argv);
+   explain();
+   return -1;
+   }
+   NEXT_ARG();
+   if (get_u8(&prio, *argv, 0) || (prio & ~0x7))
+   invarg("prio is invalid", *argv);
+   prio_set = 1;
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -138,6 +151,9 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
 
addattr_l(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PROTOCOL, &proto, 2);
}
+   if (prio_set)
+   addattr8(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PRIORITY, prio);
+
tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
 
*argc_p = argc;
@@ -180,6 +196,10 @@ static int print_vlan(struct action_util *au, FILE *f, 
struct rtattr *arg)

ll_proto_n2a(rta_getattr_u16(tb[TCA_VLAN_PUSH_VLAN_PROTOCOL]),
 b1, sizeof(b1)));
}
+   if (tb[TCA_VLAN_PUSH_VLAN_PRIORITY]) {
+   val = rta_getattr_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
+   fprintf(f, " priority %u", val);
+   }
break;
}
fprintf(f, " %s", action_n2a(parm->action));
-- 
1.8.3.1



[PATCH iproute2 V2 1/2] tc: flower: Introduce vlan support

2016-09-01 Thread Hadar Hen Zion
Classification according to vlan id and vlan priority.

Example script that adds vlan filter:

 # add ingress qdisc
 tc qdisc add dev ens4f0 ingress

 # add a flower filter with vlan id and priority classification
 tc filter add dev ens4f0 protocol 802.1Q parent : \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop
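
What matching on vlan id and priority means at the bit level, as a
small sketch. It assumes the standard 802.1Q tag control field layout
(priority in the top 3 bits, vlan id in the low 12 bits) held in host
byte order; the SKETCH_ constants and function names are hypothetical
and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLAN_VID_MASK	0x0fffu	/* low 12 bits: vlan id */
#define SKETCH_VLAN_PRIO_SHIFT	13	/* top 3 bits: priority */

/* Extract the 12-bit vlan id from a host-order tag control field. */
static uint16_t tci_vid(uint16_t tci)
{
	return tci & SKETCH_VLAN_VID_MASK;
}

/* Extract the 3-bit priority from a host-order tag control field. */
static uint8_t tci_prio(uint16_t tci)
{
	return (uint8_t)(tci >> SKETCH_VLAN_PRIO_SHIFT);
}

/* True when the tag carries exactly the requested id and priority. */
static bool vlan_tag_matches(uint16_t tci, uint16_t vid, uint8_t prio)
{
	return tci_vid(tci) == vid && tci_prio(tci) == prio;
}
```

The filter in the example script corresponds to
vlan_tag_matches(tci, 100, 3).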

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 man/man8/tc-flower.8 | 25 -
 tc/f_flower.c| 78 ++--
 2 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 9ae10e6..74f7664 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -23,7 +23,13 @@ flower \- flow based traffic control filter
 .R " | { "
 .BR dst_mac " | " src_mac " } "
 .IR mac_address " | "
-.BR eth_type " { " ipv4 " | " ipv6 " | "
+.BR eth_type " { " ipv4 " | " ipv6 " | " 802.1Q " | "
+.IR ETH_TYPE " } | "
+.B vlan_id
+.IR VID " | "
+.B vlan_prio
+.IR PRIORITY " | "
+.BR vlan_ethtype " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | "
 .IR IP_PROTO " } | { "
@@ -70,6 +76,23 @@ Do not process filter by hardware.
 Match on source or destination MAC address.
 .TP
 .BI eth_type " ETH_TYPE"
+Match on the next protocol.
+.I ETH_TYPE
+may be either
+.BR ipv4 , ipv6 , 802.1Q ,
+or an unsigned 16bit value in hexadecimal format.
+.TP
+.BI vlan_id " VID"
+Match on vlan tag id.
+.I VID
+is an unsigned 12bit value in decimal format.
+.TP
+.BI vlan_prio " PRIORITY"
+Match on vlan tag priority.
+.I PRIORITY
+is an unsigned 3bit value in decimal format.
+.TP
+.BI vlan_eth_type " VLAN_ETH_TYPE"
 Match on layer three protocol.
 .I ETH_TYPE
 may be either
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 791ade7..2d31d1a 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "utils.h"
 #include "tc_util.h"
@@ -30,6 +31,9 @@ static void explain(void)
fprintf(stderr, "\n");
fprintf(stderr, "Where: MATCH-LIST := [ MATCH-LIST ] MATCH\n");
fprintf(stderr, "   MATCH  := { indev DEV-NAME |\n");
+   fprintf(stderr, "   vlan_id VID |\n");
+   fprintf(stderr, "   vlan_prio PRIORITY |\n");
+   fprintf(stderr, "   vlan_ethtype [ ipv4 | ipv6 | 
ETH-TYPE ] |\n");
fprintf(stderr, "   dst_mac MAC-ADDR |\n");
fprintf(stderr, "   src_mac MAC-ADDR |\n");
fprintf(stderr, "   [ipv4 | ipv6 ] |\n");
@@ -61,6 +65,23 @@ static int flower_parse_eth_addr(char *str, int addr_type, 
int mask_type,
return 0;
 }
 
+static int flower_parse_vlan_eth_type(char *str, __be16 eth_type, int type,
+ __be16 *p_vlan_eth_type, struct nlmsghdr 
*n)
+{
+   __be16 vlan_eth_type;
+
+   if (eth_type != htons(ETH_P_8021Q)) {
+   fprintf(stderr, "Can't set \"vlan_ethtype\" if ethertype isn't 
802.1Q\n");
+   return -1;
+   }
+
+   if (ll_proto_a2n(&vlan_eth_type, str))
+   invarg("invalid vlan_ethtype", str);
+   addattr16(n, MAX_MSG, type, vlan_eth_type);
+   *p_vlan_eth_type = vlan_eth_type;
+   return 0;
+}
+
 static int flower_parse_ip_proto(char *str, __be16 eth_type, int type,
 __u8 *p_ip_proto, struct nlmsghdr *n)
 {
@@ -167,6 +188,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
struct tcmsg *t = NLMSG_DATA(n);
struct rtattr *tail;
__be16 eth_type = TC_H_MIN(t->tcm_info);
+   __be16 vlan_ethtype = 0;
__u8 ip_proto = 0xff;
__u32 flags = 0;
 
@@ -208,6 +230,41 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
NEXT_ARG();
strncpy(ifname, *argv, sizeof(ifname) - 1);
addattrstrz(n, MAX_MSG, TCA_FLOWER_INDEV, ifname);
+   } else if (matches(*argv, "vlan_id") == 0) {
+   __u16 vid;
+
+   NEXT_ARG();
+   if (eth_type != htons(ETH_P_8021Q)) {
+   fprintf(stderr, "Can't set \"vlan_id\" if 
ethertype isn't 802.1Q\n");
+   return -1;
+   

[PATCH net-next V4 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-08-31 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Extract __ip_tun_set_dst() and __ipv6_tun_set_dst() out of
ip_tun_rx_dst() and ipv6_tun_rx_dst(), to be used without supplying an
skb.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/dst_metadata.h | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..49e8847 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,10 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return >u.tun_info;
 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *
+__ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
+__be16 flags, __be64 tunnel_id, int md_size)
 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
 
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +123,27 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,
return NULL;
 
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-  iph->saddr, iph->daddr, iph->tos, iph->ttl,
+  saddr, daddr, tos, ttl,
   0, 0, 0, tunnel_id, flags);
return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return __ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+   flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *
+__ipv6_tun_set_dst(const struct in6_addr *saddr, const struct in6_addr *daddr,
+  __u8 tos, __u8 ttl, __be32 label, __be16 flags,
+  __be64 tunnel_id, int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
 
@@ -150,14 +158,25 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
 
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.u.ipv6.src = *saddr;
+   info->key.u.ipv6.dst = *daddr;
 
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   info->key.label = ip6_flowlabel(ip6h);
+   info->key.tos = tos;
+   info->key.ttl = ttl;
+   info->key.label = label;
 
return tun_dst;
 }
 
+static inline struct metadata_dst *
+ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
+   int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return __ipv6_tun_set_dst(&ip6h->saddr, &ip6h->daddr,
+   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+   ip6_flowlabel(ip6h), flags, tunnel_id,
+   md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1



[PATCH net-next V4 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-08-31 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Add utility functions to convert a 32-bit key into a 64-bit tunnel ID
and vice versa.
These functions will be used instead of duplicated code in GRE and VXLAN,
and in the tc act_tunnel_key action, which will be introduced in a
following patch in
this patchset.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
Acked-by: Jiri Benc <jb...@redhat.com>
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3f7e0d2..2444daa 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1293,7 +1293,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1947,7 +1947,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)key;
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)tun_id;
+#else
+   return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, );
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct 
net_device *dev,
 
flags 

[PATCH net-next V4 4/4] net/sched: Introduce act_tunnel_key

2016-08-31 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.
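
The set/release behaviour described above can be sketched in userspace
roughly as follows. This is only an illustration under stated
assumptions, not the code in net/sched/act_tunnel_key.c: every demo_*
name, the mock packet and the plain reference counter are hypothetical.
Only the action values 1 (set) and 2 (release) mirror
TCA_TUNNEL_KEY_ACT_SET and TCA_TUNNEL_KEY_ACT_RELEASE from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; values mirror TCA_TUNNEL_KEY_ACT_SET/RELEASE. */
enum demo_tunnel_key_action {
	DEMO_ACT_SET = 1,
	DEMO_ACT_RELEASE = 2,
};

/* Mock of the tunnel metadata attached to a packet. */
struct demo_metadata {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint64_t tun_id;
	unsigned int refcnt;
};

struct demo_packet {
	struct demo_metadata *md;	/* NULL when no metadata is attached */
};

struct demo_tunnel_key {
	enum demo_tunnel_key_action action;
	struct demo_metadata *enc_md;	/* used by DEMO_ACT_SET only */
};

/* Detach whatever metadata the packet carries, dropping one reference. */
static void demo_packet_drop_md(struct demo_packet *pkt)
{
	if (pkt->md) {
		pkt->md->refcnt--;
		pkt->md = NULL;
	}
}

/* Returns 0 on success, -1 for an unknown action. */
static int demo_tunnel_key_act(struct demo_packet *pkt,
			       const struct demo_tunnel_key *t)
{
	switch (t->action) {
	case DEMO_ACT_RELEASE:		/* decap: forget the rx metadata */
		demo_packet_drop_md(pkt);
		return 0;
	case DEMO_ACT_SET:		/* encap: attach the configured metadata */
		demo_packet_drop_md(pkt);
		t->enc_md->refcnt++;
		pkt->md = t->enc_md;
		return 0;
	}
	return -1;
}

/* Usage: set attaches the shared metadata, release detaches it again. */
void demo_tunnel_key_selfcheck(void)
{
	struct demo_metadata md = { .tun_id = 11, .refcnt = 1 };
	struct demo_tunnel_key set = { DEMO_ACT_SET, &md };
	struct demo_tunnel_key rel = { DEMO_ACT_RELEASE, NULL };
	struct demo_packet pkt = { NULL };

	assert(demo_tunnel_key_act(&pkt, &set) == 0 && pkt.md == &md);
	assert(demo_tunnel_key_act(&pkt, &rel) == 0 && pkt.md == NULL);
	assert(md.refcnt == 1);
}
```

The sketch keeps one metadata object per action instance and shares it
by reference between packets, which is one plausible reading of "set
the metadata with the specified values"; the real action may differ.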

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent : \
flower \
  ip_proto 1 \
  dst_ip 11.11.11.2 \
action tunnel_key set \
  src_ip 11.11.0.1 \
  dst_ip 11.11.0.2 \
  id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/tc_act/tc_tunnel_key.h|  30 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 349 ++
 5 files changed, 433 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h 
b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..8610504
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include 
+
+struct tcf_tunnel_key_params {
+   struct rcu_head rcu;
+   int tcft_action;
+   struct metadata_dst *tcft_enc_metadata;
+};
+
+struct tcf_tunnel_key {
+   struct tc_action  common;
+   struct tcf_tunnel_key_params *params;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h 
b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include 
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+enum {
+   TCA_TUNNEL_KEY_UNSPEC,
+   TCA_TUNNEL_KEY_TM,
+   TCA_TUNNEL_KEY_PARMS,
+   TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_KEY_ID,  /* be64 */
+   TCA_TUNNEL_KEY_PAD,
+   __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
  To compile this code as a module, choose M here: the
  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+tristate "IP tunnel metadata manipulation"
+depends on NET_CLS_ACT
+---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= act_connmark.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)  += act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)  += act_meta_skbprio.o
+obj-$(

[PATCH net-next V4 3/4] net/sched: cls_flower: Classify packet in ip tunnels

2016-08-31 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress qdisc of a shared
vxlan device named 'vxlan0' to forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets are
forwarded to the tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0

The tunnel_key action will be introduced in the next patch in this
series.
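
As a rough userspace illustration of the extraction step described
above (copying the outer addresses and the tunnel id from the metadata
into the classification key), consider the sketch below. The demo_*
structs and function are hypothetical simplifications and are not the
kernel's struct ip_tunnel_info or struct fl_flow_key; the union simply
mirrors the enc_ipv4/enc_ipv6 union added to the flow key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */
#include <netinet/in.h>		/* struct in6_addr */

/* Hypothetical, simplified view of the metadata set by a tunnel device. */
struct demo_tunnel_info {
	int family;			/* AF_INET or AF_INET6 */
	union {
		struct { uint32_t src, dst; } ipv4;
		struct { struct in6_addr src, dst; } ipv6;
	} u;
	uint32_t key_id;		/* 32-bit tunnel id (e.g. a VNI) */
};

/* Hypothetical encapsulation part of a classification key. */
struct demo_enc_key {
	uint32_t key_id;
	union {
		struct { uint32_t src, dst; } enc_ipv4;
		struct { struct in6_addr src, dst; } enc_ipv6;
	};
};

/* Fill the key from the metadata.
 * Returns 1 if metadata was present, 0 otherwise (key left zeroed).
 */
static int demo_fill_enc_key(struct demo_enc_key *key,
			     const struct demo_tunnel_info *info)
{
	memset(key, 0, sizeof(*key));
	if (!info)
		return 0;

	switch (info->family) {
	case AF_INET:
		key->enc_ipv4.src = info->u.ipv4.src;
		key->enc_ipv4.dst = info->u.ipv4.dst;
		break;
	case AF_INET6:
		key->enc_ipv6.src = info->u.ipv6.src;
		key->enc_ipv6.dst = info->u.ipv6.dst;
		break;
	}
	key->key_id = info->key_id;
	return 1;
}

/* Usage: a packet without metadata yields an all-zero key. */
void demo_enc_key_selfcheck(void)
{
	struct demo_enc_key key;

	assert(demo_fill_enc_key(&key, NULL) == 0);
	assert(key.key_id == 0);
}
```

Zeroing the key first means a packet that did not arrive through a
tunnel compares as "no encapsulation fields set" in this sketch.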

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  11 +
 net/sched/cls_flower.c   | 101 ++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cf9ad5b..02b2a5b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+   struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_keyid enc_key_id;
+   union {
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_ipv6_addrs enc_ipv6;
+   };
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(>ht.nelems))
return -1;
 
fl_clear_masked_range(_key, >mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   switch (ip_tunnel_info_af(info)) {
+   case AF_INET:
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   break;
+   case AF_INET6:
+   skb_key.enc_ipv6.src = key->u.ipv6.src;
+   skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+   break;
+   }
+
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .l

[PATCH net-next V4 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-08-31 Thread Hadar Hen Zion
Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent : \
flower \
 ip_proto 1 \
action tunnel_key set \
 src_ip 11.11.0.1 \
 dst_ip 11.11.0.2 \
 id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent : \
flower \
 enc_src_ip 11.11.0.2 \
 enc_dst_ip 11.11.0.1 \
 enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V3:
- Use percpu stats
- No spinlock on datapath - protecting parameters with rcu
- Fix buggy handling of set/release dst
- Use nla_get_in_addr and nla_put_in_addr
- Fix change logs
- Pass in6_addr by pointer
- Rename utility functions to start with double underscore

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use 
pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c   |   4 +-
 include/net/dst_metadata.h|  45 ++--
 include/net/ip_tunnels.h  |  19 ++
 include/net/tc_act/tc_tunnel_key.h|  30 +++
 include/net/vxlan.h   |  18 --
 include/uapi/linux/pkt_cls.h  |  11 +
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/ipv4/ip_gre.c |  23 +-
 net/sched/Kconfig |  11 +
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 349 ++
 net/sched/cls_flower.c| 101 -
 12 files changed, 598 insertions(+), 56 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1



[PATCH net-next V3 1/4] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-08-25 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Add utility functions to convert a 32-bit key into a 64-bit tunnel id
and vice versa.
These functions will be used instead of cloning code in GRE and VXLAN,
and in tc act_tunnel_key which will be introduced in a following patch in
this patchset.
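
A userspace sketch of the same conversion may help when reasoning about
the byte layout. It assumes the helpers behave as in the hunk below
(the key ends up in the low-order 32 bits of the big-endian id, i.e. the
last four bytes in memory). The demo_* names and the run-time endianness
probe are hypothetical stand-ins for the kernel's __be32/__be64 types
and the __BIG_ENDIAN preprocessor check.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htonl() */

/* All demo_* names are hypothetical userspace stand-ins. */

static int demo_host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 0;
}

/* Place a big-endian 32-bit key in the low-order 32 bits of a
 * big-endian 64-bit tunnel id (the last four bytes in memory).
 */
static uint64_t demo_key32_to_tunnel_id(uint32_t key_be)
{
	if (demo_host_is_big_endian())
		return (uint64_t)key_be;
	return (uint64_t)key_be << 32;
}

/* Inverse: take the low-order 32 bits of a big-endian 64-bit id. */
static uint32_t demo_tunnel_id_to_key32(uint64_t tun_id_be)
{
	if (demo_host_is_big_endian())
		return (uint32_t)tun_id_be;
	return (uint32_t)(tun_id_be >> 32);
}

/* Decode a big-endian 64-bit value by reading its bytes in memory order. */
static uint64_t demo_be64_to_host(uint64_t be)
{
	const uint8_t *p = (const uint8_t *)&be;
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Usage: key 11 becomes tunnel id 11, and the round trip is lossless. */
void demo_key32_selfcheck(void)
{
	uint32_t key = htonl(11);

	assert(demo_be64_to_host(demo_key32_to_tunnel_id(key)) == 11);
	assert(demo_tunnel_id_to_key32(demo_key32_to_tunnel_id(key)) == key);
}
```

On a little-endian host the shift by 32 moves the four key bytes to the
end of the 8-byte value in memory, which is why no byte swap is needed
in either direction.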

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladk...@gmail.com>
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c0dda6f..b1ddf8f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1294,7 +1294,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1948,7 +1948,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035..e598c63 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)key;
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 tun_id)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)tun_id;
+#else
+   return (__force __be32)((__force u64)tun_id >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d036..0255613 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43..576f705 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, );
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct 
net_device *dev,
 
flags = tun_info->key.tun_flags & (TUNNEL_CSUM | TUNNEL_K

[PATCH net-next V3 4/4] net/sched: Introduce act_tunnel_key

2016-08-25 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

This action can be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from such a device.

The action will release the metadata created by the tunnel device
(decap), or set the metadata with the specified values for encap
operation.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, metadata for the vxlan tunnel is created using the
tunnel_key action and its arguments:

$ tc filter add dev net0 protocol ip parent : \
flower \
  ip_proto 1 \
  dst_ip 11.11.11.2 \
action tunnel_key set \
  src_ip 11.11.0.1 \
  dst_ip 11.11.0.2 \
  id 11 \
action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/tc_act/tc_tunnel_key.h|  25 +++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/sched/Kconfig |  11 ++
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 312 ++
 5 files changed, 391 insertions(+)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

diff --git a/include/net/tc_act/tc_tunnel_key.h 
b/include/net/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..18d5950
--- /dev/null
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __NET_TC_TUNNEL_KEY_H
+#define __NET_TC_TUNNEL_KEY_H
+
+#include 
+
+struct tcf_tunnel_key {
+   struct tc_actioncommon;
+   int tcft_action;
+   struct metadata_dst *tcft_enc_metadata;
+};
+
+#define to_tunnel_key(a) ((struct tcf_tunnel_key *)a)
+
+#endif /* __NET_TC_TUNNEL_KEY_H */
+
diff --git a/include/uapi/linux/tc_act/tc_tunnel_key.h 
b/include/uapi/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000..f9ddf53
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <a...@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include 
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+   tc_gen;
+   int t_action;
+};
+
+enum {
+   TCA_TUNNEL_KEY_UNSPEC,
+   TCA_TUNNEL_KEY_TM,
+   TCA_TUNNEL_KEY_PARMS,
+   TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_TUNNEL_KEY_ENC_KEY_ID,  /* be64 */
+   TCA_TUNNEL_KEY_PAD,
+   __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index ccf931b..72e3426 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -761,6 +761,17 @@ config NET_ACT_IFE
  To compile this code as a module, choose M here: the
  module will be called act_ife.
 
+config NET_ACT_TUNNEL_KEY
+tristate "IP tunnel metadata manipulation"
+depends on NET_CLS_ACT
+---help---
+ Say Y here to set/release ip tunnel metadata.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_tunnel_key.
+
 config NET_IFE_SKBMARK
 tristate "Support to encoding decoding skb mark on IFE action"
 depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index ae088a5..b9d046b 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= act_connmark.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)  += act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)  += act_meta_skbprio.o
+obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
 obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)  += sch_cbq.o

[PATCH net-next V3 2/4] net/dst: Utility functions to build dst_metadata without supplying an skb

2016-08-25 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Extract _ip_tun_set_dst() and _ipv6_tun_set_dst() out of ip_tun_rx_dst()
and ipv6_tun_rx_dst(), to be used without supplying an skb.
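
The refactoring pattern is roughly the one sketched below: the builder
takes plain field values, and the receive-path helper becomes a thin
wrapper that reads those values from the packet header. This is a
simplified userspace illustration; the demo_* types and functions are
hypothetical and only loosely mirror the IPv4 helpers in the hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for an IPv4 header and tunnel dst. */
struct demo_iphdr {
	uint32_t saddr, daddr;
	uint8_t tos, ttl;
};

struct demo_tun_dst {
	uint32_t src, dst;
	uint8_t tos, ttl;
	uint16_t flags;
	uint64_t tun_id;
};

/* Builder that needs no packet: callers pass the field values directly.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct demo_tun_dst *
demo_tun_set_dst(uint32_t saddr, uint32_t daddr, uint8_t tos, uint8_t ttl,
		 uint16_t flags, uint64_t tun_id)
{
	struct demo_tun_dst *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->src = saddr;
	d->dst = daddr;
	d->tos = tos;
	d->ttl = ttl;
	d->flags = flags;
	d->tun_id = tun_id;
	return d;
}

/* Receive-path wrapper: same result, values taken from the header. */
static struct demo_tun_dst *
demo_tun_rx_dst(const struct demo_iphdr *iph, uint16_t flags, uint64_t tun_id)
{
	return demo_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
				flags, tun_id);
}

/* Usage: both entry points produce the same metadata for the same values. */
void demo_tun_dst_selfcheck(void)
{
	struct demo_iphdr iph = { 1, 2, 0, 64 };
	struct demo_tun_dst *a = demo_tun_rx_dst(&iph, 0, 11);
	struct demo_tun_dst *b = demo_tun_set_dst(1, 2, 0, 64, 0, 11);

	assert(a && b);
	assert(a->src == b->src && a->dst == b->dst && a->tun_id == b->tun_id);
	free(a);
	free(b);
}
```

A caller with no packet at hand, such as a tc action configured from
user space, can then use the field-based builder directly.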

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/dst_metadata.h | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 5db9f59..f82ea58 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -112,12 +112,10 @@ static inline struct ip_tunnel_info 
*skb_tunnel_info_unclone(struct sk_buff *skb
return >u.tun_info;
 }
 
-static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
-__be16 flags,
-__be64 tunnel_id,
-int md_size)
+static inline struct metadata_dst *
+_ip_tun_set_dst(__be32 saddr, __be32 daddr, __u8 tos, __u8 ttl,
+   __be16 flags, __be64 tunnel_id, int md_size)
 {
-   const struct iphdr *iph = ip_hdr(skb);
struct metadata_dst *tun_dst;
 
tun_dst = tun_rx_dst(md_size);
@@ -125,17 +123,27 @@ static inline struct metadata_dst *ip_tun_rx_dst(struct 
sk_buff *skb,
return NULL;
 
ip_tunnel_key_init(&tun_dst->u.tun_info.key,
-  iph->saddr, iph->daddr, iph->tos, iph->ttl,
+  saddr, daddr, tos, ttl,
   0, 0, 0, tunnel_id, flags);
return tun_dst;
 }
 
-static inline struct metadata_dst *ipv6_tun_rx_dst(struct sk_buff *skb,
+static inline struct metadata_dst *ip_tun_rx_dst(struct sk_buff *skb,
 __be16 flags,
 __be64 tunnel_id,
 int md_size)
 {
-   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   return _ip_tun_set_dst(iph->saddr, iph->daddr, iph->tos, iph->ttl,
+ flags, tunnel_id, md_size);
+}
+
+static inline struct metadata_dst *
+_ipv6_tun_set_dst(const struct in6_addr saddr, const struct in6_addr daddr,
+ __u8 tos, __u8 ttl, __be32 label, __be16 flags,
+ __be64 tunnel_id, int md_size)
+{
struct metadata_dst *tun_dst;
struct ip_tunnel_info *info;
 
@@ -150,14 +158,25 @@ static inline struct metadata_dst *ipv6_tun_rx_dst(struct 
sk_buff *skb,
info->key.tp_src = 0;
info->key.tp_dst = 0;
 
-   info->key.u.ipv6.src = ip6h->saddr;
-   info->key.u.ipv6.dst = ip6h->daddr;
+   info->key.u.ipv6.src = saddr;
+   info->key.u.ipv6.dst = daddr;
 
-   info->key.tos = ipv6_get_dsfield(ip6h);
-   info->key.ttl = ip6h->hop_limit;
-   info->key.label = ip6_flowlabel(ip6h);
+   info->key.tos = tos;
+   info->key.ttl = ttl;
+   info->key.label = label;
 
return tun_dst;
 }
 
+static inline struct metadata_dst *
+ipv6_tun_rx_dst(struct sk_buff *skb, __be16 flags, __be64 tunnel_id,
+   int md_size)
+{
+   const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+
+   return _ipv6_tun_set_dst(ip6h->saddr, ip6h->daddr,
+   ipv6_get_dsfield(ip6h), ip6h->hop_limit,
+   ip6_flowlabel(ip6h), flags, tunnel_id,
+   md_size);
+}
 #endif /* __NET_DST_METADATA_H */
-- 
1.8.3.1



[PATCH net-next V3 3/4] net/sched: cls_flower: Classify packet in ip tunnels

2016-08-25 Thread Hadar Hen Zion
From: Amir Vadai <a...@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress qdisc of a shared
vxlan device named 'vxlan0' to forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets are
forwarded to the tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action tunnel_key release \
action mirred egress redirect dev vnet0

The tunnel_key action will be introduced in the next patch in this
series.

Signed-off-by: Amir Vadai <a...@vadai.me>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  11 +
 net/sched/cls_flower.c   | 101 ++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b24..f9c287c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 1e11e57..46f4f52 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,9 +23,13 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
+   struct flow_dissector_key_control enc_control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
@@ -35,6 +39,11 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_keyid enc_key_id;
+   union {
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_ipv6_addrs enc_ipv6;
+   };
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +133,31 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(>ht.nelems))
return -1;
 
fl_clear_masked_range(_key, >mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   switch (ip_tunnel_info_af(info)) {
+   case AF_INET:
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   break;
+   case AF_INET6:
+   skb_key.enc_ipv6.src = key->u.ipv6.src;
+   skb_key.enc_ipv6.dst = key->u.ipv6.dst;
+   break;
+   }
+
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +326,15 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .len = 

[PATCH net-next V3 0/4] net/sched: ip tunnel metadata set/release/classify by using TC

2016-08-25 Thread Hadar Hen Zion
Hi,

This patchset introduces ip tunnel manipulation support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action act_tunnel_key,
releases the metadata.

In the encap flow, act_tunnel_key creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent : \
flower \
 ip_proto 1 \
action tunnel_key set \
 src_ip 11.11.0.1 \
 dst_ip 11.11.0.2 \
 id 11 \
action mirred egress redirect dev vxlan0

$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent : \
flower \
 enc_src_ip 11.11.0.2 \
 enc_dst_ip 11.11.0.1 \
 enc_key_id 11 \
action tunnel_key release \
action mirred egress redirect dev vnet0

Amir & Hadar

Changes from V2:
- Use union in struct fl_flow_key for enc_ipv6 and enc_ipv4.
- Rename functions _ip_tun_rx_dst and _ipv6_tun_rx_dst to _ip_tun_set_dst and
  _ipv6_tun_set_dst accordingly.
- Remove local parameter 'encapdecap' from tunnel_key_init function.
- Don't copy in6_addr values in tunnel_key_dump_addresses function, use 
pointers.

Changes from V1:
- More cleanups to key32_to_tunnel_id() and tunnel_id_to_key32()
- IPv6 Support added
- Set TUNNEL_KEY flag to make GRE work
- Handle zero tunnel id properly in act_tunnel_key
- Don't leave junk in decap action
- Fix bug in act_tunnel_key initialization where (exists & ocr) is true
- Remove BUG() from code
- Rename action to tunnel_key
- Improve grep-ability of code
- Reuse code from ip_tun_rx_dst() and ipv6_tun_rx_dst()

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
  configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (4):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/dst: Utility functions to build dst_metadata without supplying an
skb
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_tunnel_key

 drivers/net/vxlan.c   |   4 +-
 include/net/dst_metadata.h|  45 +++--
 include/net/ip_tunnels.h  |  19 ++
 include/net/tc_act/tc_tunnel_key.h|  25 +++
 include/net/vxlan.h   |  18 --
 include/uapi/linux/pkt_cls.h  |  11 ++
 include/uapi/linux/tc_act/tc_tunnel_key.h |  42 
 net/ipv4/ip_gre.c |  23 +--
 net/sched/Kconfig |  11 ++
 net/sched/Makefile|   1 +
 net/sched/act_tunnel_key.c| 312 ++
 net/sched/cls_flower.c| 101 +-
 12 files changed, 556 insertions(+), 56 deletions(-)
 create mode 100644 include/net/tc_act/tc_tunnel_key.h
 create mode 100644 include/uapi/linux/tc_act/tc_tunnel_key.h
 create mode 100644 net/sched/act_tunnel_key.c

-- 
1.8.3.1



[PATCH iproute2 2/2] tc: m_vlan: Add priority option to push vlan action

2016-08-22 Thread Hadar Hen Zion
The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and priority:

tc filter add dev veth0 protocol ip parent : \
flower \
indev veth0 \
action vlan push id 100 priority 5
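
For readers following along, the range check that the new option implies
(a decimal number in the range 0-7) can be sketched as below. This is only
an illustration: parse_vlan_prio() is a hypothetical helper, not a function
from iproute2, and it uses plain strtoul() instead of the get_u8() helper
that the patch itself calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace value used by this series: the priority field is 3 bits wide. */
#define VLAN_PRIO_MASK 0x7

/*
 * Hypothetical helper: parse a decimal priority string into *prio.
 * Returns 0 on success, -1 if the string is not a number in 0-7.
 */
static int parse_vlan_prio(const char *arg, uint8_t *prio)
{
	char *end;
	unsigned long val;

	assert(prio != NULL);

	if (!arg || !*arg)
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno || *end != '\0')
		return -1;

	/* Any bit outside the 3-bit field means the value is out of range. */
	if (val & ~(unsigned long)VLAN_PRIO_MASK)
		return -1;

	*prio = (uint8_t)val;
	return 0;
}
```

With this sketch, "priority 5" is accepted and "priority 8" is rejected,
which mirrors the (prio & ~VLAN_PRIO_MASK) test in the diff below.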

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/linux/tc_act/tc_vlan.h |  1 +
 man/man8/tc-vlan.8 |  5 +
 tc/m_vlan.c| 22 +-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index 26ae695..29e2113 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -32,6 +32,7 @@ enum {
TCA_VLAN_PUSH_VLAN_ID,
TCA_VLAN_PUSH_VLAN_PROTOCOL,
TCA_VLAN_PAD,
+   TCA_VLAN_PUSH_VLAN_PRIORITY,
__TCA_VLAN_MAX,
 };
 #define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1)
diff --git a/man/man8/tc-vlan.8 b/man/man8/tc-vlan.8
index 4bfd72b..4d0c5c8 100644
--- a/man/man8/tc-vlan.8
+++ b/man/man8/tc-vlan.8
@@ -12,6 +12,8 @@ vlan - vlan manipulation module
 .IR PUSH " := "
 .BR push " [ " protocol
 .IR VLANPROTO " ]"
+.BR " [ " priority
+.IR VLANPRIO " ] "
 .BI id " VLANID"
 
 .ti -8
@@ -55,6 +57,9 @@ for hexadecimal interpretation, etc.).
 Choose the VLAN protocol to use. At the time of writing, the kernel accepts 
only
 .BR 802.1Q " or " 802.1ad .
 .TP
+.BI priority " VLANPRIO"
+Choose the VLAN priority to use. Decimal number in range of 0-7.
+.TP
 .I CONTROL
 How to continue after executing this action.
 .RS
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index ac63d9e..be2ffd2 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -22,7 +22,7 @@
 static void explain(void)
 {
fprintf(stderr, "Usage: vlan pop\n");
-   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID 
[CONTROL]\n");
+   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID [ 
priority VLANPRIO ] [CONTROL]\n");
fprintf(stderr, "   VLANPROTO is one of 802.1Q or 802.1AD\n");
fprintf(stderr, "with default: 802.1Q\n");
fprintf(stderr, "   CONTROL := reclassify | pipe | drop | continue 
| pass\n");
@@ -45,6 +45,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
int id_set = 0;
__u16 proto;
int proto_set = 0;
+   __u8 prio;
+   int prio_set = 0;
struct tc_vlan parm = { 0 };
 
if (matches(*argv, "vlan") != 0)
@@ -91,6 +93,17 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
if (ll_proto_a2n(, *argv))
invarg("protocol is invalid", *argv);
proto_set = 1;
+   } else if (matches(*argv, "priority") == 0) {
+   if (action != TCA_VLAN_ACT_PUSH) {
+   fprintf(stderr, "\"%s\" is only valid for 
push\n",
+   *argv);
+   explain();
+   return -1;
+   }
+   NEXT_ARG();
+   if (get_u8(, *argv, 0) || (prio & ~VLAN_PRIO_MASK))
+   invarg("prio is invalid", *argv);
+   prio_set = 1;
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -138,6 +151,9 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
 
addattr_l(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PROTOCOL, , 2);
}
+   if (prio_set)
+   addattr8(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PRIORITY, prio);
+
tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
 
*argc_p = argc;
@@ -180,6 +196,10 @@ static int print_vlan(struct action_util *au, FILE *f, 
struct rtattr *arg)

ll_proto_n2a(rta_getattr_u16(tb[TCA_VLAN_PUSH_VLAN_PROTOCOL]),
 b1, sizeof(b1)));
}
+   if (tb[TCA_VLAN_PUSH_VLAN_PRIORITY]) {
+   val = rta_getattr_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
+   fprintf(f, " priority %u", val);
+   }
break;
}
fprintf(f, " %s", action_n2a(parm->action));
-- 
1.8.3.1



[PATCH iproute2 1/2] tc: flower: Introduce vlan support

2016-08-22 Thread Hadar Hen Zion
Classification according to vlan id and vlan priority.

Example script that adds vlan filter:

 # add ingress qdisc
 tc qdisc add dev ens4f0 ingress

 # add a flower filter with vlan id and priority classification
 tc filter add dev ens4f0 protocol 802.1Q parent : \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/linux/pkt_cls.h|  5 +++
 include/linux/tc_act/tc_vlan.h |  3 ++
 man/man8/tc-flower.8   | 25 -
 tc/f_flower.c  | 80 --
 4 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 5e6c61e..25a8fae 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -374,6 +374,11 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
 
TCA_FLOWER_FLAGS,
+
+   TCA_FLOWER_KEY_VLAN_ID,
+   TCA_FLOWER_KEY_VLAN_PRIO,
+   TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index 31151ff..26ae695 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -16,6 +16,9 @@
 
 #define TCA_VLAN_ACT_POP   1
 #define TCA_VLAN_ACT_PUSH  2
+#define VLAN_PRIO_MASK 0x7
+#define VLAN_VID_MASK  0x0fff
+
 
 struct tc_vlan {
tc_gen;
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 9ae10e6..74f7664 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -23,7 +23,13 @@ flower \- flow based traffic control filter
 .R " | { "
 .BR dst_mac " | " src_mac " } "
 .IR mac_address " | "
-.BR eth_type " { " ipv4 " | " ipv6 " | "
+.BR eth_type " { " ipv4 " | " ipv6 " | " 802.1Q " | "
+.IR ETH_TYPE " } | "
+.B vlan_id
+.IR VID " | "
+.B vlan_prio
+.IR PRIORITY " | "
+.BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | "
 .IR IP_PROTO " } | { "
@@ -70,6 +76,23 @@ Do not process filter by hardware.
 Match on source or destination MAC address.
 .TP
 .BI eth_type " ETH_TYPE"
+Match on the next protocol.
+.I ETH_TYPE
+may be either
+.BR ipv4 , ipv6 , 802.1Q ,
+or an unsigned 16bit value in hexadecimal format.
+.TP
+.BI vlan_id " VID"
+Match on vlan tag id.
+.I VID
+is an unsigned 12bit value in decimal format.
+.TP
+.BI vlan_prio " priority"
+Match on vlan tag priority.
+.I PRIORITY
+is an unsigned 3bit value in decimal format.
+.TP
+.BI vlan_eth_type " VLAN_ETH_TYPE"
 Match on layer three protocol.
 .I ETH_TYPE
 may be either
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 791ade7..2ab2de1 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "utils.h"
 #include "tc_util.h"
@@ -30,6 +31,9 @@ static void explain(void)
fprintf(stderr, "\n");
fprintf(stderr, "Where: MATCH-LIST := [ MATCH-LIST ] MATCH\n");
fprintf(stderr, "   MATCH  := { indev DEV-NAME |\n");
+   fprintf(stderr, "   vlan_id VID |\n");
+   fprintf(stderr, "   vlan_prio PRIORITY |\n");
+   fprintf(stderr, "   vlan_ethtype [ ipv4 | ipv6 | 
ETH-TYPE ] |\n");
fprintf(stderr, "   dst_mac MAC-ADDR |\n");
fprintf(stderr, "   src_mac MAC-ADDR |\n");
fprintf(stderr, "   [ipv4 | ipv6 ] |\n");
@@ -61,6 +65,24 @@ static int flower_parse_eth_addr(char *str, int addr_type, 
int mask_type,
return 0;
 }
 
+static int flower_parse_vlan_eth_type(char *str, __be16 eth_type, int type,
+ __be16 *p_vlan_eth_type,
+ struct nlmsghdr *n)
+{
+   __be16 vlan_eth_type;
+
+   if (eth_type != htons(ETH_P_8021Q)) {
+   fprintf(stderr, "Can't set \"vlan_ethtype\" if ethertype isn't 
802.1Q\n");
+   return -1;
+   }
+
+   if (ll_proto_a2n(_eth_type, str))
+   invarg("invalid vlan_ethtype", str);
+   addattr16(n, MAX_MSG, type, vlan_eth_type);
+   *p_vlan_eth_type = vlan_eth_type;
+   return 0;
+}
+
 static int flower_parse_ip_proto(char *str, __be16 eth_type, int type,
 __u8 *p_ip_proto, struct nlmsghdr *n)
 {
@@ -167,6 +189,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*

[PATCH iproute2 0/2] tc: flower, m_vlan: Introduce vlan tag support

2016-08-22 Thread Hadar Hen Zion
Hi,
Re-sending because the previous posting used a wrong source e-mail address.

This patchset introduces vlan tag support to the tc flower classifier.
It also adds vlan priority to the vlan push action.

- The first patch adds classification according to vlan id and vlan priority to
the flower classifier.
- The second patch adds support for vlan priority to the current vlan push
action.

Hadar Hen Zion (2):
  tc: flower: Introduce vlan support
  tc: m_vlan: Add priority option to push vlan action

 include/linux/pkt_cls.h|  5 +++
 include/linux/tc_act/tc_vlan.h |  4 +++
 man/man8/tc-flower.8   | 25 -
 man/man8/tc-vlan.8 |  5 +++
 tc/f_flower.c  | 80 --
 tc/m_vlan.c| 22 +++-
 6 files changed, 136 insertions(+), 5 deletions(-)

-- 
1.8.3.1



Re: [PATCH net-next V2 1/5] flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci

2016-08-18 Thread Hadar Hen Zion
On Wed, Aug 17, 2016 at 4:02 PM, Jiri Pirko <j...@resnulli.us> wrote:
> Wed, Aug 17, 2016 at 01:05:33PM CEST, had...@dev.mellanox.co.il wrote:
>>On Wed, Aug 17, 2016 at 1:46 PM, Jiri Pirko <j...@resnulli.us> wrote:
>>> Wed, Aug 17, 2016 at 12:36:10PM CEST, had...@mellanox.com wrote:
>>>>Early in the datapath skb_vlan_untag function is called, stripped
>>>>the vlan from the skb and set skb->vlan_tci and skb->vlan_proto fields.
>>>>
>>>>The current dissection doesn't handle stripped vlan packets correctly.
>>>>In some flows, vlan doesn't exist in skb->data anymore when applying
>>>>flow dissection on the skb, fix that.
>>>>
>>>>In case vlan info wasn't stripped before applying flow_dissector (RPS
>>>>flow for example), or in case of skb with multiple vlans (e.g. 802.1ad),
>>>>get the vlan info from skb->data. The flow_dissector correctly skips
>>>>any number of vlans and stores only the first level vlan.
>>>>
>>>>Fixes: 0744dd00c1b1 ('net: introduce skb_flow_dissect()')
>>>>Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
>>>>---
>>>> net/core/flow_dissector.c | 34 ++
>>>> 1 file changed, 26 insertions(+), 8 deletions(-)
>>>>
>>>>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>>>>index 91028ae..362d693 100644
>>>>--- a/net/core/flow_dissector.c
>>>>+++ b/net/core/flow_dissector.c
>>>>@@ -119,12 +119,14 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>>>>   struct flow_dissector_key_ports *key_ports;
>>>>   struct flow_dissector_key_tags *key_tags;
>>>>   struct flow_dissector_key_keyid *key_keyid;
>>>>+  bool skip_vlan = false;
>>>>   u8 ip_proto = 0;
>>>>   bool ret = false;
>>>>
>>>>   if (!data) {
>>>>   data = skb->data;
>>>>-  proto = skb->protocol;
>>>>+  proto = skb_vlan_tag_present(skb) ?
>>>>+   skb->vlan_proto : skb->protocol;
>>>>   nhoff = skb_network_offset(skb);
>>>>   hlen = skb_headlen(skb);
>>>>   }
>>>>@@ -243,23 +245,39 @@ ipv6:
>>>>   case htons(ETH_P_8021AD):
>>>>   case htons(ETH_P_8021Q): {
>>>>   const struct vlan_hdr *vlan;
>>>>-  struct vlan_hdr _vlan;
>>>>
>>>>-  vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, 
>>>>hlen, &_vlan);
>>>>-  if (!vlan)
>>>>-  goto out_bad;
>>>>+  if (skb_vlan_tag_present(skb))
>>>>+  proto = skb->protocol;
>>>>+
>>>>+  if (!skb_vlan_tag_present(skb) ||
>>>>+  proto == cpu_to_be16(ETH_P_8021Q) ||
>>>>+  proto == cpu_to_be16(ETH_P_8021AD)) {
>>>
>>> How this can happen? Could you give me an example?
>>>
>>
>>This can happen in 2 cases:
>>
>>1. vlan wasn't stripped yet from the skb.
>>In RPS flow for example, get_rps_cpu function is using flow-dissector
>>before vlan_untag is called by __netif_receive_skb_core.
>
> right, sigh...
>
>
>>
>>2. skb with multiple vlan tags.
>>Only the first vlan is stripped while the inner vlans are still in skb->data.
>>In this case skb->vlan_proto is 802.1AD and skb->protocol is 802.1Q
>>(for example) so I have to take the next header from skb->data.
>
> Hmm I think that whoever removes the outermost vlan from skb->vlan_*
> should strip the next header into skb->vlan_*

The outermost vlan isn't removed from skb->vlan_*; it stays there.


Re: [PATCH net-next V2 1/5] flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci

2016-08-17 Thread Hadar Hen Zion
On Wed, Aug 17, 2016 at 1:46 PM, Jiri Pirko <j...@resnulli.us> wrote:
> Wed, Aug 17, 2016 at 12:36:10PM CEST, had...@mellanox.com wrote:
>>Early in the datapath skb_vlan_untag function is called, stripped
>>the vlan from the skb and set skb->vlan_tci and skb->vlan_proto fields.
>>
>>The current dissection doesn't handle stripped vlan packets correctly.
>>In some flows, vlan doesn't exist in skb->data anymore when applying
>>flow dissection on the skb, fix that.
>>
>>In case vlan info wasn't stripped before applying flow_dissector (RPS
>>flow for example), or in case of skb with multiple vlans (e.g. 802.1ad),
>>get the vlan info from skb->data. The flow_dissector correctly skips
>>any number of vlans and stores only the first level vlan.
>>
>>Fixes: 0744dd00c1b1 ('net: introduce skb_flow_dissect()')
>>Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
>>---
>> net/core/flow_dissector.c | 34 ++
>> 1 file changed, 26 insertions(+), 8 deletions(-)
>>
>>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>>index 91028ae..362d693 100644
>>--- a/net/core/flow_dissector.c
>>+++ b/net/core/flow_dissector.c
>>@@ -119,12 +119,14 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>>   struct flow_dissector_key_ports *key_ports;
>>   struct flow_dissector_key_tags *key_tags;
>>   struct flow_dissector_key_keyid *key_keyid;
>>+  bool skip_vlan = false;
>>   u8 ip_proto = 0;
>>   bool ret = false;
>>
>>   if (!data) {
>>   data = skb->data;
>>-  proto = skb->protocol;
>>+  proto = skb_vlan_tag_present(skb) ?
>>+   skb->vlan_proto : skb->protocol;
>>   nhoff = skb_network_offset(skb);
>>   hlen = skb_headlen(skb);
>>   }
>>@@ -243,23 +245,39 @@ ipv6:
>>   case htons(ETH_P_8021AD):
>>   case htons(ETH_P_8021Q): {
>>   const struct vlan_hdr *vlan;
>>-  struct vlan_hdr _vlan;
>>
>>-  vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, 
>>hlen, &_vlan);
>>-  if (!vlan)
>>-  goto out_bad;
>>+  if (skb_vlan_tag_present(skb))
>>+  proto = skb->protocol;
>>+
>>+  if (!skb_vlan_tag_present(skb) ||
>>+  proto == cpu_to_be16(ETH_P_8021Q) ||
>>+  proto == cpu_to_be16(ETH_P_8021AD)) {
>
> How this can happen? Could you give me an example?
>

This can happen in 2 cases:

1. vlan wasn't stripped yet from the skb.
In RPS flow for example, get_rps_cpu function is using flow-dissector
before vlan_untag is called by __netif_receive_skb_core.

2. skb with multiple vlan tags.
Only the first vlan is stripped while the inner vlans are still in skb->data.
In this case skb->vlan_proto is 802.1AD and skb->protocol is 802.1Q
(for example) so I have to take the next header from skb->data.


[PATCH net-next V2 0/5] net_sched, flow_dissector, flower: Introduce vlan tag support

2016-08-17 Thread Hadar Hen Zion
This patchset introduces vlan tag support to the flower classifier and the flow
dissector. It also adds vlan priority to act vlan.

The first 2 patches are dealing with flow-dissector:
 - The first patch is a fix, in case the vlan was already stripped from the
   skb, take it from skb->vlan_tci.
 - The second patch adds support for vlan priority.

The next 2 patches are dealing with flower:
 - The first patch is a fix: it sets flow dissector 'used_keys' according to the
   mask value of each key.
 - The second patch adds vlan tag support to the flower classifier; user space
   patches will be sent later to complete it.

The last patch adds vlan priority to act vlan since only vlan id is currently 
supported.

Changes from V1:
 - A new patch was added to this series: "net_sched: flower: Avoid dissection of
unmasked keys"
 - Add u16 padding to struct flow_dissector_key_vlan
 - Change flow_label field in struct flow_dissector_key_tags from a 20-bit field
to u32
 - Remove 'if (v->tcfv_push_prio)' check from tcf_vlan_dump function
 - Add support for un-stripped vlan skb and skb with multiple vlans in
__skb_flow_dissect


Hadar Hen Zion (5):
  flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci
  flow_dissector: Get vlan priority in addition to vlan id
  net_sched: flower: Avoid dissection of unmasked keys
  net_sched: flower: Add vlan support
  net_sched: act_vlan: Add priority option

 include/linux/if_vlan.h |  1 +
 include/net/flow_dissector.h| 12 +++--
 include/net/tc_act/tc_vlan.h|  1 +
 include/uapi/linux/pkt_cls.h|  3 ++
 include/uapi/linux/tc_act/tc_vlan.h |  1 +
 net/core/flow_dissector.c   | 51 ++-
 net/sched/act_vlan.c| 13 -
 net/sched/cls_flower.c  | 98 ++---
 8 files changed, 144 insertions(+), 36 deletions(-)

-- 
1.8.3.1



[PATCH net-next V2 4/5] net_sched: flower: Add vlan support

2016-08-17 Thread Hadar Hen Zion
Enhance flower to support 802.1Q vlan protocol classification.
Currently, the supported fields are vlan_id and vlan_priority.

Example:

# add a flower filter with vlan id and priority classification
tc filter add dev ens4f0 protocol 802.1Q parent : \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  3 ++
 net/sched/cls_flower.c   | 70 ++--
 2 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index d1c1cca..51b5b24 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -428,6 +428,9 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
 
TCA_FLOWER_FLAGS,
+   TCA_FLOWER_KEY_VLAN_ID,
+   TCA_FLOWER_KEY_VLAN_PRIO,
+   TCA_FLOWER_KEY_VLAN_ETH_TYPE,
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0080fc0..1e11e57 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -28,6 +28,7 @@ struct fl_flow_key {
struct flow_dissector_key_control control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
+   struct flow_dissector_key_vlan vlan;
struct flow_dissector_key_addrs ipaddrs;
union {
struct flow_dissector_key_ipv4_addrs ipv4;
@@ -293,6 +294,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_TCP_DST]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_UDP_SRC]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_UDP_DST]= { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
+   [TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
+
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -308,9 +313,29 @@ static void fl_set_key_val(struct nlattr **tb,
memcpy(mask, nla_data(tb[mask_type]), len);
 }
 
+static void fl_set_key_vlan(struct nlattr **tb,
+   struct flow_dissector_key_vlan *key_val,
+   struct flow_dissector_key_vlan *key_mask)
+{
+#define VLAN_PRIORITY_MASK 0x7
+
+   if (tb[TCA_FLOWER_KEY_VLAN_ID]) {
+   key_val->vlan_id =
+   nla_get_u16(tb[TCA_FLOWER_KEY_VLAN_ID]) & VLAN_VID_MASK;
+   key_mask->vlan_id = VLAN_VID_MASK;
+   }
+   if (tb[TCA_FLOWER_KEY_VLAN_PRIO]) {
+   key_val->vlan_priority =
+   nla_get_u8(tb[TCA_FLOWER_KEY_VLAN_PRIO]) &
+   VLAN_PRIORITY_MASK;
+   key_mask->vlan_priority = VLAN_PRIORITY_MASK;
+   }
+}
+
 static int fl_set_key(struct net *net, struct nlattr **tb,
  struct fl_flow_key *key, struct fl_flow_key *mask)
 {
+   __be16 ethertype;
 #ifdef CONFIG_NET_CLS_IND
if (tb[TCA_FLOWER_INDEV]) {
int err = tcf_change_indev(net, tb[TCA_FLOWER_INDEV]);
@@ -328,9 +353,19 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
   mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
   sizeof(key->eth.src));
 
-   fl_set_key_val(tb, >basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
-  >basic.n_proto, TCA_FLOWER_UNSPEC,
-  sizeof(key->basic.n_proto));
+   if (tb[TCA_FLOWER_KEY_ETH_TYPE])
+   ethertype = nla_get_be16(tb[TCA_FLOWER_KEY_ETH_TYPE]);
+
+   if (ethertype == htons(ETH_P_8021Q)) {
+   fl_set_key_vlan(tb, >vlan, >vlan);
+   fl_set_key_val(tb, >basic.n_proto,
+  TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+  >basic.n_proto, TCA_FLOWER_UNSPEC,
+  sizeof(key->basic.n_proto));
+   } else {
+   key->basic.n_proto = ethertype;
+   mask->basic.n_proto = cpu_to_be16(~0);
+   }
 
if (key->basic.n_proto == htons(ETH_P_IP) ||
key->basic.n_proto == htons(ETH_P_IPV6)) {
@@ -438,6 +473,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
 FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
FL_KEY_SET_IF_MASKED(>key, keys, cnt,
 FLOW_DISSECTOR_KEY_PORTS, tp);
+   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
+FLOW_DISSECTOR_KEY_VLAN, vlan);
 
skb_flow_dissector_init(>dissector, keys, cnt);
 }
@@ -666,6 +703,29 @@ static int fl_dump_key_val(struct sk_buff *skb,
return 0;
 }
 
+static int fl_dump_key_vlan(struct

[PATCH net-next V2 5/5] net_sched: act_vlan: Add priority option

2016-08-17 Thread Hadar Hen Zion
The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and
priority:

tc filter add dev veth0 protocol ip parent : \
   flower \
indev veth0 \
   action vlan push id 100 priority 5
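
As a rough illustration of what the push action does with the new value,
the priority is shifted into the top three bits of the 16-bit tag control
field and OR-ed with the vid. The sketch below is a standalone model;
vlan_build_tci() and the TCI_* names are made up for this example, and the
shift of 13 is the 802.1Q position of the priority bits.

```c
#include <assert.h>
#include <stdint.h>

#define TCI_VID_MASK   0x0fff /* low 12 bits: vlan id */
#define TCI_PRIO_SHIFT 13     /* top 3 bits: priority */

/* Hypothetical helper: combine vid and priority into one TCI value. */
static uint16_t vlan_build_tci(uint16_t vid, uint8_t prio)
{
	assert(vid <= TCI_VID_MASK);
	assert(prio <= 7);

	return (uint16_t)(vid | ((uint16_t)prio << TCI_PRIO_SHIFT));
}
```

For the example above, id 100 with priority 5 gives 0xa064.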

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
---
 include/net/tc_act/tc_vlan.h|  1 +
 include/uapi/linux/tc_act/tc_vlan.h |  1 +
 net/sched/act_vlan.c| 13 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index e29f52e..6b83588 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -20,6 +20,7 @@ struct tcf_vlan {
int tcfv_action;
u16 tcfv_push_vid;
__be16  tcfv_push_proto;
+   u8  tcfv_push_prio;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
diff --git a/include/uapi/linux/tc_act/tc_vlan.h 
b/include/uapi/linux/tc_act/tc_vlan.h
index 31151ff62..be72b6e 100644
--- a/include/uapi/linux/tc_act/tc_vlan.h
+++ b/include/uapi/linux/tc_act/tc_vlan.h
@@ -29,6 +29,7 @@ enum {
TCA_VLAN_PUSH_VLAN_ID,
TCA_VLAN_PUSH_VLAN_PROTOCOL,
TCA_VLAN_PAD,
+   TCA_VLAN_PUSH_VLAN_PRIORITY,
__TCA_VLAN_MAX,
 };
 #define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1)
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 691409d..59a8d31 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -43,7 +43,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid);
+   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
+   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -65,6 +66,7 @@ static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] 
= {
[TCA_VLAN_PARMS]= { .len = sizeof(struct tc_vlan) },
[TCA_VLAN_PUSH_VLAN_ID] = { .type = NLA_U16 },
[TCA_VLAN_PUSH_VLAN_PROTOCOL]   = { .type = NLA_U16 },
+   [TCA_VLAN_PUSH_VLAN_PRIORITY]   = { .type = NLA_U8 },
 };
 
 static int tcf_vlan_init(struct net *net, struct nlattr *nla,
@@ -78,6 +80,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
int action;
__be16 push_vid = 0;
__be16 push_proto = 0;
+   u8 push_prio = 0;
bool exists = false;
int ret = 0, err;
 
@@ -123,6 +126,9 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
} else {
push_proto = htons(ETH_P_8021Q);
}
+
+   if (tb[TCA_VLAN_PUSH_VLAN_PRIORITY])
+   push_prio = nla_get_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
break;
default:
if (exists)
@@ -150,6 +156,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
v->tcfv_action = action;
v->tcfv_push_vid = push_vid;
+   v->tcfv_push_prio = push_prio;
v->tcfv_push_proto = push_proto;
 
v->tcf_action = parm->action;
@@ -181,7 +188,9 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct 
tc_action *a,
if (v->tcfv_action == TCA_VLAN_ACT_PUSH &&
(nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, v->tcfv_push_vid) ||
 nla_put_be16(skb, TCA_VLAN_PUSH_VLAN_PROTOCOL,
- v->tcfv_push_proto)))
+ v->tcfv_push_proto) ||
+(nla_put_u8(skb, TCA_VLAN_PUSH_VLAN_PRIORITY,
+ v->tcfv_push_prio
goto nla_put_failure;
 
tcf_tm_dump(, >tcf_tm);
-- 
1.8.3.1



[PATCH net-next V2 2/5] flow_dissector: Get vlan priority in addition to vlan id

2016-08-17 Thread Hadar Hen Zion
Add vlan priority check to the flow dissector by adding a new flow
dissector struct, flow_dissector_key_vlan, which includes the vlan tag
fields.

vlan_id and flow_label fields were under the same struct
(flow_dissector_key_tags). It was a convenient setting since struct
flow_dissector_key_tags is used by struct flow_keys and by setting
vlan_id and flow_label under the same struct, we get precisely 24 or 48
bytes in flow_keys from flow_dissector_key_basic.

Now, when adding vlan priority support, the code will be cleaner if
flow_label and vlan tag won't be under the same struct anymore.
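
A minimal userspace sketch of splitting a tag control value into the two
fields of the new struct follows. The names here (struct vlan_key,
tci_to_vlan_key() and the TCI_* constants) are assumptions for the example,
and plain integer members are used instead of the bitfields in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TCI_VID_MASK   0x0fff /* low 12 bits: vlan id */
#define TCI_PRIO_MASK  0xe000 /* top 3 bits: priority */
#define TCI_PRIO_SHIFT 13

/* Simplified stand-in for struct flow_dissector_key_vlan. */
struct vlan_key {
	uint16_t vlan_id;
	uint8_t vlan_priority;
};

/* Hypothetical helper: extract id and priority from a 16-bit TCI. */
static struct vlan_key tci_to_vlan_key(uint16_t tci)
{
	struct vlan_key key;

	key.vlan_id = tci & TCI_VID_MASK;
	/* Mask first, then shift, so bit 12 never leaks into the result. */
	key.vlan_priority = (uint8_t)((tci & TCI_PRIO_MASK) >> TCI_PRIO_SHIFT);
	return key;
}
```

Bit 12 of the value is ignored by both masks, so a TCI of 0xb064 and a TCI
of 0xa064 both yield id 100 and priority 5 in this model.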

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/linux/if_vlan.h  |  1 +
 include/net/flow_dissector.h | 12 +---
 net/core/flow_dissector.c| 25 -
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index a5f6ce6..49d4aef 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -81,6 +81,7 @@ static inline bool is_vlan_dev(const struct net_device *dev)
 #define skb_vlan_tag_present(__skb)((__skb)->vlan_tci & VLAN_TAG_PRESENT)
 #define skb_vlan_tag_get(__skb)((__skb)->vlan_tci & 
~VLAN_TAG_PRESENT)
 #define skb_vlan_tag_get_id(__skb) ((__skb)->vlan_tci & VLAN_VID_MASK)
+#define skb_vlan_tag_get_prio(__skb)   ((__skb)->vlan_tci & VLAN_PRIO_MASK)
 
 /**
  * struct vlan_pcpu_stats - VLAN percpu rx/tx stats
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index d3d60dc..f266b51 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -32,8 +32,13 @@ struct flow_dissector_key_basic {
 };
 
 struct flow_dissector_key_tags {
-   u32 vlan_id:12,
-   flow_label:20;
+   u32 flow_label;
+};
+
+struct flow_dissector_key_vlan {
+   u16 vlan_id:12,
+   vlan_priority:3;
+   u16 padding;
 };
 
 struct flow_dissector_key_keyid {
@@ -119,7 +124,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs 
*/
-   FLOW_DISSECTOR_KEY_VLANID, /* struct flow_dissector_key_flow_tags */
+   FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_flow_tags */
FLOW_DISSECTOR_KEY_GRE_KEYID, /* struct flow_dissector_key_keyid */
FLOW_DISSECTOR_KEY_MPLS_ENTROPY, /* struct flow_dissector_key_keyid */
@@ -148,6 +153,7 @@ struct flow_keys {
 #define FLOW_KEYS_HASH_START_FIELD basic
struct flow_dissector_key_basic basic;
struct flow_dissector_key_tags tags;
+   struct flow_dissector_key_vlan vlan;
struct flow_dissector_key_keyid keyid;
struct flow_dissector_key_ports ports;
struct flow_dissector_key_addrs addrs;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 362d693..a2879c0 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -118,6 +118,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_addrs *key_addrs;
struct flow_dissector_key_ports *key_ports;
struct flow_dissector_key_tags *key_tags;
+   struct flow_dissector_key_vlan *key_vlan;
struct flow_dissector_key_keyid *key_keyid;
bool skip_vlan = false;
u8 ip_proto = 0;
@@ -266,16 +267,22 @@ ipv6:
 
skip_vlan = true;
if (dissector_uses_key(flow_dissector,
-  FLOW_DISSECTOR_KEY_VLANID)) {
-   key_tags = skb_flow_dissector_target(flow_dissector,
-
FLOW_DISSECTOR_KEY_VLANID,
+  FLOW_DISSECTOR_KEY_VLAN)) {
+   key_vlan = skb_flow_dissector_target(flow_dissector,
+
FLOW_DISSECTOR_KEY_VLAN,
 target_container);
 
-   if (skb_vlan_tag_present(skb))
-   key_tags->vlan_id = skb_vlan_tag_get_id(skb);
-   else
-   key_tags->vlan_id = ntohs(vlan->h_vlan_TCI) &
+   if (skb_vlan_tag_present(skb)) {
+   key_vlan->vlan_id = skb_vlan_tag_get_id(skb);
+   key_vlan->vlan_priority =
+   (skb_vlan_tag_get_prio(skb) >> 
VLAN_PRIO_SHIFT);
+   } else {
+   key_vlan->vlan_id = ntohs(vlan->h_vlan_TCI) &
V

[PATCH net-next V2 3/5] net_sched: flower: Avoid dissection of unmasked keys

2016-08-17 Thread Hadar Hen Zion
The current flower implementation checks the mask range and sets all the
keys included in that range as "used_keys", even if a specific key in
the range has a zero mask.

This behavior can cause a false positive return value of
dissector_uses_key function and unnecessary dissection in
__skb_flow_dissect.

This patch checks explicitly the mask of each key and "used_keys" will
be set accordingly.
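
The difference between the two checks can be shown with a small standalone
model. Everything in it is illustrative: struct flow_key is a cut-down
stand-in for struct fl_flow_key, region_in_range() models the old range
test, and region_is_masked() models the new per-key test with a plain byte
loop in place of the kernel's memchr_inv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct fl_flow_key. */
struct flow_key {
	uint8_t eth_dst[6];
	uint8_t eth_src[6];
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint16_t port_src;
	uint16_t port_dst;
};

#define KEY_OFFSET(member) offsetof(struct flow_key, member)
#define KEY_SIZE(member)   (sizeof(((struct flow_key *)0)->member))

/* Old behaviour: a member counts as used if it overlaps [start, end]. */
static bool region_in_range(size_t offset, size_t size,
			    size_t start, size_t end)
{
	return offset <= end && offset + size >= start;
}

/* New behaviour: a member counts as used only if its mask is non-zero. */
static bool region_is_masked(const void *mask, size_t offset, size_t size)
{
	const unsigned char *p;
	size_t i;

	assert(mask != NULL);
	p = (const unsigned char *)mask + offset;

	for (i = 0; i < size; i++)
		if (p[i])
			return true;
	return false;
}
```

With a mask that sets only eth_dst and port_dst, the range runs from the
first to the last member, so the range test also reports ipv4_src as used
even though its mask is all zeroes. The per-key test reports it as unused.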

Fixes: 77b9900ef53a ('tc: introduce Flower classifier')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
 net/sched/cls_flower.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 5060801..0080fc0 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -404,12 +404,10 @@ static int fl_init_hashtable(struct cls_fl_head *head,
 
 #define FL_KEY_MEMBER_OFFSET(member) offsetof(struct fl_flow_key, member)
 #define FL_KEY_MEMBER_SIZE(member) (sizeof(((struct fl_flow_key *) 0)->member))
-#define FL_KEY_MEMBER_END_OFFSET(member)   
\
-   (FL_KEY_MEMBER_OFFSET(member) + FL_KEY_MEMBER_SIZE(member))
 
-#define FL_KEY_IN_RANGE(mask, member)  
\
-(FL_KEY_MEMBER_OFFSET(member) <= (mask)->range.end &&  
\
- FL_KEY_MEMBER_END_OFFSET(member) >= (mask)->range.start)
+#define FL_KEY_IS_MASKED(mask, member) 
\
+   memchr_inv(((char *)mask) + FL_KEY_MEMBER_OFFSET(member),   
\
+  0, FL_KEY_MEMBER_SIZE(member))   
\
 
 #define FL_KEY_SET(keys, cnt, id, member)  
\
do {
\
@@ -418,9 +416,9 @@ static int fl_init_hashtable(struct cls_fl_head *head,
cnt++;  
\
} while(0);
 
-#define FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt, id, member)
\
+#define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member)  
\
do {
\
-   if (FL_KEY_IN_RANGE(mask, member))  
\
+   if (FL_KEY_IS_MASKED(mask, member)) 
\
FL_KEY_SET(keys, cnt, id, member);  
\
} while(0);
 
@@ -432,14 +430,14 @@ static void fl_init_dissector(struct cls_fl_head *head,
 
FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_CONTROL, control);
FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_BASIC, basic);
-   FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt,
-  FLOW_DISSECTOR_KEY_ETH_ADDRS, eth);
-   FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt,
-  FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
-   FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt,
-  FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
-   FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt,
-  FLOW_DISSECTOR_KEY_PORTS, tp);
+   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
+FLOW_DISSECTOR_KEY_ETH_ADDRS, eth);
+   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
+FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
+   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
+FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
+   FL_KEY_SET_IF_MASKED(>key, keys, cnt,
+FLOW_DISSECTOR_KEY_PORTS, tp);
 
skb_flow_dissector_init(>dissector, keys, cnt);
 }
-- 
1.8.3.1



[PATCH net-next V2 1/5] flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci

2016-08-17 Thread Hadar Hen Zion
Early in the datapath the skb_vlan_untag function is called; it strips
the vlan from the skb and sets the skb->vlan_tci and skb->vlan_proto fields.

The current dissection doesn't handle stripped vlan packets correctly.
In some flows, vlan doesn't exist in skb->data anymore when applying
flow dissection on the skb, fix that.

In case vlan info wasn't stripped before applying flow_dissector (RPS
flow for example), or in case of skb with multiple vlans (e.g. 802.1ad),
get the vlan info from skb->data. The flow_dissector correctly skips
any number of vlans and stores only the first level vlan.
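
The control flow described above can be modelled in userspace roughly as
follows. This is a sketch under several assumptions, not the kernel
function: struct skb_model, struct vlan_hdr_model and dissect() are invented
for the example, protocol values are kept in host byte order, the "tag
present" state is a separate flag, and the in-packet vlan headers are given
as an array instead of being read from skb->data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8
#define VID_MASK     0x0fff

/* One vlan header still present in the packet data. */
struct vlan_hdr_model {
	uint16_t tci;
	uint16_t encap_proto;
};

/* Simplified stand-in for the skb fields the dissector looks at. */
struct skb_model {
	bool tag_present;     /* models skb_vlan_tag_present() */
	uint16_t vlan_proto;  /* models skb->vlan_proto */
	uint16_t vlan_tci;    /* models skb->vlan_tci */
	uint16_t protocol;    /* models skb->protocol */
	const struct vlan_hdr_model *hdrs;
	size_t nhdrs;
};

struct dissect_result {
	bool ok;         /* false if a vlan header was expected but missing */
	bool have_vlan;
	uint16_t vlan_id; /* first-level vlan only */
	uint16_t proto;   /* protocol after all vlans were skipped */
};

static bool is_vlan_proto(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

static struct dissect_result dissect(const struct skb_model *skb)
{
	struct dissect_result res = { .ok = true };
	uint16_t proto;
	size_t next = 0;
	bool skip_vlan = false;

	assert(skb != NULL);
	proto = skb->tag_present ? skb->vlan_proto : skb->protocol;

	while (is_vlan_proto(proto)) {
		const struct vlan_hdr_model *vlan = NULL;

		if (skb->tag_present)
			proto = skb->protocol;

		/* Not stripped, or more vlans follow: read from the data. */
		if (!skb->tag_present || is_vlan_proto(proto)) {
			if (next >= skb->nhdrs) {
				res.ok = false;
				return res;
			}
			vlan = &skb->hdrs[next++];
			proto = vlan->encap_proto;
			if (skip_vlan)
				continue; /* only the first vlan is stored */
		}

		skip_vlan = true;
		res.have_vlan = true;
		res.vlan_id = (skb->tag_present ? skb->vlan_tci : vlan->tci) &
			      VID_MASK;
	}

	res.proto = proto;
	return res;
}
```

In this model a stripped single tag, an un-stripped single tag and a
stripped outer tag with an inner tag still in the data all end with the
inner protocol in proto and the first-level vlan id in vlan_id.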

Fixes: 0744dd00c1b1 ('net: introduce skb_flow_dissect()')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 net/core/flow_dissector.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 91028ae..362d693 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -119,12 +119,14 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_ports *key_ports;
struct flow_dissector_key_tags *key_tags;
struct flow_dissector_key_keyid *key_keyid;
+   bool skip_vlan = false;
u8 ip_proto = 0;
bool ret = false;
 
if (!data) {
data = skb->data;
-   proto = skb->protocol;
+   proto = skb_vlan_tag_present(skb) ?
+skb->vlan_proto : skb->protocol;
nhoff = skb_network_offset(skb);
hlen = skb_headlen(skb);
}
@@ -243,23 +245,39 @@ ipv6:
case htons(ETH_P_8021AD):
case htons(ETH_P_8021Q): {
const struct vlan_hdr *vlan;
-   struct vlan_hdr _vlan;
 
-   vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, hlen, &_vlan);
-   if (!vlan)
-   goto out_bad;
+   if (skb_vlan_tag_present(skb))
+   proto = skb->protocol;
+
+   if (!skb_vlan_tag_present(skb) ||
+   proto == cpu_to_be16(ETH_P_8021Q) ||
+   proto == cpu_to_be16(ETH_P_8021AD)) {
+   struct vlan_hdr _vlan;
+
+   vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
+   data, hlen, &_vlan);
+   if (!vlan)
+   goto out_bad;
+   proto = vlan->h_vlan_encapsulated_proto;
+   nhoff += sizeof(*vlan);
+   if (skip_vlan)
+   goto again;
+   }
 
+   skip_vlan = true;
if (dissector_uses_key(flow_dissector,
   FLOW_DISSECTOR_KEY_VLANID)) {
key_tags = skb_flow_dissector_target(flow_dissector,
 FLOW_DISSECTOR_KEY_VLANID,
 target_container);
 
-   key_tags->vlan_id = skb_vlan_tag_get_id(skb);
+   if (skb_vlan_tag_present(skb))
+   key_tags->vlan_id = skb_vlan_tag_get_id(skb);
+   else
+   key_tags->vlan_id = ntohs(vlan->h_vlan_TCI) &
+   VLAN_VID_MASK;
}
 
-   proto = vlan->h_vlan_encapsulated_proto;
-   nhoff += sizeof(*vlan);
goto again;
}
case htons(ETH_P_PPP_SES): {
-- 
1.8.3.1
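
The selection rule in the patch above (use the stripped tag when one is present, otherwise read the in-band header) can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct sketch_pkt`, `sketch_vlan_id()` and `SKETCH_VID_MASK` are hypothetical stand-ins for the sk_buff fields and kernel helpers, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VID_MASK 0x0fffu /* low 12 bits of the TCI carry the VLAN ID */

/* Hypothetical stand-in for the few packet fields the patch consults. */
struct sketch_pkt {
    bool tag_present;   /* true when the tag was stripped into metadata */
    uint16_t meta_tci;  /* host-order TCI saved when the tag was stripped */
    const uint8_t *hdr; /* in-band 4-byte VLAN header: TCI, then ethertype */
};

/* Take the first-level VLAN ID from metadata if stripped, else from data. */
static uint16_t sketch_vlan_id(const struct sketch_pkt *p)
{
    if (p->tag_present)
        return (uint16_t)(p->meta_tci & SKETCH_VID_MASK);

    /* The TCI is big-endian on the wire. */
    uint16_t tci = (uint16_t)((p->hdr[0] << 8) | p->hdr[1]);
    return (uint16_t)(tci & SKETCH_VID_MASK);
}
```

Both paths end in the same 12-bit mask, so a caller sees the same VLAN ID whichever way the tag reached it.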



Re: [PATCH net-next 1/4] flow_dissector: Get vlan info from skb->vlan_tci instead of skb->data

2016-08-15 Thread Hadar Hen Zion
On Mon, Aug 15, 2016 at 5:38 AM, Toshiaki Makita
<toshiaki.maki...@gmail.com> wrote:
> On 16/08/14 (日) 23:58, Hadar Hen Zion wrote:
>>
>> On Fri, Aug 12, 2016 at 9:36 AM, Toshiaki Makita
>> <makita.toshi...@lab.ntt.co.jp> wrote:
>>>
>>> On 2016/08/10 22:32, Hadar Hen Zion wrote:
>>>>
>>>> Early in the datapath skb_vlan_untag function is called, stripped
>>>> the vlan from the skb and set skb->vlan_tci and skb->vlan_proto fields.
>>>>
>>>> The current dissection doesn't handle vlan packets correctly.  Vlan
>>>> doesn't exist in skb->data anymore when applying flow dissection on the
>>>> skb, fix that.
>>>
>>>
>>> RPS (and flow-dissector called in RPS) is performed before vlan-strip in
>>> __netif_receive_skb_core().
>>
>>
>> right, I'll fix it to v2.
>>
>>> Also, in cases skb is tagged with multiple vlan headers (typical when
>>> using 802.1ad), the second level vlan tag is in skb->data.
>>
>>
>> Currently, flow_dissector doesn't support multiple vlan headers, only
>> one vlan_id field is present.
>> There aren't any flow_dissector "customers" yet for multiple vlan support.
>
>
> Sure, no need to store second level vlan tag information for now.
> The point is that current flow-dissector correctly skips any number of vlan
> tags and get hash value from IP/TCP/UDP headers, so RPS works for multiple
> vlan tagged packets.
>
> Thanks,
> Toshiaki Makita

OK, so we are on the same page.
The flow dissector will correctly skip any number of vlans, regardless
of whether the first vlan is stripped or not.

I also found a dependency between my vlan addition to flower and mlx5
tc offload support, so I'm working on fixing it for V2.

Thanks,
Hadar
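
The "skip any number of vlans" behaviour agreed on above can be illustrated with a small user-space loop. This is a sketch under assumptions, not the flow dissector: `sketch_skip_vlans()` and the `SKETCH_*` constants are hypothetical names, and the caller passes in the ethertype that announced the first in-band header (which is how the stripped and non-stripped cases converge).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q  0x8100u
#define SKETCH_ETH_P_8021AD 0x88a8u
#define SKETCH_VLAN_HLEN    4u

/*
 * Walk stacked VLAN headers starting at buf[off]. `proto` is the ethertype
 * that announced the first header. On success, returns the first non-VLAN
 * ethertype and stores the offset of its payload in *next_off; returns 0 if
 * the buffer ends inside a VLAN header.
 */
static uint16_t sketch_skip_vlans(const uint8_t *buf, size_t len, size_t off,
                                  uint16_t proto, size_t *next_off)
{
    while (proto == SKETCH_ETH_P_8021Q || proto == SKETCH_ETH_P_8021AD) {
        if (len < off || len - off < SKETCH_VLAN_HLEN)
            return 0;
        /* Bytes 2..3 of each VLAN header hold the encapsulated ethertype. */
        proto = (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);
        off += SKETCH_VLAN_HLEN;
    }
    *next_off = off;
    return proto;
}
```

If the first tag was stripped, the caller would start with the remaining ethertype and the loop simply runs one iteration fewer.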


Re: [PATCH net-next 1/4] flow_dissector: Get vlan info from skb->vlan_tci instead of skb->data

2016-08-14 Thread Hadar Hen Zion
On Fri, Aug 12, 2016 at 9:36 AM, Toshiaki Makita
<makita.toshi...@lab.ntt.co.jp> wrote:
> On 2016/08/10 22:32, Hadar Hen Zion wrote:
>> Early in the datapath skb_vlan_untag function is called, stripped
>> the vlan from the skb and set skb->vlan_tci and skb->vlan_proto fields.
>>
>> The current dissection doesn't handle vlan packets correctly.  Vlan
>> doesn't exist in skb->data anymore when applying flow dissection on the
>> skb, fix that.
>
> RPS (and flow-dissector called in RPS) is performed before vlan-strip in
> __netif_receive_skb_core().

Right, I'll fix it in v2.

> Also, in cases skb is tagged with multiple vlan headers (typical when
> using 802.1ad), the second level vlan tag is in skb->data.

Currently, flow_dissector doesn't support multiple vlan headers, only
one vlan_id field is present.
There aren't any flow_dissector "customers" yet for multiple vlan support.


> So I think you should handle both of skb->vlan_tci and skb->data cases.

Sure, will do it.


>
> Thanks,
> Toshiaki Makita
>
>


Re: [PATCH net-next 2/4] flow_dissector: Get vlan priority in addition to vlan id

2016-08-11 Thread Hadar Hen Zion
On Thu, Aug 11, 2016 at 12:58 AM, kbuild test robot <l...@intel.com> wrote:
> Hi Hadar,
>
> [auto build test ERROR on net-next/master]
>
> url:
> https://github.com/0day-ci/linux/commits/Hadar-Hen-Zion/flow_dissector-Get-vlan-info-from-skb-vlan_tci-instead-of-skb-data/20160811-042500
> config: cris-etrax-100lx_v2_defconfig (attached as .config)
> compiler: cris-linux-gcc (GCC) 4.6.3
> reproduce:
> wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=cris
>
> All errors (new ones prefixed by >>):
>
>In function 'flow_keys_hash_length.isra.6',
>inlined from '__flow_hash_from_keys' at 
> net/core/flow_dissector.c:599:26,
>inlined from 'flow_hash_from_keys' at net/core/flow_dissector.c:610:2:
>>> net/core/flow_dissector.c:512:2: error: call to '__compiletime_assert_512' 
>>> declared with attribute error: BUILD_BUG_ON failed: (sizeof(*flow) - 
>>> FLOW_KEYS_HASH_OFFSET) % sizeof(u32)

[...]


I'm working on a fix, will send it soon.

I'll be happy to get your review and comments on my flow_dissector and
flower patches.

Thank you,

Hadar

[...]


[PATCH net-next 1/4] flow_dissector: Get vlan info from skb->vlan_tci instead of skb->data

2016-08-10 Thread Hadar Hen Zion
Early in the datapath, skb_vlan_untag() is called; it strips
the vlan from the skb and sets the skb->vlan_tci and skb->vlan_proto fields.

The current dissection doesn't handle vlan packets correctly: the vlan
no longer exists in skb->data when flow dissection is applied to the
skb. Fix that.

Fixes: 0744dd00c1b1 ('net: introduce skb_flow_dissect()')
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 net/core/flow_dissector.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 61ad43f..6060fc2 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -122,7 +122,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
if (!data) {
data = skb->data;
-   proto = skb->protocol;
+   proto = skb_vlan_tag_present(skb) ?
+skb->vlan_proto : skb->protocol;
nhoff = skb_network_offset(skb);
hlen = skb_headlen(skb);
}
@@ -240,13 +241,6 @@ ipv6:
}
case htons(ETH_P_8021AD):
case htons(ETH_P_8021Q): {
-   const struct vlan_hdr *vlan;
-   struct vlan_hdr _vlan;
-
-   vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), data, hlen, &_vlan);
-   if (!vlan)
-   goto out_bad;
-
if (dissector_uses_key(flow_dissector,
   FLOW_DISSECTOR_KEY_VLANID)) {
key_tags = skb_flow_dissector_target(flow_dissector,
@@ -256,8 +250,7 @@ ipv6:
key_tags->vlan_id = skb_vlan_tag_get_id(skb);
}
 
-   proto = vlan->h_vlan_encapsulated_proto;
-   nhoff += sizeof(*vlan);
+   proto = skb->protocol;
goto again;
}
case htons(ETH_P_PPP_SES): {
-- 
1.8.3.1



[PATCH net-next 3/4] net_sched: flower: Add vlan support

2016-08-10 Thread Hadar Hen Zion
Enhance flower to support 802.1Q vlan protocol classification.
Currently, the supported fields are vlan_id and vlan_priority.

Example:

# add a flower filter with vlan id and priority classification
tc filter add dev ens4f0 protocol 802.1Q parent : \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  3 ++
 net/sched/cls_flower.c   | 69 ++--
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index d1c1cca..51b5b24 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -428,6 +428,9 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
 
TCA_FLOWER_FLAGS,
+   TCA_FLOWER_KEY_VLAN_ID,
+   TCA_FLOWER_KEY_VLAN_PRIO,
+   TCA_FLOWER_KEY_VLAN_ETH_TYPE,
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 5060801..4e249be 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -28,6 +28,7 @@ struct fl_flow_key {
struct flow_dissector_key_control control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
+   struct flow_dissector_key_vlan vlan;
struct flow_dissector_key_addrs ipaddrs;
union {
struct flow_dissector_key_ipv4_addrs ipv4;
@@ -293,6 +294,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
[TCA_FLOWER_KEY_TCP_DST]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_UDP_SRC]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_UDP_DST]= { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
+   [TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
+
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -308,9 +313,28 @@ static void fl_set_key_val(struct nlattr **tb,
memcpy(mask, nla_data(tb[mask_type]), len);
 }
 
+static void fl_set_key_vlan(struct nlattr **tb,
+   struct flow_dissector_key_vlan *key_val,
+   struct flow_dissector_key_vlan *key_mask)
+{
+#define VLAN_PRIORITY_MASK 0x7
+
+   if (tb[TCA_FLOWER_KEY_VLAN_ID]) {
+   key_val->vlan_id =
+   nla_get_u16(tb[TCA_FLOWER_KEY_VLAN_ID]) & VLAN_VID_MASK;
+   key_mask->vlan_id = VLAN_VID_MASK;
+   }
+   if (tb[TCA_FLOWER_KEY_VLAN_PRIO]) {
+   key_val->vlan_priority =
+   nla_get_u8(tb[TCA_FLOWER_KEY_VLAN_PRIO]) & VLAN_PRIORITY_MASK;
+   key_mask->vlan_priority = VLAN_PRIORITY_MASK;
+   }
+}
+
 static int fl_set_key(struct net *net, struct nlattr **tb,
  struct fl_flow_key *key, struct fl_flow_key *mask)
 {
+   __be16 ethertype;
 #ifdef CONFIG_NET_CLS_IND
if (tb[TCA_FLOWER_INDEV]) {
int err = tcf_change_indev(net, tb[TCA_FLOWER_INDEV]);
@@ -328,9 +352,19 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
   mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
   sizeof(key->eth.src));
 
-   fl_set_key_val(tb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
-  &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
-  sizeof(key->basic.n_proto));
+   if (tb[TCA_FLOWER_KEY_ETH_TYPE])
+   ethertype = nla_get_be16(tb[TCA_FLOWER_KEY_ETH_TYPE]);
+
+   if (ethertype == htons(ETH_P_8021Q)) {
+   fl_set_key_vlan(tb, &key->vlan, &mask->vlan);
+   fl_set_key_val(tb, &key->basic.n_proto,
+  TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+  &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
+  sizeof(key->basic.n_proto));
+   } else {
+   key->basic.n_proto = ethertype;
+   mask->basic.n_proto = cpu_to_be16(~0);
+   }
 
if (key->basic.n_proto == htons(ETH_P_IP) ||
key->basic.n_proto == htons(ETH_P_IPV6)) {
@@ -440,6 +474,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
   FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt,
   FLOW_DISSECTOR_KEY_PORTS, tp);
+   FL_KEY_SET_IF_IN_RANGE(mask, keys, cnt,
+  FLOW_DISSECTOR_KEY_VLAN, vlan);
 
skb_flow_dissector_init(&head->dissector, keys, cnt);
 }
@@ -668,6 +704,29 @@ static int fl_dump_key_val(struct sk_buff *skb,
return 0;
 }
 
+static int fl_dump_key_vlan(struct sk_buff *skb,
+   struc
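
The key/mask handling that fl_set_key_vlan() introduces above can be sketched in user space as follows. This is a minimal illustration, assuming a simplified struct and a direct masked comparison; `sketch_vlan_key`, `sketch_set_vlan_key()` and `sketch_vlan_match()` are hypothetical names, and the real classifier's lookup path is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VID_MASK  0x0fffu /* 12-bit VLAN ID */
#define SKETCH_PRIO_MASK 0x7u    /* 3-bit priority */

struct sketch_vlan_key {
    uint16_t vlan_id;
    uint8_t vlan_priority;
};

/* A NULL pointer means "attribute not supplied": key and mask stay zero. */
static void sketch_set_vlan_key(struct sketch_vlan_key *key,
                                struct sketch_vlan_key *mask,
                                const uint16_t *vid, const uint8_t *prio)
{
    if (vid) {
        key->vlan_id = (uint16_t)(*vid & SKETCH_VID_MASK);
        mask->vlan_id = SKETCH_VID_MASK;
    }
    if (prio) {
        key->vlan_priority = (uint8_t)(*prio & SKETCH_PRIO_MASK);
        mask->vlan_priority = SKETCH_PRIO_MASK;
    }
}

/* A field with a zero mask is a wildcard; masked fields must match the key. */
static bool sketch_vlan_match(const struct sketch_vlan_key *pkt,
                              const struct sketch_vlan_key *key,
                              const struct sketch_vlan_key *mask)
{
    return (pkt->vlan_id & mask->vlan_id) == key->vlan_id &&
           (pkt->vlan_priority & mask->vlan_priority) == key->vlan_priority;
}
```

Leaving the mask at zero for an absent attribute is what lets a filter that names only vlan_id match packets of any priority.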

[PATCH net-next 0/4] net_sched, flow_dissector, flower: Introduce vlan tag support

2016-08-10 Thread Hadar Hen Zion
This patchset introduces vlan tag support to the flower classifier and the flow
dissector. It also adds vlan priority to act vlan.

The first 2 patches deal with the flow dissector:
 - The first patch is a fix: the vlan id value should be taken from skb->vlan_tci
   and not from skb->data.
 - The second patch adds support for vlan priority.

The third patch adds vlan tag support to the flower classifier; user space
patches will be sent later to complete it.
The last patch adds vlan priority to act vlan, since only vlan id is currently
supported.

Hadar Hen Zion (4):
  flow_dissector: Get vlan info from skb->vlan_tci instead of skb->data
  flow_dissector: Get vlan priority in addition to vlan id
  net_sched: flower: Add vlan support
  net_sched: act_vlan: Add priority option

 include/linux/if_vlan.h |  1 +
 include/net/flow_dissector.h| 11 --
 include/net/tc_act/tc_vlan.h|  1 +
 include/uapi/linux/pkt_cls.h|  3 ++
 include/uapi/linux/tc_act/tc_vlan.h |  1 +
 net/core/flow_dissector.c   | 28 +++
 net/sched/act_vlan.c| 13 +--
 net/sched/cls_flower.c  | 69 +++--
 8 files changed, 103 insertions(+), 24 deletions(-)

-- 
1.8.3.1



[PATCH net-next 2/4] flow_dissector: Get vlan priority in addition to vlan id

2016-08-10 Thread Hadar Hen Zion
Add a vlan priority check to the flow dissector by adding a new flow
dissector struct, flow_dissector_key_vlan, which includes the vlan tag
fields.

The vlan_id and flow_label fields were under the same struct
(flow_dissector_key_tags). That was a convenient arrangement: struct
flow_dissector_key_tags is used by struct flow_keys, and with
vlan_id and flow_label under the same struct we get precisely 24 or 48
bytes in flow_keys from flow_dissector_key_basic.

Now, when adding vlan priority support, the code will be cleaner if
flow_label and the vlan tag are no longer under the same struct.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/linux/if_vlan.h  |  1 +
 include/net/flow_dissector.h | 11 ---
 net/core/flow_dissector.c| 15 +--
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index a5f6ce6..49d4aef 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -81,6 +81,7 @@ static inline bool is_vlan_dev(const struct net_device *dev)
 #define skb_vlan_tag_present(__skb)((__skb)->vlan_tci & VLAN_TAG_PRESENT)
 #define skb_vlan_tag_get(__skb)((__skb)->vlan_tci & ~VLAN_TAG_PRESENT)
 #define skb_vlan_tag_get_id(__skb) ((__skb)->vlan_tci & VLAN_VID_MASK)
+#define skb_vlan_tag_get_prio(__skb)   ((__skb)->vlan_tci & VLAN_PRIO_MASK)
 
 /**
  * struct vlan_pcpu_stats - VLAN percpu rx/tx stats
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index d3d60dc..3781f18 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -32,8 +32,12 @@ struct flow_dissector_key_basic {
 };
 
 struct flow_dissector_key_tags {
-   u32 vlan_id:12,
-   flow_label:20;
+   u32 flow_label:20;
+};
+
+struct flow_dissector_key_vlan {
+   u16 vlan_id:12,
+   vlan_priority:3;
 };
 
 struct flow_dissector_key_keyid {
@@ -119,7 +123,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs */
-   FLOW_DISSECTOR_KEY_VLANID, /* struct flow_dissector_key_flow_tags */
+   FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_flow_tags */
FLOW_DISSECTOR_KEY_GRE_KEYID, /* struct flow_dissector_key_keyid */
FLOW_DISSECTOR_KEY_MPLS_ENTROPY, /* struct flow_dissector_key_keyid */
@@ -148,6 +152,7 @@ struct flow_keys {
 #define FLOW_KEYS_HASH_START_FIELD basic
struct flow_dissector_key_basic basic;
struct flow_dissector_key_tags tags;
+   struct flow_dissector_key_vlan vlan;
struct flow_dissector_key_keyid keyid;
struct flow_dissector_key_ports ports;
struct flow_dissector_key_addrs addrs;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 6060fc2..6dfcb10 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -116,6 +116,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector_key_addrs *key_addrs;
struct flow_dissector_key_ports *key_ports;
struct flow_dissector_key_tags *key_tags;
+   struct flow_dissector_key_vlan *key_vlan;
struct flow_dissector_key_keyid *key_keyid;
u8 ip_proto = 0;
bool ret = false;
@@ -242,12 +243,14 @@ ipv6:
case htons(ETH_P_8021AD):
case htons(ETH_P_8021Q): {
if (dissector_uses_key(flow_dissector,
-  FLOW_DISSECTOR_KEY_VLANID)) {
-   key_tags = skb_flow_dissector_target(flow_dissector,
-   FLOW_DISSECTOR_KEY_VLANID,
+  FLOW_DISSECTOR_KEY_VLAN)) {
+   key_vlan = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_VLAN,
 target_container);
 
-   key_tags->vlan_id = skb_vlan_tag_get_id(skb);
+   key_vlan->vlan_id = skb_vlan_tag_get_id(skb);
+   key_vlan->vlan_priority =
+   (skb_vlan_tag_get_prio(skb) >> VLAN_PRIO_SHIFT);
}
 
proto = skb->protocol;
@@ -865,8 +868,8 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
.offset = offsetof(struct flow_keys, ports),
},
{
-   .key_id = FLOW_DISSECTOR_KEY_VLANID,
-   .offset = offsetof(struct flow_keys, tags),
+   .key_id = FLOW_DISSECTOR_KEY_VLAN,
+   .offset = offsetof(struct flow_keys,

[PATCH net-next 4/4] net_sched: act_vlan: Add priority option

2016-08-10 Thread Hadar Hen Zion
The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and
priority:

tc filter add dev veth0 protocol ip parent : \
   flower \
indev veth0 \
   action vlan push id 100 priority 5

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 include/net/tc_act/tc_vlan.h|  1 +
 include/uapi/linux/tc_act/tc_vlan.h |  1 +
 net/sched/act_vlan.c| 13 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index e29f52e..6b83588 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -20,6 +20,7 @@ struct tcf_vlan {
int tcfv_action;
u16 tcfv_push_vid;
__be16  tcfv_push_proto;
+   u8  tcfv_push_prio;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
diff --git a/include/uapi/linux/tc_act/tc_vlan.h b/include/uapi/linux/tc_act/tc_vlan.h
index 31151ff62..be72b6e 100644
--- a/include/uapi/linux/tc_act/tc_vlan.h
+++ b/include/uapi/linux/tc_act/tc_vlan.h
@@ -29,6 +29,7 @@ enum {
TCA_VLAN_PUSH_VLAN_ID,
TCA_VLAN_PUSH_VLAN_PROTOCOL,
TCA_VLAN_PAD,
+   TCA_VLAN_PUSH_VLAN_PRIORITY,
__TCA_VLAN_MAX,
 };
 #define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1)
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 691409d..d2fcf7cd 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -43,7 +43,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid);
+   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
+   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -65,6 +66,7 @@ static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = {
[TCA_VLAN_PARMS]= { .len = sizeof(struct tc_vlan) },
[TCA_VLAN_PUSH_VLAN_ID] = { .type = NLA_U16 },
[TCA_VLAN_PUSH_VLAN_PROTOCOL]   = { .type = NLA_U16 },
+   [TCA_VLAN_PUSH_VLAN_PRIORITY]   = { .type = NLA_U8 },
 };
 
 static int tcf_vlan_init(struct net *net, struct nlattr *nla,
@@ -78,6 +80,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
int action;
__be16 push_vid = 0;
__be16 push_proto = 0;
+   u8 push_prio = 0;
bool exists = false;
int ret = 0, err;
 
@@ -123,6 +126,9 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
} else {
push_proto = htons(ETH_P_8021Q);
}
+
+   if (tb[TCA_VLAN_PUSH_VLAN_PRIORITY])
+   push_prio = nla_get_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
break;
default:
if (exists)
@@ -150,6 +156,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
v->tcfv_action = action;
v->tcfv_push_vid = push_vid;
+   v->tcfv_push_prio = push_prio;
v->tcfv_push_proto = push_proto;
 
v->tcf_action = parm->action;
@@ -181,7 +188,9 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct tc_action *a,
if (v->tcfv_action == TCA_VLAN_ACT_PUSH &&
(nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, v->tcfv_push_vid) ||
 nla_put_be16(skb, TCA_VLAN_PUSH_VLAN_PROTOCOL,
- v->tcfv_push_proto)))
+ v->tcfv_push_proto) ||
+(v->tcfv_push_prio && nla_put_u8(skb, TCA_VLAN_PUSH_VLAN_PRIORITY,
+ v->tcfv_push_prio))))
goto nla_put_failure;
 
tcf_tm_dump(&t, &v->tcf_tm);
-- 
1.8.3.1



flow_dissector: Get vlan priority in addition to vlan id

2016-08-07 Thread Hadar Hen Zion
Hi Tom and Jiri,

I would like to add vlan priority to __skb_flow_dissect.

In the current vlan tag implementation there isn't any room left for
vlan priority next to the vlan id, since vlan_id and flow_label already
fill all 32 bits:

struct flow_dissector_key_tags {
u32 vlan_id:12,
flow_label:20;
};

Following the discussion between you two [1], I'd be happy to get
your advice on the best way to add vlan priority.

My suggestion is to add a new vlan tag struct. It will make the code
cleaner, and since we have to add 3 bits for vlan priority anyway it
won't add unnecessary physical address holes.

struct flow_dissector_key_tags {
u32 flow_label:20;
};

struct flow_dissector_key_vlan {
u16 vlan_id:12,
vlan_priority:3;
};

Thanks,
Hadar

[1] - http://marc.info/?l=linux-netdev&m=143232557025994&w=2


Re: [PATCH net-next V2 0/2] Mellanox 100G mlx5 minimum inline header mode

2016-07-26 Thread Hadar Hen Zion
On Tue, Jul 26, 2016 at 3:54 AM, David Miller  wrote:
> From: Saeed Mahameed 
> Date: Sun, 24 Jul 2016 16:12:38 +0300
>
>> This small series from Hadar adds the support for minimum inline
>> header mode query in mlx5e NIC driver.
>>
>> Today on TX the driver copies to the HW descriptor only up to L2
>> header which is the default required mode and sufficient for today's
>> needs.
>>
>> The header in the HW descriptor is used for HW loopback steering
>> decision, without it packets will go directly to the wire with no
>> questions asked.
>>
>> For TX loopback steering according to L2/L3/L4 headers, ConnectX-4
>> requires to copy the corresponding headers into the send queue(SQ)
>> WQE HW descriptor so it can decide whether to loop it back or to
>> forward to wire.
>>
>> For legacy E-Switch mode only L2 headers copy is required.  For
>> advanced steering (E-Switch offloads) more header layers may be
>> required to be copied, the required mode will be advertised by FW to
>> each VF and PF according to the corresponding E-Switch
>> configuration.
>>
>> Changes V2:
>>  - Allocate query_nic_vport_context_out on the stack
>
> Applied, but even doing an eth_get_headlen() every transmitted packet
> it really too expensive.
>
> You shouldn't be touching networking headers so much when forwarding
> frames.

Sorry for re-sending, I had a problem before.

In the default case eth_get_headlen() won't be called; it will be called
only if the PF administrator changes the mode from default to L4.

In L4 mode we need to copy all the packet headers, including L4. Do
you know of a better/cheaper way of doing that?

Thanks,
Hadar
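
To make the cost being discussed concrete, here is a rough user-space sketch of how many header bytes would have to be inlined for each mode. It is not the mlx5 code: the mode names and `sketch_inline_len()` are hypothetical, and the sketch assumes an untagged Ethernet frame carrying IPv4. The point is only that L2 needs no parsing, while L4 has to read two variable-length headers per packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode names; the driver's real enum is not shown in the thread. */
enum sketch_inline_mode {
    SKETCH_INLINE_L2, /* Ethernet header only (the default in the thread) */
    SKETCH_INLINE_L3, /* Ethernet + IPv4 header */
    SKETCH_INLINE_L4  /* Ethernet + IPv4 + TCP/UDP header */
};

/*
 * Return how many leading bytes of an untagged Ethernet/IPv4 frame cover the
 * headers required by `mode`. Falls back to the last header that parsed
 * cleanly; returns 0 if even the Ethernet header is truncated.
 */
static size_t sketch_inline_len(const uint8_t *f, size_t len,
                                enum sketch_inline_mode mode)
{
    size_t off = 14; /* dst MAC + src MAC + ethertype */

    if (len < off)
        return 0;
    if (mode == SKETCH_INLINE_L2 || ((f[12] << 8) | f[13]) != 0x0800)
        return off;

    /* IPv4: the low nibble of byte 0 is the header length in 32-bit words. */
    if (len < off + 20)
        return off;
    size_t ihl = (size_t)(f[off] & 0x0f) * 4;
    if (ihl < 20 || len < off + ihl)
        return off;
    uint8_t ip_proto = f[off + 9];
    off += ihl;
    if (mode == SKETCH_INLINE_L3)
        return off;

    if (ip_proto == 17) /* UDP: fixed 8-byte header */
        return len >= off + 8 ? off + 8 : off;
    if (ip_proto == 6 && len >= off + 20) { /* TCP: data offset in byte 12 */
        size_t doff = (size_t)(f[off + 12] >> 4) * 4;
        if (doff >= 20 && len >= off + doff)
            return off + doff;
    }
    return off;
}
```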


Re: [PATCH net-next V2 0/2] Mellanox 100G mlx5 minimum inline header mode

2016-07-26 Thread Hadar Hen Zion
In the default case eth_get_headlen() won't be called; it will be called
only if the PF administrator changes the mode from default to L4.

In L4 mode we need to copy all the packet headers, including L4. Do
you know of a better/cheaper way of doing that?

Thanks,
Hadar

On Tue, Jul 26, 2016 at 3:54 AM, David Miller  wrote:
> From: Saeed Mahameed 
> Date: Sun, 24 Jul 2016 16:12:38 +0300
>
>> This small series from Hadar adds the support for minimum inline
>> header mode query in mlx5e NIC driver.
>>
>> Today on TX the driver copies to the HW descriptor only up to L2
>> header which is the default required mode and sufficient for today's
>> needs.
>>
>> The header in the HW descriptor is used for HW loopback steering
>> decision, without it packets will go directly to the wire with no
>> questions asked.
>>
>> For TX loopback steering according to L2/L3/L4 headers, ConnectX-4
>> requires to copy the corresponding headers into the send queue(SQ)
>> WQE HW descriptor so it can decide whether to loop it back or to
>> forward to wire.
>>
>> For legacy E-Switch mode only L2 headers copy is required.  For
>> advanced steering (E-Switch offloads) more header layers may be
>> required to be copied, the required mode will be advertised by FW to
>> each VF and PF according to the corresponding E-Switch
>> configuration.
>>
>> Changes V2:
>>  - Allocate query_nic_vport_context_out on the stack
>
> Applied, but even doing an eth_get_headlen() every transmitted packet
> it really too expensive.
>
> You shouldn't be touching networking headers so much when forwarding
> frames.


[PATCH ethtool V3 2/2] ethtool.8.in: Add man page for tunable copybreak

2015-06-11 Thread Hadar Hen Zion
From: Govindarajulu Varadarajan <_gov...@gmx.com>

Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 ethtool.8.in | 20 
 1 file changed, 20 insertions(+)

diff --git a/ethtool.8.in b/ethtool.8.in
index ae56293..eae630e 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -176,6 +176,14 @@ ethtool \- query or control network driver and hardware settings
 .BN rx\-jumbo
 .BN tx
 .HP
+.B ethtool \-b|\-\-show\-tunable
+.I devname
+.HP
+.B ethtool \-B|\-\-set\-tunable
+.I devname
+.BN rx-copybreak
+.BN tx-copybreak
+.HP
 .B ethtool \-i|\-\-driver
 .I devname
 .HP
@@ -399,6 +407,18 @@ Changes the number of ring entries for the Rx Jumbo ring.
 .BI tx \ N
 Changes the number of ring entries for the Tx ring.
 .TP
+.B \-b|\-\-show\-tunable
+Get device tunable values
+.TP
+.B \-B|\-\-set\-tunable
+Set device tunable values
+.TP
+.BI rx-copybreak \ N
+Change rx_copybreak
+.TP
+.BI tx-copybreak \ N
+Change tx_copybreak
+.TP
 .B \-i \-\-driver
 Queries the specified network device for associated driver information.
 .TP
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH ethtool V3 0/2] Add copybreak support

2015-06-11 Thread Hadar Hen Zion
Hi Ben,

This series adds support for setting/getting the driver's tx/rx_copybreak value.

Copybreak is handled through a new ethtool tunable interface.

The kernel support will be available from kernel 4.2, commit "net/ethtool: Add
current supported tunable options".

The series was originally sent by Govindarajulu Varadarajan; I addressed the
review comments and am resending it.

Thanks,
Hadar

Changes from V2:
- Change -B/-b to generic tunable option.
- Remove tunable names from user space; the tunable string names are now
  defined in the kernel.
- Remove the third patch - ethtool-copy.h: Sync with net-next 3.17.0-rc7

Govindarajulu Varadarajan (2):
  ethtool: Add copybreak support
  ethtool.8.in: Add man page for tunable copybreak

 ethtool-copy.h |   1 +
 ethtool.8.in   |  20 +
 ethtool.c  | 227 +
 3 files changed, 248 insertions(+)

-- 
1.8.3.1
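
The request layout used by the tunable interface in this series (a fixed header followed directly by the value) can be sketched as follows. This is an illustration only: `struct sketch_tunable_hdr` is a local mirror whose field names follow the patch, but its exact layout is an assumption of the sketch, the helper names are hypothetical, and no ioctl is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the request header; an assumption, not the uapi struct. */
struct sketch_tunable_hdr {
    uint32_t cmd;     /* get or set command */
    uint32_t id;      /* which tunable, e.g. rx or tx copybreak */
    uint32_t type_id; /* type of the payload */
    uint32_t len;     /* payload length in bytes */
};

/* Allocate header + u32 payload in one buffer; caller frees the result. */
static struct sketch_tunable_hdr *
sketch_tunable_alloc(uint32_t cmd, uint32_t id, uint32_t type_id, uint32_t value)
{
    struct sketch_tunable_hdr *req = calloc(1, sizeof(*req) + sizeof(value));

    if (!req)
        return NULL;
    req->cmd = cmd;
    req->id = id;
    req->type_id = type_id;
    req->len = sizeof(value);
    /* The payload lives immediately after the header. */
    memcpy(req + 1, &value, sizeof(value));
    return req;
}

/* Read the u32 payload back out of a request buffer. */
static uint32_t sketch_tunable_value(const struct sketch_tunable_hdr *req)
{
    uint32_t value;

    memcpy(&value, req + 1, sizeof(value));
    return value;
}
```

Using memcpy() for the payload avoids the pointer arithmetic on `void *` that the posted helpers rely on, which is a compiler extension rather than standard C.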



[PATCH ethtool V3 1/2] ethtool: Add copybreak support

2015-06-11 Thread Hadar Hen Zion
From: Govindarajulu Varadarajan <_gov...@gmx.com>

Add support for setting/getting driver's tx/rx_copybreak value.

Copybreak is handled through a new ethtool tunable interface.

The kernel support was added in 3.18, commit f0db9b07341 ("ethtool:
Add generic options for tunables").

Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>
Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
---
 ethtool-copy.h |   1 +
 ethtool.c  | 227 +
 2 files changed, 228 insertions(+)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index d23ffc4..f92743b 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -545,6 +545,7 @@ enum ethtool_stringset {
ETH_SS_NTUPLE_FILTERS,
ETH_SS_FEATURES,
ETH_SS_RSS_HASH_FUNCS,
+   ETH_SS_TUNABLES,
 };
 
 /**
diff --git a/ethtool.c b/ethtool.c
index 01b13a6..16b5c41 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -145,6 +145,12 @@ struct cmdline_info {
void *seen_val;
 };
 
+struct ethtool_stunable {
+   cmdline_type_t type;
+   __u32 u32_val;
+   int seen_val;
+};
+
 struct flag_info {
const char *name;
u32 value;
@@ -1800,6 +1806,223 @@ static int do_gring(struct cmd_context *ctx)
return 0;
 }
 
+static int get_u32tunable(struct cmd_context *ctx, enum tunable_id id,
+ __u32 *value)
+{
+   struct ethtool_tunable *etuna;
+   int ret;
+
+   etuna = calloc(sizeof(*etuna) + sizeof(__u32), 1);
+   if (!etuna)
+   return 1;
+   etuna->cmd = ETHTOOL_GTUNABLE;
+   etuna->id = id;
+   etuna->type_id = ETHTOOL_TUNABLE_U32;
+   etuna->len = sizeof(__u32);
+   ret = send_ioctl(ctx, etuna);
+   *value = *(__u32 *)((void *)etuna + sizeof(*etuna));
+   free(etuna);
+
+   return ret;
+}
+
+static int print_u32tunable(int err, struct ethtool_gstrings *tunables,
+   enum tunable_id id, const __u32 value)
+{
+   char *tunable_name = (char *)tunables->data + id * ETH_GSTRING_LEN;
+
+   if (err) {
+   switch (errno) {
+   /* Driver does not support this particular tunable
+* Usually displays 0
+*/
+   case EINVAL:
+   goto print;
+   /* Driver does not support get tunables ops or no such device
+* No point in proceeding further
+*/
+   case EOPNOTSUPP:
+   case ENODEV:
+   perror("Cannot get device settings");
+   exit(err);
+   default:
+   perror(tunable_name);
+   return err;
+   }
+   }
+print:
+   fprintf(stdout, "%s: %u\n", tunable_name, value);
+
+   return 0;
+}
+
+static int do_gtunables(struct cmd_context *ctx)
+{
+   int err, anyerror = 0;
+   __u32 u32value = 0;
+   struct ethtool_gstrings *tunables;
+   int idx;
+   __u32 n_tunables;
+
+   if (ctx->argc != 0)
+   exit_bad_args();
+
+   tunables = get_stringset(ctx, ETH_SS_TUNABLES, 0, 1);
+   if (!tunables) {
+   perror("Cannot get tunables names");
+   return 1;
+   }
+   if (tunables->len == 0) {
+   fprintf(stderr, "No tunables defined\n");
+   return 1;
+   }
+   n_tunables = tunables->len;
+
+   fprintf(stdout, "Tunables settings for device %s\n", ctx->devname);
+
+   for (idx = 0; idx < n_tunables; idx++) {
+   switch(idx) {
+   case ETHTOOL_ID_UNSPEC:
+   break;
+   case ETHTOOL_RX_COPYBREAK:
+   case ETHTOOL_TX_COPYBREAK:
+   err = get_u32tunable(ctx, idx, &u32value);
+   err = print_u32tunable(err, tunables, idx, u32value);
+   if (err)
+   anyerror = err;
+   break;
+   default:
+   anyerror = EINVAL;
+   }
+   }
+   if (anyerror)
+   fprintf(stderr, "Failed to get all settings. displayed partial settings\n");
+
+   free(tunables);
+   return anyerror;
+}
+
+static int set_u32tunable(struct cmd_context *ctx, enum tunable_id id,
+ const __u32 value)
+{
+   struct ethtool_tunable *etuna;
+   int ret;
+   __u32 *data;
+
+   etuna = malloc(sizeof(*etuna) + sizeof(__u32));
+   if (!etuna) {
+   perror("Set tunable:");
+   return 1;
+   }
+   data = (void *)etuna + sizeof(*etuna);
+   *data = value;
+   etuna->cmd = ETHTOOL_STUNABLE;
+   etuna->id = id;
+   etuna->type_id = ETHTOOL_TUNABLE_U32;
+   etuna->len = sizeof(__u32);
+   ret = send_ioctl(ctx, etuna);
+   free(etuna);
+
+   return ret;
+}
+
+static int check_set_u32tunable(int err, enum tunable_id id)
+{
+   if (err) {
+   switch (errno

[PATCH net-next V1] net/ethtool: Add current supported tunable options

2015-06-11 Thread Hadar Hen Zion
Add strings array of the current supported tunable options.

Signed-off-by: Hadar Hen Zion had...@mellanox.com
Reviewed-by: Amir Vadai am...@mellanox.com
---
Changes from V0:
- s/ETHTOOL_TUNABLE_COUNT/__ETHTOOL_TUNABLE_COUNT/

 include/uapi/linux/ethtool.h |  6 ++
 net/core/ethtool.c   | 12 
 2 files changed, 18 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 0594933..cd67aec 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -215,6 +215,11 @@ enum tunable_id {
 	ETHTOOL_ID_UNSPEC,
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
+	/*
+	 * Add your fresh new tunable attribute above and remember to update
+	 * tunable_strings[] in net/core/ethtool.c
+	 */
+	__ETHTOOL_TUNABLE_COUNT,
 };
 
 enum tunable_type_id {
@@ -545,6 +550,7 @@ enum ethtool_stringset {
 	ETH_SS_NTUPLE_FILTERS,
 	ETH_SS_FEATURES,
 	ETH_SS_RSS_HASH_FUNCS,
+	ETH_SS_TUNABLES,
 };
 
 /**
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index eb0c3ac..b495ab1 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -106,6 +106,13 @@ rss_hash_func_strings[ETH_RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
 	[ETH_RSS_HASH_XOR_BIT] =	"xor",
 };
 
+static const char
+tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
+	[ETHTOOL_ID_UNSPEC]     = "Unspec",
+	[ETHTOOL_RX_COPYBREAK]  = "rx-copybreak",
+	[ETHTOOL_TX_COPYBREAK]  = "tx-copybreak",
+};
+
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
 {
 	struct ethtool_gfeatures cmd = {
@@ -194,6 +201,9 @@ static int __ethtool_get_sset_count(struct net_device *dev, int sset)
 	if (sset == ETH_SS_RSS_HASH_FUNCS)
 		return ARRAY_SIZE(rss_hash_func_strings);
 
+	if (sset == ETH_SS_TUNABLES)
+		return ARRAY_SIZE(tunable_strings);
+
 	if (ops->get_sset_count && ops->get_strings)
 		return ops->get_sset_count(dev, sset);
 	else
@@ -211,6 +221,8 @@ static void __ethtool_get_strings(struct net_device *dev,
 	else if (stringset == ETH_SS_RSS_HASH_FUNCS)
 		memcpy(data, rss_hash_func_strings,
 		       sizeof(rss_hash_func_strings));
+	else if (stringset == ETH_SS_TUNABLES)
+		memcpy(data, tunable_strings, sizeof(tunable_strings));
 	else
 		/* ops->get_strings is valid because checked earlier */
 		ops->get_strings(dev, stringset, data);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
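
For readers following the patch above, here is a minimal userspace sketch of the pattern it relies on: an enum whose last member is a count sentinel, and a fixed-width string table sized by that sentinel and filled with designated initializers. All demo_/DEMO_ names are hypothetical stand-ins and not part of the kernel or ethtool sources; the value 32 for the string width is an assumption meant to mirror ETH_GSTRING_LEN.

```c
#include <stddef.h>

/* Assumption: 32 mirrors ETH_GSTRING_LEN from the uapi header. */
#define DEMO_GSTRING_LEN 32

/* Hypothetical stand-in for enum tunable_id. */
enum demo_tunable_id {
	DEMO_ID_UNSPEC,
	DEMO_RX_COPYBREAK,
	DEMO_TX_COPYBREAK,
	/* New ids go above this line; the sentinel then grows by itself. */
	DEMO_TUNABLE_COUNT,
};

/* Fixed-width name table indexed by id and sized by the sentinel. */
static const char
demo_tunable_strings[DEMO_TUNABLE_COUNT][DEMO_GSTRING_LEN] = {
	[DEMO_ID_UNSPEC]    = "Unspec",
	[DEMO_RX_COPYBREAK] = "rx-copybreak",
	[DEMO_TX_COPYBREAK] = "tx-copybreak",
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Number of names in the table, computed the way ARRAY_SIZE() would. */
static size_t demo_sset_count(void)
{
	return DEMO_ARRAY_SIZE(demo_tunable_strings);
}

/* Name for an id, or NULL when the id is outside the table. */
static const char *demo_tunable_name(unsigned int id)
{
	if (id >= DEMO_ARRAY_SIZE(demo_tunable_strings))
		return NULL;
	return demo_tunable_strings[id];
}
```

Because the array is sized by the sentinel, adding an id without adding its initializer still compiles; the missing row is simply zero-filled and reads back as an empty string. That is presumably why the header comment asks contributors to update the string table together with the enum.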


[PATCH net-next] net/ethtool: Add current supported tunable options

2015-06-09 Thread Hadar Hen Zion
Add strings array of the current supported tunable options.

Signed-off-by: Hadar Hen Zion <had...@mellanox.com>
Reviewed-by: Amir Vadai <am...@mellanox.com>
---
 include/uapi/linux/ethtool.h |  6 ++++++
 net/core/ethtool.c           | 12 ++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 0594933..90e4bff 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -215,6 +215,11 @@ enum tunable_id {
 	ETHTOOL_ID_UNSPEC,
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
+	/*
+	 * Add your fresh new tunable attribute above and remember to update
+	 * tunable_strings[] in net/core/ethtool.c
+	 */
+	ETHTOOL_TUNABLE_COUNT,
 };
 
 enum tunable_type_id {
@@ -545,6 +550,7 @@ enum ethtool_stringset {
 	ETH_SS_NTUPLE_FILTERS,
 	ETH_SS_FEATURES,
 	ETH_SS_RSS_HASH_FUNCS,
+	ETH_SS_TUNABLES,
 };
 
 /**
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index eb0c3ac..13bbb47 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -106,6 +106,13 @@ rss_hash_func_strings[ETH_RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
 	[ETH_RSS_HASH_XOR_BIT] =	"xor",
 };
 
+static const char
+tunable_strings[ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
+	[ETHTOOL_ID_UNSPEC]     = "Unspec",
+	[ETHTOOL_RX_COPYBREAK]  = "rx-copybreak",
+	[ETHTOOL_TX_COPYBREAK]  = "tx-copybreak",
+};
+
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
 {
 	struct ethtool_gfeatures cmd = {
@@ -194,6 +201,9 @@ static int __ethtool_get_sset_count(struct net_device *dev, int sset)
 	if (sset == ETH_SS_RSS_HASH_FUNCS)
 		return ARRAY_SIZE(rss_hash_func_strings);
 
+	if (sset == ETH_SS_TUNABLES)
+		return ARRAY_SIZE(tunable_strings);
+
 	if (ops->get_sset_count && ops->get_strings)
 		return ops->get_sset_count(dev, sset);
 	else
@@ -211,6 +221,8 @@ static void __ethtool_get_strings(struct net_device *dev,
 	else if (stringset == ETH_SS_RSS_HASH_FUNCS)
 		memcpy(data, rss_hash_func_strings,
 		       sizeof(rss_hash_func_strings));
+	else if (stringset == ETH_SS_TUNABLES)
+		memcpy(data, tunable_strings, sizeof(tunable_strings));
 	else
 		/* ops->get_strings is valid because checked earlier */
 		ops->get_strings(dev, stringset, data);
-- 
1.8.3.1
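
The __ethtool_get_strings() hunk above copies the whole table with a single memcpy(), so a consumer sees a flat buffer of fixed-width slots. The following is a rough sketch of that layout, not the kernel code itself: the demo_/DEMO_ names, the buffer-size check, and the width of 32 (assumed to mirror ETH_GSTRING_LEN) are all illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_GSTRING_LEN 32	/* assumption: mirrors ETH_GSTRING_LEN */
#define DEMO_N_STRINGS   3

/* Hypothetical stand-in for tunable_strings[][]. */
static const char demo_strings[DEMO_N_STRINGS][DEMO_GSTRING_LEN] = {
	"Unspec", "rx-copybreak", "tx-copybreak",
};

/*
 * Copy the whole table in one go, as the memcpy() in the patch does.
 * Returns the number of bytes written, or 0 if the buffer is too small
 * (the size check is an addition made for this sketch).
 */
static size_t demo_get_strings(unsigned char *data, size_t len)
{
	if (len < sizeof(demo_strings))
		return 0;
	memcpy(data, demo_strings, sizeof(demo_strings));
	return sizeof(demo_strings);
}

/* Entry idx starts idx * DEMO_GSTRING_LEN bytes into the flat buffer. */
static const char *demo_string_at(const unsigned char *data, size_t idx)
{
	return (const char *)data + idx * DEMO_GSTRING_LEN;
}
```

Each slot is zero-padded to the full width by the array initializer, so every entry in the copied buffer is NUL-terminated as long as the name is shorter than the slot.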
