Easy way to set NOTRACK for INPUT, FORWARD and OUTPUT independently

2016-12-05 Thread mudrunka

Hello,
Currently in iptables I can set NOTRACK (-j CT --notrack) only in
OUTPUT and PREROUTING, because the routing decision is made only after
conntrack has already seen the packet.


I need a stateful firewall on INPUT, but conntrack on FORWARD is a
performance drawback for me. And I can imagine that someone might have
the exact opposite problem.


When I want to enable conntrack for INPUT but not for forwarding, I
have to list all the IP addresses on local interfaces. This is a big
administrative PITA for several reasons: I have routers with hundreds of
VLANs, and each of these VLANs has multiple IP addresses, both IPv4 and
IPv6. Disabling conntrack for FORWARD only means listing all of them in
PREROUTING to disguise INPUT traffic from the FORWARDed one. This is
annoying and prone to error.


It would be super useful if one could simply use "-j CT --notrack" in
INPUT and FORWARD (it already works in OUTPUT).


If it's impossible to postpone conntrack until after the routing
decision, it might be possible to add some macro that would match any of
the local addresses currently configured on any interface, like
"--src local" or "--dst local". Currently I am using an ipset filled by
a cron script with all these addresses parsed from "ip a s", but that's
far from elegant or reliable.


I am planning to switch over to nftables, so that might be another
solution...
Is this planned to be fixed in nftables? If not, could you please
consider fixing it?



Thanks

Best regards
  Tomas Mudrunka
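For the "--dst local" idea above, a userspace check of whether an address is currently configured on any interface can be done with getifaddrs(3) instead of parsing "ip a s" output. The following is a minimal C sketch under that assumption; is_local_address() is a hypothetical helper, not an existing iptables, ipset or nftables option.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if the textual IPv4/IPv6 address is
 * currently assigned to any local interface, 0 if it is not, and -1 if
 * the string cannot be parsed or getifaddrs() fails.
 */
static int is_local_address(const char *addr)
{
	struct in_addr v4;
	struct in6_addr v6;
	struct ifaddrs *ifap, *ifa;
	int family, found = 0;

	if (inet_pton(AF_INET, addr, &v4) == 1)
		family = AF_INET;
	else if (inet_pton(AF_INET6, addr, &v6) == 1)
		family = AF_INET6;
	else
		return -1;

	if (getifaddrs(&ifap) < 0)
		return -1;

	for (ifa = ifap; ifa != NULL && !found; ifa = ifa->ifa_next) {
		/* Skip entries without an address of the wanted family. */
		if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != family)
			continue;

		if (family == AF_INET) {
			const struct sockaddr_in *sin =
				(const struct sockaddr_in *)ifa->ifa_addr;
			found = sin->sin_addr.s_addr == v4.s_addr;
		} else {
			const struct sockaddr_in6 *sin6 =
				(const struct sockaddr_in6 *)ifa->ifa_addr;
			found = memcmp(&sin6->sin6_addr, &v6, sizeof(v6)) == 0;
		}
	}

	freeifaddrs(ifap);
	return found;
}
```

Because it reads the interface list at call time it avoids a stale cron snapshot, but it is still a userspace check; it cannot stand in for a match evaluated in the packet path.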
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Easy way to set NOTRACK for INPUT, FORWARD and OUTPUT independently

2016-12-05 Thread mudrunka

And by "disguise" I meant "distinguish" :-)
T.

On 2016-12-06 06:54, mudru...@spoje.net wrote:

> [...]
> Disabling conntrack for FORWARD only means listing all of
> them in PREROUTING to disguise INPUT traffic from the FORWARDed one.
> [...]




[PATCH nft] datatype: Display pre-defined inet_service values in host byte order

2016-12-05 Thread Elise Lennion
nft describe displays to the user which values are available for a selector,
so these values should be shown in host byte order.

Reported-by: Pablo Neira Ayuso 
Fixes: ccc5da470e76 ("datatype: Replace getnameinfo() by internal lookup table")
Signed-off-by: Elise Lennion 
---
 include/datatype.h |  3 ++-
 src/datatype.c | 14 +++---
 src/expression.c   |  3 ++-
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/datatype.h b/include/datatype.h
index d4fe817..a7db1df 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -191,7 +191,8 @@ extern struct error_record *symbolic_constant_parse(const struct expr *sym,
 extern void symbolic_constant_print(const struct symbol_table *tbl,
const struct expr *expr, bool quotes);
 extern void symbol_table_print(const struct symbol_table *tbl,
-  const struct datatype *dtype);
+  const struct datatype *dtype,
+  enum byteorder byteorder);
 
 extern struct symbol_table *rt_symbol_table_init(const char *filename);
 extern void rt_symbol_table_free(struct symbol_table *tbl);
diff --git a/src/datatype.c b/src/datatype.c
index b5d73bc..4f98a83 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -181,14 +181,22 @@ void symbolic_constant_print(const struct symbol_table *tbl,
 }
 
 void symbol_table_print(const struct symbol_table *tbl,
-   const struct datatype *dtype)
+   const struct datatype *dtype,
+   enum byteorder byteorder)
 {
const struct symbolic_constant *s;
unsigned int size = 2 * dtype->size / BITS_PER_BYTE;
+   long unsigned int value;
+
+   for (s = tbl->symbols; s->identifier != NULL; s++) {
+   if (byteorder == BYTEORDER_BIG_ENDIAN)
+   value = __constant_ntohs(s->value);
+   else
+   value = s->value;
 
-   for (s = tbl->symbols; s->identifier != NULL; s++)
printf("\t%-30s\t0x%.*" PRIx64 "\n",
-  s->identifier, size, s->value);
+  s->identifier, size, value);
+   }
 }
 
 static void invalid_type_print(const struct expr *expr)
diff --git a/src/expression.c b/src/expression.c
index a10af5d..2aada77 100644
--- a/src/expression.c
+++ b/src/expression.c
@@ -115,7 +115,8 @@ void expr_describe(const struct expr *expr)
 
if (expr->dtype->sym_tbl != NULL) {
printf("\npre-defined symbolic constants:\n");
-   symbol_table_print(expr->dtype->sym_tbl, expr->dtype);
+   symbol_table_print(expr->dtype->sym_tbl, expr->dtype,
+  expr->byteorder);
}
 }
 
-- 
2.7.4
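The patch converts each pre-defined constant with __constant_ntohs() when the expression is big-endian. A standalone sketch of the same conversion follows, using ntohs() from <arpa/inet.h>; the enum and function names are hypothetical and not part of nft.

```c
#include <arpa/inet.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for nft's enum byteorder. */
enum sketch_byteorder {
	SKETCH_BYTEORDER_HOST_ENDIAN,
	SKETCH_BYTEORDER_BIG_ENDIAN,
};

/*
 * Convert a 16-bit symbolic constant (e.g. an inet_service port kept in
 * network byte order) into host byte order for display.
 */
static uint64_t display_value(uint16_t stored, enum sketch_byteorder order)
{
	if (order == SKETCH_BYTEORDER_BIG_ENDIAN)
		return ntohs(stored);
	return stored;
}

/* Print one table entry in the same layout as symbol_table_print(). */
static void print_symbol(const char *name, uint16_t stored,
			 enum sketch_byteorder order)
{
	/* uint64_t matches the PRIx64 conversion on every platform. */
	printf("\t%-30s\t0x%.4" PRIx64 "\n", name, display_value(stored, order));
}
```

The sketch holds the value in a uint64_t so that it matches the PRIx64 conversion; a long unsigned int is only 32 bits wide on 32-bit targets. On a little-endian host, print_symbol("ssh", htons(22), SKETCH_BYTEORDER_BIG_ENDIAN) prints 0x0016 rather than the byte-swapped 0x1600.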



Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Florian Westphal
Willem de Bruijn  wrote:
> While we're discussing the patch, another question, about revisions: I
> tested both modified and original iptables binaries on both standard
> and modified kernels. It all works as expected, except for the case
> where both binaries are used on a single kernel. For instance:
> 
>   iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port 8000'`" -j LOG
>   ./iptables.new -L
> 
> Here the new binary will interpret the object as xt_bpf_match_v1, but
> iptables has inserted xt_bpf_match. The same problem happens the other
> way around. A new binary can be made robust to detect old structs, but
> not the other way around. Specific to bpf, the existing xt_bpf code
> has an unfortunate bug that it always prints at least one line of
> code, even if ->bpf_program_num_elems == 0.
> 
> I notice that other extensions also do not necessarily only extend
> struct vN in vN+1. Is the above a known issue?

Yes, I guess no one ever bothered to fix this.

The kernel blob should contain the match/target revision number,
so userspace can in fact see that 'this is bpf v42', but IIRC
the netfilter userspace just loads the highest userspace revision
supported by the kernel (which then differs between the 2 iptables
binaries).

But we *could* display a message like 'kernel uses revision 2 but I can
only find 0 and 1', or fall back to the lower supported revision without
guess-the-struct-by-size games.
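Florian's first option (report a revision mismatch instead of guessing the struct layout from its size) could look roughly like the sketch below. It assumes userspace has already read the revision from the kernel blob (for a match, the u.user.revision field of struct xt_entry_match); pick_revision() itself is hypothetical and not part of libxtables.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical sketch: given the revision recorded in the kernel blob and
 * the revisions this userspace binary implements, return the revision to
 * parse with, or -1 after printing a diagnostic if none matches.
 */
static int pick_revision(const char *name, uint8_t kernel_rev,
			 const uint8_t *supported, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (supported[i] == kernel_rev)
			return kernel_rev;

	/* No struct-size guessing: tell the user what went wrong. */
	fprintf(stderr, "%s: kernel uses revision %u, but only", name,
		(unsigned int)kernel_rev);
	for (size_t i = 0; i < n; i++)
		fprintf(stderr, " %u", (unsigned int)supported[i]);
	fprintf(stderr, " are available\n");
	return -1;
}
```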


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 6:29 PM, Willem de Bruijn  wrote:
> On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso  wrote:
>> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
>> [...]
>>> Eric also suggests a private variable to avoid being subject to
>>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>>> length than current PATH_MAX.
>>
>> Good.
>>
>>> FWIW, there is a workaround for users with deeply nested paths: the
>>> path passed does not have to be absolute. It is literally what is
>>> passed on the command line to iptables right now, including relative
>>> addresses.
>>
>> If iptables userspace always expects to have the bpf file repository
>> in some given location (suggesting to have a directory that we specify
>> at ./configure time, similar to what we do with connlabel.conf), then
>> I think we can rely on relative paths. Would this be flexible enough
>> for your usecase?
>
> As long as it accepts relative paths, I think it will always work.
> Worst case, a user has to cd. No need for hardcoding the bpf mount
> point at compile time.
>
> I have the matching iptables patch for pinned objects, btw. Not for
> elf objects, which requires linking to libelf and parsing the object,
> which is more work (and perhaps best punted on by expanding libbpf in
> bcc to include this functionality. it already exists under samples/bpf
> and iproute2).

While we're discussing the patch, another question, about revisions: I
tested both modified and original iptables binaries on both standard
and modified kernels. It all works as expected, except for the case
where both binaries are used on a single kernel. For instance:

  iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port 8000'`" -j LOG
  ./iptables.new -L

Here the new binary will interpret the object as xt_bpf_match_v1, but
iptables has inserted xt_bpf_match. The same problem happens the other
way around. A new binary can be made robust to detect old structs, but
not the other way around. Specific to bpf, the existing xt_bpf code
has an unfortunate bug that it always prints at least one line of
code, even if ->bpf_program_num_elems == 0.

I notice that other extensions also do not necessarily only extend
struct vN in vN+1. Is the above a known issue?


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso  wrote:
> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
> [...]
>> Eric also suggests a private variable to avoid being subject to
>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>> length than current PATH_MAX.
>
> Good.
>
>> FWIW, there is a workaround for users with deeply nested paths: the
>> path passed does not have to be absolute. It is literally what is
>> passed on the command line to iptables right now, including relative
>> addresses.
>
> If iptables userspace always expects to have the bpf file repository
> in some given location (suggesting to have a directory that we specify
> at ./configure time, similar to what we do with connlabel.conf), then
> I think we can rely on relative paths. Would this be flexible enough
> for your usecase?

As long as it accepts relative paths, I think it will always work.
Worst case, a user has to cd. No need for hardcoding the bpf mount
point at compile time.

I have the matching iptables patch for pinned objects, btw. Not for
elf objects, which require linking to libelf and parsing the object,
which is more work (and perhaps best punted on by expanding libbpf in
bcc to include this functionality; it already exists under samples/bpf
and iproute2).


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
[...]
> Eric also suggests a private variable to avoid being subject to
> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
> length than current PATH_MAX.

Good.

> FWIW, there is a workaround for users with deeply nested paths: the
> path passed does not have to be absolute. It is literally what is
> passed on the command line to iptables right now, including relative
> addresses.

If iptables userspace always expects to find the bpf file repository
in some given location (I'm suggesting a directory that we specify at
./configure time, similar to what we do with connlabel.conf), then I
think we can rely on relative paths. Would this be flexible enough for
your use case?


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 6:00 PM, Pablo Neira Ayuso  wrote:
> On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
>> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
>> > Eric Dumazet  wrote:
>> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
>> > > > From: Willem de Bruijn 
>> > > >
>> > > > Add support for attaching an eBPF object by file descriptor.
>> > > >
>> > > > The iptables binary can be called with a path to an elf object or a
>> > > > pinned bpf object. Also pass the mode and path to the kernel to be
>> > > > able to return it later for iptables dump and save.
>> > > >
>> > > > Signed-off-by: Willem de Bruijn 
>> > > > ---
>> > >
>> > > Assuming there is no simple way to get variable matchsize in iptables,
>> > > this looks good to me, thanks.
>> >
>> > It should be possible by setting kernel .matchsize to ~0 which
>> > suppresses strict size enforcement.
>> >
>> > Its currently only used by ebt_among, but this should work for any xtables
>> > module.
>>
>> This is likely going to trigger a large rewrite of the core userspace
>> iptables codebase, and likely going to pull part of the mess we have
>> in ebtables into iptables. So I'd prefer not to follow this path.
>
> So this variable path is there to annotate what userspace claims that
> is the file that contains the bpf blob that was loaded, actually this
> is irrelevant to the kernel, so this is just there to dump it back
> when iptables-save it is called. Just a side note, one could set
> anything there from userspace, point somewhere else actually...
>
> Well anyway, going back to the path problem to keep it simple: Why
> don't just trim this down to something smaller, are you really
> expecting to reach PATH_MAX in your usecase?

Not often. Module-specific limitations that differ from global
definitions are just a pain when they bite. This module also has an
arbitrarily low limit on the length of the cBPF program passed, for
instance.

Eric also suggests a private variable to avoid being subject to
changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
length than current PATH_MAX.

FWIW, there is a workaround for users with deeply nested paths: the
path passed does not have to be absolute. It is literally what is
passed on the command line to iptables right now, including relative
addresses.


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 02:59:09PM -0800, Eric Dumazet wrote:
> On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:
> 
> > Fair enough, I have no objections to the patch.
> 
> An additional question is about PATH_MAX :
> 
> Is it guaranteed to stay at 4096 forever ?
> 
> To be safe, maybe we should use a constant of our own.

Right, this reminds me we have to fix something else.

So a constant of our own, plus something smaller if possible, would be
good to go. Thanks.
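A minimal sketch of the "constant of our own" approach, assuming a fixed-size path field in the match info. SKETCH_BPF_PATH_MAX, its value of 512 and struct sketch_bpf_info are hypothetical, not taken from the patch. Since the kernel only stores the string and echoes it back for iptables-save, relative paths work unchanged; overlong paths are rejected here instead of being silently truncated.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical module-private limit, independent of PATH_MAX, so that the
 * userspace/kernel ABI does not change if PATH_MAX ever does.
 * The value 512 is an illustrative assumption.
 */
#define SKETCH_BPF_PATH_MAX 512

struct sketch_bpf_info {
	int fd;
	char path[SKETCH_BPF_PATH_MAX];	/* as typed by the user, may be relative */
};

/* Copy the user-supplied path; reject it rather than truncate it. */
static int sketch_set_path(struct sketch_bpf_info *info, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(info->path))
		return -ENAMETOOLONG;
	memcpy(info->path, path, len + 1);
	return 0;
}
```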


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
On Mon, Dec 5, 2016 at 5:55 PM, Daniel Borkmann  wrote:
> Hi Willem,
>
> On 12/05/2016 09:28 PM, Willem de Bruijn wrote:
>>
>> From: Willem de Bruijn 
>>
>> Add support for attaching an eBPF object by file descriptor.
>>
>> The iptables binary can be called with a path to an elf object or a
>> pinned bpf object. Also pass the mode and path to the kernel to be
>> able to return it later for iptables dump and save.
>>
>> Signed-off-by: Willem de Bruijn 
>
>
> just out of pure curiosity, use case is for android guys wrt
> accounting, or anything specific that cls_bpf on tc ingress +
> egress cannot do already?

That is the immediate motivation, yes.


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet  wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn 
> > > > 
> > > > Add support for attaching an eBPF object by file descriptor.
> > > > 
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > > 
> > > > Signed-off-by: Willem de Bruijn 
> > > > ---
> > > 
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> > 
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> > 
> > Its currently only used by ebt_among, but this should work for any xtables
> > module.
> 
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.

So this variable path is there to record what userspace claims is the
file that contains the bpf blob that was loaded. This is actually
irrelevant to the kernel; it is just there to be dumped back when
iptables-save is called. Just a side note: one could set anything there
from userspace, and actually point somewhere else...

Well anyway, going back to the path problem to keep it simple: why not
just trim this down to something smaller? Are you really expecting to
reach PATH_MAX in your use case?


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Eric Dumazet
On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:

> Fair enough, I have no objections to the patch.

An additional question is about PATH_MAX:

Is it guaranteed to stay at 4096 forever ?

To be safe, maybe we should use a constant of our own.




Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Daniel Borkmann

Hi Willem,

On 12/05/2016 09:28 PM, Willem de Bruijn wrote:

> From: Willem de Bruijn 
>
> Add support for attaching an eBPF object by file descriptor.
>
> The iptables binary can be called with a path to an elf object or a
> pinned bpf object. Also pass the mode and path to the kernel to be
> able to return it later for iptables dump and save.
>
> Signed-off-by: Willem de Bruijn 


Just out of pure curiosity: is the use case accounting for the Android
folks, or is there anything specific that cls_bpf on tc ingress +
egress cannot do already?

Thanks,
Daniel


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet  wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn 
> > > > 
> > > > Add support for attaching an eBPF object by file descriptor.
> > > > 
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > > 
> > > > Signed-off-by: Willem de Bruijn 
> > > > ---
> > > 
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> > 
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> > 
> > Its currently only used by ebt_among, but this should work for any xtables
> > module.
> 
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.

Fair enough, I have no objections to the patch.
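For reference, the strict size check that a .matchsize of ~0 disables (as ebt_among does) works roughly as in the sketch below; the real check lives in xt_check_match(). The macros and function name are illustrative assumptions: the kernel's XT_ALIGN() aligns to __alignof__(struct _xt_align), which is architecture-dependent, whereas this sketch hardcodes 8 bytes.

```c
#include <stdbool.h>

/* Illustrative 8-byte alignment; the kernel's XT_ALIGN() is arch-dependent. */
#define SKETCH_ALIGN(s) (((s) + 7u) & ~7u)

/* A declared size of ~0 means "accept any size". */
#define SKETCH_MATCHSIZE_ANY (~0u)

/* Does a blob of 'given' bytes fit a match that declares 'declared'? */
static bool sketch_match_size_ok(unsigned int declared, unsigned int given)
{
	if (declared == SKETCH_MATCHSIZE_ANY)
		return true;	/* the module must validate the payload itself */
	return SKETCH_ALIGN(declared) == given;
}
```

The catch Pablo points out is on the userspace side: iptables would then have to cope with variable-sized match data, which is where the large rewrite would come from.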


[PATCH nft] src: add support to flush sets

2016-12-05 Thread Pablo Neira Ayuso
You can use this new command to remove all existing elements in a set:

 # nft flush set filter xyz

After this command, the set 'xyz' in table 'filter' becomes empty.

Signed-off-by: Pablo Neira Ayuso 
---
 include/netlink.h | 2 ++
 src/evaluate.c| 3 +++
 src/netlink.c | 9 -
 src/rule.c| 3 +++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/netlink.h b/include/netlink.h
index 28c11f603ed2..363b5251968f 100644
--- a/include/netlink.h
+++ b/include/netlink.h
@@ -165,6 +165,8 @@ extern int netlink_delete_setelems(struct netlink_ctx *ctx, const struct handle
   const struct expr *expr);
 extern int netlink_get_setelems(struct netlink_ctx *ctx, const struct handle *h,
const struct location *loc, struct set *set);
+extern int netlink_flush_setelems(struct netlink_ctx *ctx, const struct handle *h,
+ const struct location *loc);
 
 extern void netlink_dump_table(const struct nftnl_table *nlt);
 extern void netlink_dump_chain(const struct nftnl_chain *nlc);
diff --git a/src/evaluate.c b/src/evaluate.c
index e11a455a5f53..8a3da54e5b2d 100644
--- a/src/evaluate.c
+++ b/src/evaluate.c
@@ -2857,9 +2857,11 @@ static int cmd_evaluate_list(struct eval_ctx *ctx, struct cmd *cmd)
 static int cmd_evaluate_flush(struct eval_ctx *ctx, struct cmd *cmd)
 {
int ret;
+
ret = cache_update(cmd->op, ctx->msgs);
if (ret < 0)
return ret;
+
switch (cmd->obj) {
case CMD_OBJ_RULESET:
cache_flush();
@@ -2870,6 +2872,7 @@ static int cmd_evaluate_flush(struct eval_ctx *ctx, struct cmd *cmd)
 */
case CMD_OBJ_CHAIN:
/* Chains don't hold sets */
+   case CMD_OBJ_SET:
break;
default:
BUG("invalid command object type %u\n", cmd->obj);
diff --git a/src/netlink.c b/src/netlink.c
index f8e600ff6f81..714df4e892b2 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -1374,7 +1374,8 @@ static int netlink_del_setelems_batch(struct netlink_ctx *ctx,
int err;
 
nls = alloc_nftnl_set(h);
-   alloc_setelem_cache(expr, nls);
+   if (expr)
+   alloc_setelem_cache(expr, nls);
netlink_dump_set(nls);
 
err = mnl_nft_setelem_batch_del(nls, 0, ctx->seqnum);
@@ -1406,6 +1407,12 @@ static int netlink_del_setelems_compat(struct netlink_ctx *ctx,
return err;
 }
 
+int netlink_flush_setelems(struct netlink_ctx *ctx, const struct handle *h,
+  const struct location *loc)
+{
+   return netlink_del_setelems_batch(ctx, h, NULL);
+}
+
 static struct expr *netlink_parse_concat_elem(const struct datatype *dtype,
  struct expr *data)
 {
diff --git a/src/rule.c b/src/rule.c
index 8710767bc330..f1bb6cfe04ea 100644
--- a/src/rule.c
+++ b/src/rule.c
@@ -1244,6 +1244,9 @@ static int do_command_flush(struct netlink_ctx *ctx, struct cmd *cmd)
return netlink_flush_table(ctx, &cmd->handle, &cmd->location);
case CMD_OBJ_CHAIN:
return netlink_flush_chain(ctx, &cmd->handle, &cmd->location);
+   case CMD_OBJ_SET:
+   return netlink_flush_setelems(ctx, &cmd->handle,
+ &cmd->location);
case CMD_OBJ_RULESET:
return netlink_flush_ruleset(ctx, &cmd->handle, &cmd->location);
default:
-- 
2.1.4
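On the nft side, the flush request is a set element deletion without an element list: netlink_flush_setelems() passes a NULL expression, and netlink_del_setelems_batch() now only attaches elements if (expr). A simplified model of that request-building rule follows; the struct and function names are hypothetical and do not correspond to libnftnl types. It also folds in the companion libnftnl change, which omits the element nest when the list is empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a DELSETELEM request. */
struct sketch_delsetelem_req {
	const char *table;
	const char *set;
	bool has_elem_list;	/* would NFTA_SET_ELEM_LIST_ELEMENTS be sent? */
	size_t nelems;
};

/*
 * Build a delete request. With no elements (NULL or empty), no element
 * list is attached, which is what asks the kernel to flush the set.
 */
static struct sketch_delsetelem_req
sketch_build_del(const char *table, const char *set,
		 const int *elems, size_t nelems)
{
	struct sketch_delsetelem_req req = {
		.table = table,
		.set = set,
	};

	if (elems != NULL && nelems > 0) {
		req.has_elem_list = true;
		req.nelems = nelems;
	}
	return req;
}
```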



[PATCH nf-next 3/3] netfilter: nf_tables: support for set flushing

2016-12-05 Thread Pablo Neira Ayuso
This patch adds support for set flushing, which consists of walking over
the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is not
present. This patch requires the following changes:

1) Add a set->ops->deactivate_one() operation: this allows us to
   deactivate an element from the set element walk path, since we can
   skip the lookup that happens in ->deactivate().

2) Add a new nft_trans_alloc_gfp() function, since we need to allocate
   transactions using GFP_ATOMIC, given that the set walk path runs with
   rcu_read_lock held.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h |  6 -
 net/netfilter/nf_tables_api.c | 55 ++-
 net/netfilter/nft_set_hash.c  |  1 +
 net/netfilter/nft_set_rbtree.c|  1 +
 4 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 32970cba184a..e3ca6fca6496 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -259,7 +259,8 @@ struct nft_expr;
  * @lookup: look up an element within the set
  * @insert: insert new element into set
  * @activate: activate new element in the next generation
- * @deactivate: deactivate element in the next generation
+ * @deactivate: lookup for element and deactivate it in the next generation
+ * @deactivate_one: deactivate element in the next generation
  * @remove: remove element from set
  * @walk: iterate over all set elemeennts
  * @privsize: function to return size of set private data
@@ -294,6 +295,9 @@ struct nft_set_ops {
void *  (*deactivate)(const struct net *net,
  const struct nft_set *set,
  const struct nft_set_elem *elem);
+   bool(*deactivate_one)(const struct net *net,
+ const struct nft_set *set,
+ void *priv);
 void(*remove)(const struct nft_set *set,
  const struct nft_set_elem *elem);
void(*walk)(const struct nft_ctx *ctx,
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b8fa54ec6bbb..2cfa63f6b481 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -110,12 +110,12 @@ static void nft_ctx_init(struct nft_ctx *ctx,
ctx->seq= nlh->nlmsg_seq;
 }
 
-static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
-int msg_type, u32 size)
+static struct nft_trans *nft_trans_alloc_gfp(const struct nft_ctx *ctx,
+int msg_type, u32 size, gfp_t gfp)
 {
struct nft_trans *trans;
 
-   trans = kzalloc(sizeof(struct nft_trans) + size, GFP_KERNEL);
+   trans = kzalloc(sizeof(struct nft_trans) + size, gfp);
if (trans == NULL)
return NULL;
 
@@ -125,6 +125,12 @@ static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
return trans;
 }
 
+static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
+int msg_type, u32 size)
+{
+   return nft_trans_alloc_gfp(ctx, msg_type, size, GFP_KERNEL);
+}
+
 static void nft_trans_destroy(struct nft_trans *trans)
 {
list_del(&trans->list);
@@ -3779,6 +3785,34 @@ static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
return err;
 }
 
+static int nft_flush_set(const struct nft_ctx *ctx,
+const struct nft_set *set,
+const struct nft_set_iter *iter,
+const struct nft_set_elem *elem)
+{
+   struct nft_trans *trans;
+   int err;
+
+   trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM,
+   sizeof(struct nft_trans_elem), GFP_ATOMIC);
+   if (!trans)
+   return -ENOMEM;
+
+   if (!set->ops->deactivate_one(ctx->net, set, elem->priv)) {
+   err = -ENOENT;
+   goto err1;
+   }
+
+   nft_trans_elem_set(trans) = (struct nft_set *)set;
+   nft_trans_elem(trans) = *((struct nft_set_elem *)elem);
+   list_add_tail(&trans->list, &ctx->net->nft.commit_list);
+
+   return 0;
+err1:
+   kfree(trans);
+   return err;
+}
+
 static int nf_tables_delsetelem(struct net *net, struct sock *nlsk,
struct sk_buff *skb, const struct nlmsghdr *nlh,
const struct nlattr * const nla[])
@@ -3789,9 +3823,6 @@ static int nf_tables_delsetelem(struct net *net, struct sock *nlsk,
struct nft_ctx ctx;
int rem, err = 0;
 
-   if (nla[NFTA_SET_ELEM_LIST_ELEMENTS] == N

[PATCH libnftnl] set_elem: nftnl_set_elems_nlmsg_build_payload_iter()

2016-12-05 Thread Pablo Neira Ayuso
Similar to a24e4b21ee33 ("set_elem: don't add NFTA_SET_ELEM_LIST_ELEMENTS
attribute if set is empty"). This is required by the set flush support.

Signed-off-by: Pablo Neira Ayuso 
---
 src/set_elem.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/set_elem.c b/src/set_elem.c
index 4d2b4f6074b7..083c597e2f8e 100644
--- a/src/set_elem.c
+++ b/src/set_elem.c
@@ -813,6 +813,10 @@ int nftnl_set_elems_nlmsg_build_payload_iter(struct nlmsghdr *nlh,
 
nftnl_set_elem_nlmsg_build_def(nlh, iter->set);
 
+   /* This set is empty, don't add an empty list element nest. */
+   if (list_empty(&iter->set->element_list))
+   return ret;
+
nest1 = mnl_attr_nest_start(nlh, NFTA_SET_ELEM_LIST_ELEMENTS);
elem = nftnl_set_elems_iter_next(iter);
while (elem != NULL) {
-- 
2.1.4
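The kernel side of this convention, from the nf-next 3/3 patch earlier in this digest, walks the set and deactivates every element when no element list is present. A simplified, self-contained model of that walk-and-callback pattern follows; the types are hypothetical stand-ins for struct nft_set_iter and friends, and the GFP_ATOMIC transaction allocation and commit list are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a set element and a set. */
struct sketch_elem {
	int key;
	bool active_next_gen;	/* still present once the batch commits? */
};

struct sketch_set {
	struct sketch_elem *elems;
	size_t nelems;
};

/* Modeled loosely on struct nft_set_iter: a callback plus an error slot. */
struct sketch_iter {
	int (*fn)(struct sketch_set *set, struct sketch_elem *elem);
	int err;
};

static void sketch_walk(struct sketch_set *set, struct sketch_iter *iter)
{
	for (size_t i = 0; i < set->nelems; i++) {
		/* Skip elements already inactive in the next generation. */
		if (!set->elems[i].active_next_gen)
			continue;
		iter->err = iter->fn(set, &set->elems[i]);
		if (iter->err < 0)
			return;	/* stop at the first failure, e.g. -ENOMEM */
	}
}

/* Counterpart of nft_flush_set(): deactivate the element directly,
 * without the lookup that a per-element delete would need. */
static int sketch_flush_one(struct sketch_set *set, struct sketch_elem *elem)
{
	(void)set;
	elem->active_next_gen = false;
	return 0;
}

/* A delete request without an element list flushes the whole set. */
static int sketch_delsetelem(struct sketch_set *set, bool has_elem_list)
{
	struct sketch_iter iter = { .fn = sketch_flush_one, .err = 0 };

	if (has_elem_list)
		return -EOPNOTSUPP;	/* per-element deletion not modeled */
	sketch_walk(set, &iter);
	return iter.err;
}
```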



[PATCH nf-next 1/3] netfilter: nf_tables: constify struct nft_ctx * parameter in nft_trans_alloc()

2016-12-05 Thread Pablo Neira Ayuso
Context is not modified by nft_trans_alloc(), so constify it.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e5194f6f906c..b8fa54ec6bbb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -110,8 +110,8 @@ static void nft_ctx_init(struct nft_ctx *ctx,
ctx->seq= nlh->nlmsg_seq;
 }
 
-static struct nft_trans *nft_trans_alloc(struct nft_ctx *ctx, int msg_type,
-u32 size)
+static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
+int msg_type, u32 size)
 {
struct nft_trans *trans;
 
-- 
2.1.4



[PATCH nf-next 2/3] netfilter: nft_set: introduce nft_{hash,rbtree}_deactivate_one()

2016-12-05 Thread Pablo Neira Ayuso
This new function allows us to deactivate a single element. This is
required by the set flush command that comes in a follow-up patch.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_set_hash.c   | 24 +---
 net/netfilter/nft_set_rbtree.c | 11 ++-
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index a3dface3e6e6..73f7687c5656 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -167,6 +167,19 @@ static void nft_hash_activate(const struct net *net, const struct nft_set *set,
nft_set_elem_clear_busy(&he->ext);
 }
 
+static bool nft_hash_deactivate_one(const struct net *net,
+   const struct nft_set *set, void *priv)
+{
+   struct nft_hash_elem *he = priv;
+
+   if (!nft_set_elem_mark_busy(&he->ext) ||
+   !nft_is_active(net, &he->ext)) {
+   nft_set_elem_change_active(net, set, &he->ext);
+   return true;
+   }
+   return false;
+}
+
 static void *nft_hash_deactivate(const struct net *net,
 const struct nft_set *set,
 const struct nft_set_elem *elem)
@@ -181,13 +194,10 @@ static void *nft_hash_deactivate(const struct net *net,
 
rcu_read_lock();
he = rhashtable_lookup_fast(&priv->ht, &arg, nft_hash_params);
-   if (he != NULL) {
-   if (!nft_set_elem_mark_busy(&he->ext) ||
-   !nft_is_active(net, &he->ext))
-   nft_set_elem_change_active(net, set, &he->ext);
-   else
-   he = NULL;
-   }
+   if (he != NULL &&
+   !nft_hash_deactivate_one(net, set, he))
+   he = NULL;
+
rcu_read_unlock();
 
return he;
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 36493a7cae88..845bbdb26853 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -171,6 +171,15 @@ static void nft_rbtree_activate(const struct net *net,
nft_set_elem_change_active(net, set, &rbe->ext);
 }
 
+static bool nft_rbtree_deactivate_one(const struct net *net,
+ const struct nft_set *set, void *priv)
+{
+   struct nft_rbtree_elem *rbe = priv;
+
+   nft_set_elem_change_active(net, set, &rbe->ext);
+   return true;
+}
+
 static void *nft_rbtree_deactivate(const struct net *net,
   const struct nft_set *set,
   const struct nft_set_elem *elem)
@@ -204,7 +213,7 @@ static void *nft_rbtree_deactivate(const struct net *net,
parent = parent->rb_right;
continue;
}
-   nft_set_elem_change_active(net, set, &rbe->ext);
+   nft_rbtree_deactivate_one(net, set, rbe);
return rbe;
}
}
-- 
2.1.4



Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Pablo Neira Ayuso
On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> Eric Dumazet  wrote:
> > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn 
> > > 
> > > Add support for attaching an eBPF object by file descriptor.
> > > 
> > > The iptables binary can be called with a path to an elf object or a
> > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > able to return it later for iptables dump and save.
> > > 
> > > Signed-off-by: Willem de Bruijn 
> > > ---
> > 
> > Assuming there is no simple way to get variable matchsize in iptables,
> > this looks good to me, thanks.
> 
> It should be possible by setting kernel .matchsize to ~0 which
> suppresses strict size enforcement.
> 
> It's currently only used by ebt_among, but this should work for any xtables
> module.

This is likely going to trigger a large rewrite of the core userspace
iptables codebase, and likely going to pull part of the mess we have
in ebtables into iptables. So I'd prefer not to follow this path.


Re: [PATCH nft] parser: Add glob support to include directive

2016-12-05 Thread Pablo Neira Ayuso
Please add a description to this patch.

Thanks.

On Mon, Dec 05, 2016 at 08:58:38PM +0900, Kohei Suzuki wrote:
> [...]


Kernel panic in netfilter 4.8.10 probably on conntrack -L

2016-12-05 Thread Denys Fedoryshchenko

Hi!

I have a fairly heavily loaded NAT server (approx. 17 Gbps of traffic) where a periodic
"conntrack -L" seems to trigger a kernel panic about once per day.
I am not entirely sure whether it is triggered by running the tool itself, or
just by enabling events.

Here is the panic message:

 [221287.380762] general protection fault:  [#1] SMP
 [221287.381029] Modules linked in: xt_rateest xt_RATEEST nf_conntrack_netlink netconsole configfs tun nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables 8021q garp mrp stp llc bonding ixgbe dca
 [221287.384913] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.10-build-0121 #10
 [221287.385184] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
 [221287.385634] task: 8200b4c0 task.stack: 8200
 [221287.385900] RIP: 0010:[]  [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]
 [221287.386428] RSP: 0018:882fbf603df8  EFLAGS: 00010202
 [221287.386693] RAX:  RBX: 882f96a51da8 RCX:
 [221287.387134] RDX:  RSI: 882fbf603e00 RDI: 0004
 [221287.387575] RBP: 882fbf603e38 R08: ff81822024ff R09: 0004
 [221287.388011] R10: 882fbf603de0 R11: 820050c0 R12: 882f810bf0c0
 [221287.388445] R13:  R14:  R15: 0004
 [221287.388877] FS:  () GS:882fbf60() knlGS:
 [221287.389311] CS:  0010 DS:  ES:  CR0: 80050033
 [221287.389567] CR2: 7faff0bd8978 CR3: 02006000 CR4: 001406f0
 [221287.389998] Stack:
 [221287.390238]  00049f292300 882f810bf0c0  882f810bf0c0
 [221287.390913]  882f96a51d80   820050c8
 [221287.391587]  882fbf603e68 a0098bd3 8100 a0098c85
 [221287.392262] Call Trace:
 [221287.392508]
 [221287.392579]  [] nf_ct_delete+0x7a/0x12c [nf_conntrack]
 [221287.393082]  [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack]
 [221287.393351]  [] death_by_timeout+0xd/0xf [nf_conntrack]
 [221287.393617]  [] call_timer_fn.isra.5+0x17/0x6b
 [221287.393881]  [] expire_timers+0x6f/0x7e
 [221287.394134]  [] run_timer_softirq+0x69/0x8b
 [221287.394390]  [] __do_softirq+0xbd/0x1aa
 [221287.394643]  [] irq_exit+0x37/0x7c
 [221287.394898]  [] smp_trace_call_function_single_interrupt+0x2e/0x30
 [221287.395341]  [] smp_call_function_single_interrupt+0x9/0xb
 [221287.395600]  [] call_function_single_interrupt+0x7c/0x90
 [221287.395857]
 [221287.395926]  [] ? mwait_idle+0x64/0x7a
 [221287.396413]  [] arch_cpu_idle+0xa/0xc
 [221287.396665]  [] default_idle_call+0x27/0x29
 [221287.396919]  [] cpu_startup_entry+0x11d/0x1c7
 [221287.397175]  [] rest_init+0x72/0x74
 [221287.397428]  [] start_kernel+0x3ba/0x3c7
 [221287.397681]  [] x86_64_start_reservations+0x2a/0x2c
 [221287.397937]  [] x86_64_start_kernel+0x12a/0x135
 [221287.402124] Code: f2 89 75 d0 75 04 4c 8b 73 08 0f b7 73 10 41 89 ff 4d 89 f1 4d 09 f9 31 c0 49 85 f1 74 67 41 89 d5 89 7d c4 48 8d 75 c8 44 09 f7  ff 10 89 c2 c1 ea 1f 75 05 4d 85 f6 74 4b 49 83 c4 04 89 45
 [221287.406724] RIP  [] nf_conntrack_eventmask_report+0xba/0x123 [nf_conntrack]
 [221287.407234]  RSP
 [221287.407489] ---[ end trace 4b077b9412fc7065 ]---
 [221287.407746] Kernel panic - not syncing: Fatal exception in interrupt
 [221287.408013] Kernel Offset: disabled
 [221287.408270] Rebooting in 5 seconds..
Dec  5 23:17:58 10.0.253.34
Dec  5 23:17:58 10.0.253.34 [221292.408645] ACPI MEMORY or I/O RESET_REG.



Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Florian Westphal
Eric Dumazet  wrote:
> On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > From: Willem de Bruijn 
> > 
> > Add support for attaching an eBPF object by file descriptor.
> > 
> > The iptables binary can be called with a path to an elf object or a
> > pinned bpf object. Also pass the mode and path to the kernel to be
> > able to return it later for iptables dump and save.
> > 
> > Signed-off-by: Willem de Bruijn 
> > ---
> 
> Assuming there is no simple way to get variable matchsize in iptables,
> this looks good to me, thanks.

It should be possible by setting kernel .matchsize to ~0 which
suppresses strict size enforcement.

It's currently only used by ebt_among, but this should work for any xtables
module.


Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Eric Dumazet
On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Add support for attaching an eBPF object by file descriptor.
> 
> The iptables binary can be called with a path to an elf object or a
> pinned bpf object. Also pass the mode and path to the kernel to be
> able to return it later for iptables dump and save.
> 
> Signed-off-by: Willem de Bruijn 
> ---

Assuming there is no simple way to get variable matchsize in iptables,
this looks good to me, thanks.

Reviewed-by: Eric Dumazet 






[PATCH nf-next] netfilter: xt_bpf: support ebpf

2016-12-05 Thread Willem de Bruijn
From: Willem de Bruijn 

Add support for attaching an eBPF object by file descriptor.

The iptables binary can be called with a path to an elf object or a
pinned bpf object. Also pass the mode and path to the kernel to be
able to return it later for iptables dump and save.

Signed-off-by: Willem de Bruijn 
---
 include/uapi/linux/netfilter/xt_bpf.h | 21 
 net/netfilter/xt_bpf.c| 96 +--
 2 files changed, 101 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
index 1fad2c2..652d2b6 100644
--- a/include/uapi/linux/netfilter/xt_bpf.h
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -2,9 +2,11 @@
 #define _XT_BPF_H
 
 #include 
+#include 
 #include 
 
 #define XT_BPF_MAX_NUM_INSTR   64
+#define XT_BPF_MAX_NUM_INSTR_V1(PATH_MAX / sizeof(struct sock_filter))
 
 struct bpf_prog;
 
@@ -16,4 +18,23 @@ struct xt_bpf_info {
struct bpf_prog *filter __attribute__((aligned(8)));
 };
 
+enum xt_bpf_modes {
+   XT_BPF_MODE_BYTECODE,
+   XT_BPF_MODE_FD_PINNED,
+   XT_BPF_MODE_FD_ELF,
+};
+
+struct xt_bpf_info_v1 {
+   __u16 mode;
+   __u16 bpf_program_num_elem;
+   __s32 fd;
+   union {
+   struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR_V1];
+   char path[PATH_MAX];
+   };
+
+   /* only used in the kernel */
+   struct bpf_prog *filter __attribute__((aligned(8)));
+};
+
 #endif /*_XT_BPF_H */
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index dffee9d47..2dedaa2 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -20,15 +21,15 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS("ipt_bpf");
 MODULE_ALIAS("ip6t_bpf");
 
-static int bpf_mt_check(const struct xt_mtchk_param *par)
+static int __bpf_mt_check_bytecode(struct sock_filter *insns, __u16 len,
+  struct bpf_prog **ret)
 {
-   struct xt_bpf_info *info = par->matchinfo;
struct sock_fprog_kern program;
 
-   program.len = info->bpf_program_num_elem;
-   program.filter = info->bpf_program;
+   program.len = len;
+   program.filter = insns;
 
-   if (bpf_prog_create(&info->filter, &program)) {
+   if (bpf_prog_create(ret, &program)) {
pr_info("bpf: check failed: parse error\n");
return -EINVAL;
}
@@ -36,6 +37,42 @@ static int bpf_mt_check(const struct xt_mtchk_param *par)
return 0;
 }
 
+static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
+{
+   struct bpf_prog *prog;
+
+   prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+
+   *ret = prog;
+   return 0;
+}
+
+static int bpf_mt_check(const struct xt_mtchk_param *par)
+{
+   struct xt_bpf_info *info = par->matchinfo;
+
+   return __bpf_mt_check_bytecode(info->bpf_program,
+  info->bpf_program_num_elem,
+  &info->filter);
+}
+
+static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
+{
+   struct xt_bpf_info_v1 *info = par->matchinfo;
+
+   if (info->mode == XT_BPF_MODE_BYTECODE)
+   return __bpf_mt_check_bytecode(info->bpf_program,
+  info->bpf_program_num_elem,
+  &info->filter);
+   else if (info->mode == XT_BPF_MODE_FD_PINNED ||
+info->mode == XT_BPF_MODE_FD_ELF)
+   return __bpf_mt_check_fd(info->fd, &info->filter);
+   else
+   return -EINVAL;
+}
+
 static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
const struct xt_bpf_info *info = par->matchinfo;
@@ -43,31 +80,58 @@ static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
return BPF_PROG_RUN(info->filter, skb);
 }
 
+static bool bpf_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
+{
+   const struct xt_bpf_info_v1 *info = par->matchinfo;
+
+   return !!bpf_prog_run_save_cb(info->filter, (struct sk_buff *) skb);
+}
+
 static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
 {
const struct xt_bpf_info *info = par->matchinfo;
+
+   bpf_prog_destroy(info->filter);
+}
+
+static void bpf_mt_destroy_v1(const struct xt_mtdtor_param *par)
+{
+   const struct xt_bpf_info_v1 *info = par->matchinfo;
+
bpf_prog_destroy(info->filter);
 }
 
-static struct xt_match bpf_mt_reg __read_mostly = {
-   .name   = "bpf",
-   .revision   = 0,
-   .family = NFPROTO_UNSPEC,
-   .checkentry = bpf_mt_check,
-   .match  = bpf_mt,
-   .destroy= bpf_mt_destroy,
-   .matchsize  = sizeof(struct xt_bpf_info),
-   .me = THIS_MODULE,
+static struct 

Re: [PATCH v3 net-next] net_sched: gen_estimator: complete rewrite of rate estimators

2016-12-05 Thread David Miller
From: Eric Dumazet 
Date: Sun, 04 Dec 2016 09:48:16 -0800

> From: Eric Dumazet 
> 
> 1) Old code was hard to maintain, due to complex lock chains.
>(We probably will be able to remove some kfree_rcu() in callers)
> 
> 2) Using a single timer to update all estimators does not scale.
> 
> 3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
>is not supposed to work well)
> 
> In this rewrite :
> 
> - I removed the RB tree that had to be scanned in
>   gen_estimator_active(). qdisc dumps should be much faster.
> 
> - Each estimator has its own timer.
> 
> - Estimations are maintained in net_rate_estimator structure,
>   instead of dirtying the qdisc. Minor, but part of the simplification.
> 
> - Reading the estimator uses RCU and a seqcount to provide proper
>   support for 32bit kernels.
> 
> - We reduce memory need when estimators are not used, since
>   we store a pointer, instead of the bytes/packets counters.
> 
> - xt_rateest_mt() no longer has to grab a spinlock.
>   (In the future, xt_rateest_tg() could be switched to per cpu counters)
> 
> Signed-off-by: Eric Dumazet 
> ---
> v3: Renamed some parameters to please make htmldocs
> v2: Removed unwanted changes to tcp_output.c

This was probably long overdue, thanks for working on this.

Applied, thanks Eric.


[PATCH nf-next] NAT: skip checksum on offload SCTP packets

2016-12-05 Thread Davide Caratti
SCTP GSO and hardware can do CRC32c computation after netfilter processing,
so we can avoid calling sctp_compute_cksum() on skb if skb->ip_summed
is equal to CHECKSUM_PARTIAL. Moreover, set skb->ip_summed to CHECKSUM_NONE
when the NAT code computes the CRC, to prevent offloaders from computing
it again (on ixgbe, this resulted in packets being transmitted with a wrong L4 checksum).

Signed-off-by: Davide Caratti 
---

Notes:
On a veth pair, where GSO is available and some performance evaluation
can be done, a netperf SCTP_STREAM test was run, recording the number
of invocations of crc32c() versus the number of invocations of
sctp_manip_pkt(), before and after the patch was applied:

  $perf record -e "probe:crc32c,probe:sctp_manip_pkt" -aR -- \
 $netperf -H $host -t SCTP_STREAM -p 2000 -l 30

  $perf script | grep crc32c | wc -l
  $perf script | grep sctp_manip_pkt | wc -l

  nf_nat_proto_sctp.c | crc32c hits | sctp_manip_pkt hits | throughput
 ---------------------+-------------+---------------------+------------
  unpatched           |       10493 |                3314 | 1.17 Gbit/s
  patched             |        6100 |                3326 | 1.19 Gbit/s

 net/netfilter/nf_nat_proto_sctp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_proto_sctp.c b/net/netfilter/nf_nat_proto_sctp.c
index 2e14108..31d3586 100644
--- a/net/netfilter/nf_nat_proto_sctp.c
+++ b/net/netfilter/nf_nat_proto_sctp.c
@@ -47,7 +47,10 @@ sctp_manip_pkt(struct sk_buff *skb,
hdr->dest = tuple->dst.u.sctp.port;
}
 
-   hdr->checksum = sctp_compute_cksum(skb, hdroff);
+   if (skb->ip_summed != CHECKSUM_PARTIAL) {
+   hdr->checksum = sctp_compute_cksum(skb, hdroff);
+   skb->ip_summed = CHECKSUM_NONE;
+   }
 
return true;
 }
-- 
2.7.4



[PATCH nft] parser: Add glob support to include directive

2016-12-05 Thread Kohei Suzuki
---
 src/scanner.l | 36 +--
 tests/shell/testcases/include/0005glob_0  | 32 
 tests/shell/testcases/include/0006globempty_1 | 14 +++
 3 files changed, 69 insertions(+), 13 deletions(-)
 create mode 100755 tests/shell/testcases/include/0005glob_0
 create mode 100755 tests/shell/testcases/include/0006globempty_1

diff --git a/src/scanner.l b/src/scanner.l
index 625023f..64fe6fc 100644
--- a/src/scanner.l
+++ b/src/scanner.l
@@ -11,6 +11,7 @@
 %{

 #include 
+#include 
 #include 
 #include 
 #include 
@@ -640,37 +641,46 @@ int scanner_include_file(void *scanner, const char *filename,
 struct parser_state *state = yyget_extra(scanner);
 struct error_record *erec;
 char buf[PATH_MAX];
-const char *name = buf;
 unsigned int i;
-FILE *f;
+glob_t globbuf;

-f = NULL;
+globbuf.gl_pathc = 0;
 if (search_in_include_path(filename)) {
 for (i = 0; i < INCLUDE_PATHS_MAX; i++) {
 if (include_paths[i] == NULL)
 break;
 snprintf(buf, sizeof(buf), "%s/%s",
  include_paths[i], filename);
-f = fopen(buf, "r");
-if (f != NULL)
+if (glob(buf, 0, NULL, &globbuf) != 0) {
 break;
+}
 }
 } else {
-f = fopen(filename, "r");
-name = filename;
+glob(filename, 0, NULL, &globbuf);
 }
-if (f == NULL) {
-erec = error(loc, "Could not open file \"%s\": %s",
- filename, strerror(errno));
+if (globbuf.gl_pathc == 0) {
+erec = error(loc, "Could not find file matching \"%s\"\n", filename);
 goto err;
 }

-erec = scanner_push_file(scanner, name, f, loc);
-if (erec != NULL)
-goto err;
+for (i = 0; i < globbuf.gl_pathc; i++) {
+const char *name = globbuf.gl_pathv[i];
+FILE *f = fopen(name, "r");
+if (f == NULL) {
+erec = error(loc, "Could not open file \"%s\": %s\n", name, strerror(errno));
+goto err;
+}
+erec = scanner_push_file(scanner, name, f, loc);
+if (erec != NULL) {
+goto err;
+}
+}
+
+globfree(&globbuf);
 return 0;

 err:
+globfree(&globbuf);
 erec_queue(erec, state->msgs);
 return -1;
 }
diff --git a/tests/shell/testcases/include/0005glob_0 b/tests/shell/testcases/include/0005glob_0
new file mode 100755
index 000..99dbf53
--- /dev/null
+++ b/tests/shell/testcases/include/0005glob_0
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+set -e
+
+tmpdir=$(mktemp -d)
+tmpfile=$(mktemp)
+
+trap "rm -rf $tmpdir $tmpfile" EXIT # cleanup if aborted
+
+RULESET1="add table x"
+RULESET2="add table y"
+RULESET3="include \"$tmpdir/*.conf\""
+
+echo "$RULESET1" > $tmpdir/ruleset1.conf
+echo "$RULESET2" > $tmpdir/ruleset2.conf
+echo "$RULESET3" > $tmpfile
+
+$NFT -f $tmpfile
+if [ $? -ne 0 ] ; then
+echo "E: unable to load good ruleset" >&2
+exit 1
+fi
+$NFT list table x
+if [ $? -ne 0 ] ; then
+echo "E: unable to include ruleset1.conf" >&2
+exit 1
+fi
+$NFT list table y
+if [ $? -ne 0 ] ; then
+echo "E: unable to include ruleset2.conf" >&2
+exit 1
+fi
diff --git a/tests/shell/testcases/include/0006globempty_1 b/tests/shell/testcases/include/0006globempty_1
new file mode 100755
index 000..3ac8c72
--- /dev/null
+++ b/tests/shell/testcases/include/0006globempty_1
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+set -e
+
+tmpdir=$(mktemp -d)
+tmpfile=$(mktemp)
+
+trap "rm -rf $tmpdir $tmpfile" EXIT # cleanup if aborted
+
+RULESET="include \"$tmpdir/*.conf\""
+
+echo "$RULESET" > $tmpfile
+
+$NFT -f $tmpfile 2>/dev/null
-- 
2.10.2


Kohei Suzuki
eagle...@gmail.com