Re: slab corruption with current -git

2016-10-13 Thread Al Viro
On Thu, Oct 13, 2016 at 12:49:33PM -0700, Linus Torvalds wrote:

> That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
> loop that initializes things:
> 
> for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
>  hook_mask >>= 1, ++hooknum) {
> 
> and it makes no sense to me how that tests *both* "i < num_hooks" and
> "hook_mask != 0".
> 
> Why? Because
> 
> num_hooks = hweight32(hook_mask);
> 
> so it's entirely redundant. num_hooks is already how many bits are on
> in hook_mask, so that test is just duplicating the same thing twice
> ("have we done less than that number of bits" and "do we have any bits
> left").
> 
> I don't know. There's something odd going on. Regardless, this is a
> different problem from the nf_register_net_hook() list handling, so
> I'll leave it to the networking people. David?

Hey, I remember looking through that stuff.   There it is, in
a thread started by Krause Randomness(tm)...  Short version: nf_hook_ops
is a mess - it's embedded into different objects, with different subsets
of fields used depending on the containing object and I would seriously
suggest moving some of those into those containing objects.

--
On Thu, Sep 01, 2016 at 08:10:44AM -0500, Eric Sandeen wrote:
> On 8/4/16 8:57 AM, Al Viro wrote:
> 
> > Don't feed the troll.  On all paths leading to that place we have
> > result->name = kname;
> > len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
> > or
> > result->name = kname;
> > len = strncpy_from_user(kname, filename, PATH_MAX);
> > with failure exits taken if strncpy_from_user() returns an error, which
> > means that the damn thing has already been copied into.
> > 
> > FWIW, it looks a lot like buggered kmemcheck; as usual, he can't be bothered
> > to mention which kernel version would it be (let alone how to reproduce it
> > on the kernel in question), but IIRC davej had run into some instrumentation
> > breakage lately.
> 
> The original report is in https://bugzilla.kernel.org/show_bug.cgi?id=120651
> if anyone is interested in it.

What the hell does that one have to do with getname_flags(), other than having
attracted the same... something on the edge of failing the Turing Test?

FWIW, looking at the netfilter one...  That's nf_register_net_hook()
hitting
entry->ops  = *reg;
with reg pointing to something uninitialized (according to kmemcheck, that is,
and presuming that it's not an instrumentation bug).  With the callchain
in report, it came (all in the same assumptions) from
nf_register_net_hooks(net, ops, hweight32(table->valid_hooks))
with hweight32(table->valid_hooks) being greater than the amount of
initialized entries in ops[] (call site in ipt_register_table()).

This "ops" ought to be net/ipv4/netfilter/iptable_filter.c:filter_ops,
allocated by
filter_ops = xt_hook_ops_alloc(&packet_filter, iptable_filter_hook);
in iptable_filter_init().  "table" is &packet_filter and its contents ought
to be unchanged, so ->valid_hooks in there is FILTER_VALID_HOOKS, i.e.
((1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_OUT)).

Which is to say, filter_ops[] had fewer than 3 initialized elements
when it got to the call of iptable_filter_table_init()...  Since filter_ops
hadn't been NULL, the xt_hook_ops_alloc() call above must've already been
done.  Said xt_hook_ops_alloc() should've allocated a 3-element array and
hooked through all of it, so it's not a wholesale uninitialized element, it's
uninitialized parts of one...

What gets initialized is ->hook, ->pf, ->hooknum and ->priority.
Let's figure out the offsets:
0: list (two pointers, i.e. 16 bytes)
0x10: hook (8)
0x18: dev (8)
0x20: priv (8)
0x28: pf (1)
0x29: padding (3)
0x2c: hooknum (4)
0x30: priority (4)
0x34: padding (4)

OK...  The address of the damn thing is apparently 880037b4bd80 and
we see complaint about the accesses at offsets 0, 0x18, 8, 0x20 and then
the same pattern with 0x38 and 0x70 added (i.e. the same fields in the next
two elements of the same array).  Then there are similar complaints, but
with a different call chain (iptable_mangle instead of iptable_filter).
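
The offset table above can be checked with a small userspace sketch. The struct below is a stand-in, not the kernel's definition: field order follows the table, the types are simplified assumptions, and the whole thing assumes an LP64 target (8-byte pointers). The helper offset_in_unset_field() is a hypothetical name introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sketch only makes sense with 8-byte pointers (LP64). */
static_assert(sizeof(void *) == 8, "sketch assumes LP64");

/* Stand-in for struct list_head: two pointers. */
struct list_head_sketch {
	void *next, *prev;
};

/* Stand-in for the 2016-era struct nf_hook_ops (simplified types). */
struct nf_hook_ops_sketch {
	struct list_head_sketch list;	/* 0x00, not set by xt_hook_ops_alloc() */
	void *hook;			/* 0x10, set */
	void *dev;			/* 0x18, not set */
	void *priv;			/* 0x20, not set */
	uint8_t pf;			/* 0x28, set */
	unsigned int hooknum;		/* 0x2c, set */
	int priority;			/* 0x30, set */
};

/*
 * Returns 1 if byte offset `off` into an array of these structs falls inside
 * ->list, ->dev or ->priv of some element, 0 otherwise.  Padding bytes are
 * not counted.
 */
int offset_in_unset_field(size_t off)
{
	off %= sizeof(struct nf_hook_ops_sketch);
	if (off < offsetof(struct nf_hook_ops_sketch, hook))
		return 1;	/* inside ->list */
	if (off >= offsetof(struct nf_hook_ops_sketch, dev) &&
	    off < offsetof(struct nf_hook_ops_sketch, pf))
		return 1;	/* inside ->dev or ->priv */
	return 0;
}
```

Feeding it the reported offsets (0, 8, 0x18, 0x20, and the same with 0x38 or 0x70 added) should flag every one of them, which is consistent with the reading that only the fields xt_hook_ops_alloc() never touches are being complained about.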

These offsets are ->list, ->dev and ->priv, and those are exactly the ones
not initialized by xt_hook_ops_alloc().  Looking at the nf_register_net_hook(),
we have
list_add_rcu(&entry->ops.list, elem->list.prev);
a bit further down the road.  ->dev and ->priv are left uninitialized (and
very likely - unused).

I would say it's a false positive.  struct nf_hook_ops is embedded into a
bunch of different objects, with different subsets of fields getting used.
IMO it's a bad idea (in particular, I really wonder if ->list would've
been better off moved into (some of) the containing suckers), but it's

Re: slab corruption with current -git

2016-10-13 Thread Florian Westphal
Linus Torvalds  wrote:
> On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf
>  wrote:
> >
> > Yeah.
> >
> > 105 entry->orig_ops = reg;
> > 106 entry->ops  = *reg;
> > 107 entry->next = NULL;
> 
> So ipt_register_table() does:
> 
> ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
> 
> and then nf_register_net_hooks() just does
> 
> for (i = 0; i < n; i++) {
> err = nf_register_net_hook(net, &ops[i]);
> 
> so if the *reg is uninitialized, it means that it's the 'ops[]' array
> that isn't actually really valid in "valid_hooks". Odd. They should
> all be initialized by xt_hook_ops_alloc(), no?

It's only partially initialized.  Looking at Markus' splat,
it's complaining about the first 16 bytes (list_head), whose contents are indeed
undefined when it gets copied to entry->ops.

For the time being this seems like the simplest "fix", until we
disentangle the hook description (which should be const) from run-time
allocated data.

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index e0aa7c1d0224..fc4977456c30 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1513,7 +1513,7 @@ xt_hook_ops_alloc(const struct xt_table *table, nf_hookfn *fn)
if (!num_hooks)
return ERR_PTR(-EINVAL);
 
-   ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
+   ops = kcalloc(num_hooks, sizeof(*ops), GFP_KERNEL);
if (ops == NULL)
return ERR_PTR(-ENOMEM);

I'll pass such a patch to Pablo.
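
The effect of the kmalloc() to kcalloc() switch can be illustrated in userspace, where calloc() plays the role of kcalloc(): it checks the count * size multiplication and hands back zero-filled memory, so fields the init loop never touches read as zero rather than garbage. This is only a sketch under assumptions: the struct, hook_ops_alloc_sketch() and dummy_hook() are hypothetical names, and the pf value is a placeholder.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a hook description with some fields left unset. */
struct hook_ops_sketch {
	void *list_next, *list_prev;	/* never set by the init loop */
	int (*hook)(void);
	void *dev, *priv;		/* never set by the init loop */
	unsigned char pf;
	unsigned int hooknum;
	int priority;
};

int dummy_hook(void)
{
	return 0;
}

/*
 * Allocates num_hooks zero-filled entries and sets only hook, pf, hooknum
 * and priority, like the loop discussed above.  Returns NULL on failure.
 */
struct hook_ops_sketch *hook_ops_alloc_sketch(size_t num_hooks)
{
	struct hook_ops_sketch *ops;
	size_t i;

	if (num_hooks == 0)
		return NULL;

	ops = calloc(num_hooks, sizeof(*ops));
	if (ops == NULL)
		return NULL;

	for (i = 0; i < num_hooks; i++) {
		ops[i].hook = dummy_hook;
		ops[i].pf = 2;			/* placeholder family value */
		ops[i].hooknum = (unsigned int)i;
		ops[i].priority = 0;
	}
	return ops;
}
```

With malloc() in place of calloc() the untouched pointers would be indeterminate, which is the situation the checker was flagging when the whole struct got copied.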

> That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
> loop that initializes things:
> 
> for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
>  hook_mask >>= 1, ++hooknum) {
> 
> and it makes no sense to me how that tests *both* "i < num_hooks" and
> "hook_mask != 0".

Right, one of these is enough.
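
Why one test is enough can be seen in a userspace sketch: a walk that stops when the mask runs dry visits exactly as many set bits as the population count reports. hweight32_sketch() is a portable stand-in for the kernel's hweight32(), and collect_hooknums() is a hypothetical helper whose loop shape is assumed to mirror the one quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for hweight32(): number of set bits in w. */
unsigned int hweight32_sketch(uint32_t w)
{
	unsigned int n = 0;

	for (; w != 0; w &= w - 1)	/* clears the lowest set bit */
		n++;
	return n;
}

/*
 * Walks hook_mask with only the "mask != 0" test and stores the position of
 * every set bit in hooknums[].  Returns how many positions were stored.
 */
unsigned int collect_hooknums(uint32_t hook_mask, unsigned int *hooknums,
			      unsigned int cap)
{
	unsigned int i = 0, hooknum;

	for (hooknum = 0; hook_mask != 0; hook_mask >>= 1, ++hooknum) {
		if (!(hook_mask & 1))
			continue;
		assert(i < cap);
		hooknums[i++] = hooknum;
	}
	return i;
}
```

For a mask with bits 1, 2 and 3 set, both functions report 3, so an extra "i < num_hooks" bound never fires before the mask test does.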
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 2/2] conntrack: enable to tune gc parameters

2016-10-13 Thread Florian Westphal
Nicolas Dichtel  wrote:
> Le 10/10/2016 à 16:04, Florian Westphal a écrit :
> > Nicolas Dichtel  wrote:
> >> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> >> timed-out entries"), netlink conntrack deletion events may be sent with a
> >> huge delay. It could be interesting to let the user tweak gc parameters
> >> depending on its use case.
> > 
> > Hmm, care to elaborate?
> > 
> > I am not against doing this but I'd like to hear/read your use case.
> > 
> > The expectation is that in almost all cases eviction will happen from
> > packet path.  The gc worker is just there for case where a busy system
> > goes idle.
> It was precisely that case. After a period of activity, the event is sent a
> long time after the timeout. If the router does not manage a lot of flows,
> why not try to parse more entries instead of the default 1/64 of the table?
> In fact, I don't understand why GC_MAX_BUCKETS_DIV is used instead of always
> using GC_MAX_BUCKETS, whatever the size of the table is.

I wanted to make sure that we have a known upper bound on the number of
buckets we process so that we do not block other pending kworker items
for too long.

(Or cause too many useless scans)

Another idea worth trying might be to get rid of the max cap and
instead break early in case too many jiffies expired.

I don't want to add sysctl knobs for this unless absolutely needed; it's already
possible to 'force' an eviction cycle by running 'conntrack -L'.
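
The "drop the cap, break early on elapsed time" idea could look roughly like the userspace sketch below. Nothing here is the conntrack gc worker: gc_scan_sketch(), the tick_fn/scan_fn callbacks and the fake clock are all hypothetical, with the tick source standing in for jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tick source standing in for the kernel's jiffies counter. */
typedef unsigned long (*tick_fn)(void);
typedef void (*scan_fn)(size_t bucket);

/*
 * Scans buckets starting at *cursor.  Stops after one full pass over the
 * table or once `budget` ticks have elapsed, whichever comes first.
 * Returns the number of buckets visited; *cursor is left at the next bucket.
 */
size_t gc_scan_sketch(size_t table_size, size_t *cursor,
		      unsigned long budget, tick_fn now, scan_fn scan_bucket)
{
	unsigned long start;
	size_t visited = 0;

	assert(table_size > 0 && cursor && now && scan_bucket);
	start = now();

	while (visited < table_size) {
		scan_bucket(*cursor);
		*cursor = (*cursor + 1) % table_size;
		visited++;
		/* Unsigned subtraction stays correct if the counter wraps. */
		if (now() - start >= budget)
			break;
	}
	return visited;
}

/* Fake clock for experimenting: one tick passes per scanned bucket. */
unsigned long fake_ticks;

unsigned long fake_now(void)
{
	return fake_ticks;
}

void fake_scan_bucket(size_t bucket)
{
	(void)bucket;
	fake_ticks++;
}
```

On a small or idle table the whole pass fits in the budget and everything is scanned; on a large busy one the loop yields after the budget, which bounds how long other work items wait without a fixed 1/64 divisor.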


Re: slab corruption with current -git

2016-10-13 Thread Linus Torvalds
On Wed, Oct 12, 2016 at 11:27 PM, Markus Trippelsdorf
 wrote:
>
> Yeah.
>
> 105 entry->orig_ops = reg;
> 106 entry->ops  = *reg;
> 107 entry->next = NULL;

So ipt_register_table() does:

ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));

and then nf_register_net_hooks() just does

for (i = 0; i < n; i++) {
err = nf_register_net_hook(net, &ops[i]);

so if the *reg is uninitialized, it means that it's the 'ops[]' array
that isn't actually really valid in "valid_hooks". Odd. They should
all be initialized by xt_hook_ops_alloc(), no?

That said, xt_hook_ops_alloc() itself is odd. Lookie here, this is the
loop that initializes things:

for (i = 0, hooknum = 0; i < num_hooks && hook_mask != 0;
 hook_mask >>= 1, ++hooknum) {

and it makes no sense to me how that tests *both* "i < num_hooks" and
"hook_mask != 0".

Why? Because

num_hooks = hweight32(hook_mask);

so it's entirely redundant. num_hooks is already how many bits are on
in hook_mask, so that test is just duplicating the same thing twice
("have we done less than that number of bits" and "do we have any bits
left").

I don't know. There's something odd going on. Regardless, this is a
different problem from the nf_register_net_hook() list handling, so
I'll leave it to the networking people. David?

   Linus


Re: [PATCH nf-next,RFC 03/10] netfilter: bridge: kill NF_HOOK_THRESH() and state->tresh

2016-10-13 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> int br_nf_hook_thresh(unsigned int hook, struct net *net,
>   struct sock *sk, struct sk_buff *skb,
>   struct net_device *indev,
>   struct net_device *outdev,
>   int (*okfn)(struct net *, struct sock *,
>   struct sk_buff *))
> {
> struct nf_hook_entry *elem;
> struct nf_hook_state state;
> int ret;
> 
> elem = rcu_dereference(net->nf.hooks[NFPROTO_BRIDGE][hook]);
> 
> while (elem && (elem->ops.priority <= NF_BR_PRI_BRNF))
> elem = rcu_dereference(elem->next);
> 
> ...
> 
> nf_hook_state_init(&state, elem, hook, NFPROTO_BRIDGE, indev, ...
> 
> Hm, but this code (before actually calling nf_hook_slow) is skipping
> the hook until we get to NF_BR_PRI_BRNF + 1.
> 
> Then hook state sets hook_entry to elem.
> 
> Am I missing anything?

Yes, I'm a moron -- Ignore.  I'll turn off the computer now.


Re: [PATCH nf-next,RFC 03/10] netfilter: bridge: kill NF_HOOK_THRESH() and state->tresh

2016-10-13 Thread Pablo Neira Ayuso
On Thu, Oct 13, 2016 at 05:10:55PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso  wrote:
> > On Thu, Oct 13, 2016 at 02:25:45PM +0200, Florian Westphal wrote:
> > > Pablo Neira Ayuso  wrote:
> > > > Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
> > > > introduced br_nf_hook_thresh().
> > > > 
> > > > Replace NF_HOOK_THRESH() by br_nf_hook_thresh from
> > > > br_nf_forward_finish(), so we have no more callers for this macro.
> > > > 
> > > > As a result, state->thresh and explicit thresh parameter in the hook
> > > > state structure is not required anymore.
> > > > 
> > > > And we can get rid of fast forward code in nf_iterate() in the core path
> > > > that is only used by br_netfilter to search for the filter hook.
> > > 
> > > Note that you will need to move more parts of nf_hook_slow() into
> > > > br_nf_hook_thresh(); the bridge netfilter does need the thresh feature
> > > that we have in nf_iterate().
> > 
> > br_nf_hook_thresh() is already skipping hooks before NF_BR_PRI_BRNF to
> > emulate thresh. What else is missing?
> 
> AFAICS you are removing the NF_BR_PRI_BRNF skipping in this patch,
> it relied on nf_hook_slow to do this (plus the state->thresh thing).

int br_nf_hook_thresh(unsigned int hook, struct net *net,
  struct sock *sk, struct sk_buff *skb,
  struct net_device *indev,
  struct net_device *outdev,
  int (*okfn)(struct net *, struct sock *,
  struct sk_buff *))
{
struct nf_hook_entry *elem;
struct nf_hook_state state;
int ret;

elem = rcu_dereference(net->nf.hooks[NFPROTO_BRIDGE][hook]);

while (elem && (elem->ops.priority <= NF_BR_PRI_BRNF))
elem = rcu_dereference(elem->next);

...

nf_hook_state_init(&state, elem, hook, NFPROTO_BRIDGE, indev, ...

Hm, but this code (before actually calling nf_hook_slow) is skipping
the hook until we get to NF_BR_PRI_BRNF + 1.

Then hook state sets hook_entry to elem.

Am I missing anything?
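
The skipping step quoted above can be tried out on a plain linked list. This is a userspace sketch, not the bridge code: struct hook_entry_sketch and skip_to_thresh() are hypothetical, rcu_dereference() is replaced by ordinary pointer loads, and the list is assumed to be sorted by ascending priority.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a hook entry: just a priority and a next pointer. */
struct hook_entry_sketch {
	int priority;
	struct hook_entry_sketch *next;
};

/*
 * Returns the first entry whose priority is strictly greater than `thresh`,
 * or NULL if there is none.  Entries at or below the threshold are skipped.
 */
struct hook_entry_sketch *skip_to_thresh(struct hook_entry_sketch *elem,
					 int thresh)
{
	while (elem && elem->priority <= thresh)
		elem = elem->next;
	return elem;
}
```

Whatever this returns is what the hook state would then start from, so hooks at or below the threshold never run for that traversal.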


Re: [PATCH nf-next,RFC 03/10] netfilter: bridge: kill NF_HOOK_THRESH() and state->tresh

2016-10-13 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Thu, Oct 13, 2016 at 02:25:45PM +0200, Florian Westphal wrote:
> > Pablo Neira Ayuso  wrote:
> > > Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
> > > introduced br_nf_hook_thresh().
> > > 
> > > Replace NF_HOOK_THRESH() by br_nf_hook_thresh from
> > > br_nf_forward_finish(), so we have no more callers for this macro.
> > > 
> > > As a result, state->thresh and explicit thresh parameter in the hook
> > > state structure is not required anymore.
> > > 
> > > And we can get rid of fast forward code in nf_iterate() in the core path
> > > that is only used by br_netfilter to search for the filter hook.
> > 
> > Note that you will need to move more parts of nf_hook_slow() into
> > > br_nf_hook_thresh(); the bridge netfilter does need the thresh feature
> > that we have in nf_iterate().
> 
> br_nf_hook_thresh() is already skipping hooks before NF_BR_PRI_BRNF to
> emulate thresh. What else is missing?

AFAICS you are removing the NF_BR_PRI_BRNF skipping in this patch,
it relied on nf_hook_slow to do this (plus the state->thresh thing).


Re: [PATCH nf-next,RFC 08/10] netfilter: move NF_QUEUE handling away from core

2016-10-13 Thread Pablo Neira Ayuso
On Thu, Oct 13, 2016 at 02:38:21PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso  wrote:
[...]
> > diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> > index de4fa03f46f3..7040842c34f4 100644
> > --- a/net/ipv4/netfilter/ip_tables.c
> > +++ b/net/ipv4/netfilter/ip_tables.c
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "../../netfilter/xt_repldata.h"
> >  
> >  MODULE_LICENSE("GPL");
> > @@ -329,6 +330,9 @@ ipt_do_table(struct sk_buff *skb,
> > /* Pop from stack? */
> > if (v != XT_RETURN) {
> > verdict = (unsigned int)(-v) - 1;
> > +   if (verdict == NF_QUEUE)
> > +   verdict = nf_queue(skb, state,
> > +  0, false);
> 
> Any reason why this is needed?
> AFAICS xt_NFQUEUE will never return NF_QUEUE after this patch.

-j QUEUE uses the standard target to return NF_QUEUE. This is a very
primitive way to queue packets to userspace queue 0 via nf_queue, but
still may break. I can place this under unlikely() as these days
people should be using NFQUEUE instead.
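
For context on the "(unsigned int)(-v) - 1" line in the quoted hunk: the standard target stores a verdict as a negative number, -verdict - 1, so that non-negative values remain free for jump offsets. A small sketch of that encoding follows; the NF_* values are the ones from the uapi netfilter header, while the two helper names are hypothetical.

```c
#include <assert.h>

/* Verdict values as defined in the uapi netfilter header. */
#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2
#define NF_QUEUE  3

/* Encodes a verdict the way a standard target stores it: -verdict - 1. */
int standard_verdict_encode(unsigned int verdict)
{
	return -(int)verdict - 1;
}

/* Inverse of the above; same arithmetic as the quoted ipt_do_table() line. */
unsigned int standard_verdict_decode(int v)
{
	assert(v < 0);	/* non-negative values are jump offsets, not verdicts */
	return (unsigned int)(-v) - 1;
}
```

So a rule with -j QUEUE carries -4 in its standard target, and decoding that yields NF_QUEUE, which is the case the added check handles.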

> > diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> > index 2b3b2f8e39c4..9ae2febd86e3 100644
> > --- a/net/netfilter/core.c
> > +++ b/net/netfilter/core.c
> > @@ -309,6 +309,7 @@ unsigned int nf_iterate(struct sk_buff *skb,
> > unsigned int verdict;
> >  
> > while (*entryp) {
> > +   RCU_INIT_POINTER(state->hook_entries, *entryp);
> >  repeat:
> > verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
> > if (verdict != NF_ACCEPT) {
> > @@ -331,9 +332,8 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
> > int ret;
> >  
> > entry = rcu_dereference(state->hook_entries);
> > -next_hook:
> > verdict = nf_iterate(skb, state, &entry);
> > -   switch (verdict & NF_VERDICT_MASK) {
> > +   switch (verdict) {
> 
> This looks buggy, verdict might encode errno for NF_DROP case.
> 
> What you could do is:
> 
> switch (verdict) {
> case NF_ACCEPT:
>   /* something */
>   break;
> case NF_STOLEN:
>   break;
> case NF_DROP: /* fallthrough */
> default: /* drop with error? */
>   kfree_skb(skb);
>   errno = ...
> }

Right, will fix this, thanks. 
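
The point about the errno hiding in the verdict can be made concrete. In the uapi header the low 8 bits (NF_VERDICT_MASK) carry the verdict and, for drops, the upper 16 bits carry a positive errno. The sketch below re-implements that packing as functions rather than using the header's macros; drop_with_err(), drop_geterr() and classify_verdict() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Constants as defined in the uapi netfilter header. */
#define NF_DROP          0
#define NF_ACCEPT        1
#define NF_VERDICT_MASK  0x000000ff
#define NF_VERDICT_QBITS 16

/* Packs a negative errno into a drop verdict (upper 16 bits). */
unsigned int drop_with_err(int err)
{
	return ((unsigned int)(-err) << NF_VERDICT_QBITS) | NF_DROP;
}

/* Extracts the negative errno again; 0 if none was encoded. */
int drop_geterr(unsigned int verdict)
{
	return -(int)(verdict >> NF_VERDICT_QBITS);
}

/*
 * Returns 1 for accept, a negative errno for drop, 0 for anything else.
 * The mask is what lets an errno-carrying drop reach the NF_DROP case.
 */
int classify_verdict(unsigned int verdict)
{
	int ret;

	switch (verdict & NF_VERDICT_MASK) {
	case NF_ACCEPT:
		return 1;
	case NF_DROP:
		ret = drop_geterr(verdict);
		return ret ? ret : -EPERM;
	default:
		return 0;
	}
}
```

A verdict built with drop_with_err(-ECONNREFUSED) is numerically different from plain NF_DROP, so a bare "switch (verdict)" would send it to the default branch instead of freeing the skb and returning the error.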


Re: [PATCH nf-next,RFC 08/10] netfilter: move NF_QUEUE handling away from core

2016-10-13 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> > Any reason why this is needed?
> > AFAICS xt_NFQUEUE will never return NF_QUEUE after this patch.
> 
> -j QUEUE uses the standard target to return NF_QUEUE. This is a very
> primitive way to queue packets to userspace queue 0 via nf_queue, but
> still may break. I can place this under unlikely() as these days
> people should be using NFQUEUE instead.

No need, just add a comment that this handles legacy standard target
QUEUE (I forgot we still have this).


Re: [PATCH nf-next,RFC 03/10] netfilter: bridge: kill NF_HOOK_THRESH() and state->tresh

2016-10-13 Thread Pablo Neira Ayuso
On Thu, Oct 13, 2016 at 02:25:45PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso  wrote:
> > Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
> > introduced br_nf_hook_thresh().
> > 
> > Replace NF_HOOK_THRESH() by br_nf_hook_thresh from
> > br_nf_forward_finish(), so we have no more callers for this macro.
> > 
> > As a result, state->thresh and explicit thresh parameter in the hook
> > state structure is not required anymore.
> > 
> > And we can get rid of fast forward code in nf_iterate() in the core path
> > that is only used by br_netfilter to search for the filter hook.
> 
> Note that you will need to move more parts of nf_hook_slow() into
> br_nf_hook_thresh(); the bridge netfilter does need the thresh feature
> that we have in nf_iterate().

br_nf_hook_thresh() is already skipping hooks before NF_BR_PRI_BRNF to
emulate thresh. What else is missing?


Re: [PATCH 1/2 nf] netfilter: nf_queue: don't re-enter same hook on packet reinjection

2016-10-13 Thread Aaron Conole
Pablo Neira Ayuso  writes:

> Make sure we skip the current hook from where the packet was enqueued,
> otherwise the packet gets enqueued over and over again.
>
> Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
> Signed-off-by: Pablo Neira Ayuso 
> ---
> I managed to reproduce this with a simple test.
>
>  # iptables -I OUTPUT -j QUEUE
>  # cd libnetfilter_queue/utils/
>  # ./nfqnl_test
>
> Without my patch, netfilter munches packets that are reinjected.
>
> @Aaron: Please, I'd appreciate if you can have a look to confirm this bug
> and the fix. Thanks.

Looks like I missed this in my testing.

Reviewed-by: Aaron Conole 

>  net/netfilter/nf_queue.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> index 96964a0070e1..221d7a5c2fec 100644
> --- a/net/netfilter/nf_queue.c
> +++ b/net/netfilter/nf_queue.c
> @@ -184,6 +184,7 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
>   verdict = NF_DROP;
>   }
>  
> + hook_entry = rcu_dereference(hook_entry->next);
>   entry->state.thresh = INT_MIN;
>  
>   if (verdict == NF_ACCEPT) {



Re: [PATCH nf-next,RFC 03/10] netfilter: bridge: kill NF_HOOK_THRESH() and state->tresh

2016-10-13 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
> introduced br_nf_hook_thresh().
> 
> Replace NF_HOOK_THRESH() by br_nf_hook_thresh from
> br_nf_forward_finish(), so we have no more callers for this macro.
> 
> As a result, state->thresh and explicit thresh parameter in the hook
> state structure is not required anymore.
> 
> And we can get rid of fast forward code in nf_iterate() in the core path
> that is only used by br_netfilter to search for the filter hook.

Note that you will need to move more parts of nf_hook_slow() into
br_nf_hook_thresh(); the bridge netfilter does need the thresh feature
that we have in nf_iterate().



[PATCH nf-next,RFC 06/10] netfilter: nf_tables: use hook state from xt_action_param structure

2016-10-13 Thread Pablo Neira Ayuso
Don't copy relevant fields from hook state structure, instead use the
one that is already available in struct xt_action_param.

This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h| 35 +++-
 net/bridge/netfilter/nft_meta_bridge.c   |  2 +-
 net/bridge/netfilter/nft_reject_bridge.c | 30 ---
 net/ipv4/netfilter/nft_dup_ipv4.c|  2 +-
 net/ipv4/netfilter/nft_masq_ipv4.c   |  4 ++--
 net/ipv4/netfilter/nft_redir_ipv4.c  |  3 +--
 net/ipv4/netfilter/nft_reject_ipv4.c |  4 ++--
 net/ipv6/netfilter/nft_dup_ipv6.c|  2 +-
 net/ipv6/netfilter/nft_masq_ipv6.c   |  3 ++-
 net/ipv6/netfilter/nft_redir_ipv6.c  |  3 ++-
 net/ipv6/netfilter/nft_reject_ipv6.c |  6 +++---
 net/netfilter/nf_dup_netdev.c|  2 +-
 net/netfilter/nf_tables_core.c   | 10 -
 net/netfilter/nf_tables_trace.c  |  8 
 net/netfilter/nft_log.c  |  5 +++--
 net/netfilter/nft_lookup.c   |  5 ++---
 net/netfilter/nft_meta.c |  6 +++---
 net/netfilter/nft_queue.c|  2 +-
 net/netfilter/nft_reject_inet.c  | 18 
 19 files changed, 86 insertions(+), 64 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 44060344f958..ba49f21d62ab 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -14,27 +14,42 @@
 
 struct nft_pktinfo {
struct sk_buff  *skb;
-   struct net  *net;
-   const struct net_device *in;
-   const struct net_device *out;
-   u8  pf;
-   u8  hook;
booltprot_set;
u8  tprot;
/* for x_tables compatibility */
struct xt_action_param  xt;
 };
 
+static inline struct net *pkt_net(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->net;
+}
+
+static inline unsigned int pkt_hook(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->hook;
+}
+
+static inline u8 pkt_pf(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->pf;
+}
+
+static inline const struct net_device *pkt_in(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->in;
+}
+
+static inline const struct net_device *pkt_out(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->out;
+}
+
 static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
pkt->skb = skb;
-   pkt->net = state->net;
-   pkt->in = state->in;
-   pkt->out = state->out;
-   pkt->hook = state->hook;
-   pkt->pf = state->pf;
pkt->xt.state = state;
 }
 
diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index ad47a921b701..ea72d56d44b9 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -23,7 +23,7 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 const struct nft_pktinfo *pkt)
 {
const struct nft_meta *priv = nft_expr_priv(expr);
-   const struct net_device *in = pkt->in, *out = pkt->out;
+   const struct net_device *in = pkt_in(pkt), *out = pkt_out(pkt);
u32 *dest = &regs->data[priv->dreg];
const struct net_bridge_port *p;
 
diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c
index 4b3df6b0e3b9..e8918a8a1511 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -315,17 +315,20 @@ static void nft_reject_bridge_eval(const struct nft_expr *expr,
case htons(ETH_P_IP):
switch (priv->type) {
case NFT_REJECT_ICMP_UNREACH:
-   nft_reject_br_send_v4_unreach(pkt->net, pkt->skb,
- pkt->in, pkt->hook,
+   nft_reject_br_send_v4_unreach(pkt_net(pkt), pkt->skb,
+ pkt_in(pkt),
+ pkt_hook(pkt),
  priv->icmp_code);
break;
case NFT_REJECT_TCP_RST:
-   nft_reject_br_send_v4_tcp_reset(pkt->net, pkt->skb,
-   pkt->in, pkt->hook);
+   nft_reject_br_send_v4_tcp_reset(pkt_net(pkt), pkt->skb,
+   pkt_in(pkt),
+
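
The pattern this patch applies (keep one pointer to the hook state and read fields through small accessors instead of copying them at setup time) can be sketched in isolation. All struct and function names below are hypothetical simplifications, not the nf_tables definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the per-hook state owned by the core. */
struct hook_state_sketch {
	unsigned int hook;
	unsigned char pf;
	const char *in_name;
};

/* Stand-in for xt_action_param: carries only a pointer to the state. */
struct action_param_sketch {
	const struct hook_state_sketch *state;
};

/* Stand-in for nft_pktinfo after the patch: no copied fields. */
struct pktinfo_sketch {
	const void *skb;
	struct action_param_sketch xt;
};

static inline unsigned int pkt_hook_sketch(const struct pktinfo_sketch *pkt)
{
	return pkt->xt.state->hook;
}

static inline unsigned char pkt_pf_sketch(const struct pktinfo_sketch *pkt)
{
	return pkt->xt.state->pf;
}

/* Setup stores two pointers instead of copying every field. */
static inline void set_pktinfo_sketch(struct pktinfo_sketch *pkt,
				      const void *skb,
				      const struct hook_state_sketch *state)
{
	pkt->skb = skb;
	pkt->xt.state = state;
}
```

The trade-off is one extra pointer dereference per access in exchange for a smaller per-packet structure and a cheaper setup path.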

[PATCH nf-next,RFC 09/10] netfilter: merge nf_iterate() into nf_hook_slow()

2016-10-13 Thread Pablo Neira Ayuso
nf_iterate() has become rather simple, we can integrate this code into
nf_hook_slow() to reduce the amount of LOC in the core path.

However, we still need nf_iterate() around for nf_queue packet handling,
so move this function there, the only place where we need it. I think it
should be possible to refactor the nf_queue code to get rid of it for good,
but given this is slow path anyway, let's have a look at this later.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 64 +---
 net/netfilter/nf_internals.h |  5 
 net/netfilter/nf_queue.c | 20 ++
 3 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 9ae2febd86e3..dceb5f92c6a2 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -302,27 +302,6 @@ void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
 }
 EXPORT_SYMBOL(_nf_unregister_hooks);
 
-unsigned int nf_iterate(struct sk_buff *skb,
-   struct nf_hook_state *state,
-   struct nf_hook_entry **entryp)
-{
-   unsigned int verdict;
-
-   while (*entryp) {
-   RCU_INIT_POINTER(state->hook_entries, *entryp);
-repeat:
-   verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
-   if (verdict != NF_ACCEPT) {
-   if (verdict != NF_REPEAT)
-   return verdict;
-   goto repeat;
-   }
-   *entryp = rcu_dereference((*entryp)->next);
-   }
-   return NF_ACCEPT;
-}
-
-
 /* Returns 1 if okfn() needs to be executed by the caller,
  * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
@@ -332,25 +311,32 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
int ret;
 
entry = rcu_dereference(state->hook_entries);
-   verdict = nf_iterate(skb, state, &entry);
-   switch (verdict) {
-   case NF_ACCEPT:
-   ret = 1;
-   break;
-   case NF_DROP:
-   kfree_skb(skb);
-   ret = NF_DROP_GETERR(verdict);
-   if (ret == 0)
-   ret = -EPERM;
-   break;
-   default:
-   /* Implicit handling for NF_STOLEN, as well as any other non
-* conventional verdicts.
-*/
-   ret = 0;
-   break;
+   while (entry) {
+   RCU_INIT_POINTER(state->hook_entries, entry);
+repeat:
+   verdict = entry->ops.hook(entry->ops.priv, skb, state);
+   switch (verdict) {
+   case NF_ACCEPT:
+   entry = rcu_dereference(entry->next);
+   break;
+   case NF_DROP:
+   kfree_skb(skb);
+   ret = NF_DROP_GETERR(verdict);
+   if (ret == 0)
+   ret = -EPERM;
+
+   return ret;
+   case NF_REPEAT:
+   goto repeat;
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any
+* other non conventional verdicts.
+*/
+   return 0;
+   }
}
-   return ret;
+
+   return 1;
 }
 EXPORT_SYMBOL(nf_hook_slow);
 
diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h
index de25d7cdfd42..22b4915c48f4 100644
--- a/net/netfilter/nf_internals.h
+++ b/net/netfilter/nf_internals.h
@@ -11,11 +11,6 @@
 #define NFDEBUG(format, args...)
 #endif
 
-
-/* core.c */
-unsigned int nf_iterate(struct sk_buff *skb, struct nf_hook_state *state,
-   struct nf_hook_entry **entryp);
-
 /* nf_queue.c */
 void nf_queue_nf_hook_drop(struct net *net, const struct nf_hook_entry *entry);
 int __init netfilter_queue_init(void);
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index c97f4e4e25d9..2b5429c969d5 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -177,6 +177,26 @@ int nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 }
 EXPORT_SYMBOL_GPL(nf_queue);
 
+static unsigned int nf_iterate(struct sk_buff *skb,
+  struct nf_hook_state *state,
+  struct nf_hook_entry **entryp)
+{
+   unsigned int verdict;
+
+   while (*entryp) {
+   RCU_INIT_POINTER(state->hook_entries, *entryp);
+repeat:
+   verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
+   if (verdict != NF_ACCEPT) {
+   if (verdict != NF_REPEAT)
+   return verdict;
+   goto repeat;
+   }
+   *entryp = rcu_dereference((*entryp)->next);
+   }
+  
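
The control flow of the merged loop in nf_hook_slow() can be exercised in userspace with a stripped-down model. This is a sketch under assumptions: struct walk_entry, hook_walk_sketch() and the three sample hooks are hypothetical, there is no skb to free, RCU is left out, and the drop path returns a fixed -EPERM instead of extracting an encoded errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Verdict values as in the uapi netfilter header. */
enum { V_DROP = 0, V_ACCEPT = 1, V_STOLEN = 2, V_REPEAT = 4 };

struct walk_entry {
	unsigned int (*hook)(void *priv);
	void *priv;
	struct walk_entry *next;
};

/*
 * Returns 1 if every hook accepted (caller would run okfn()), -EPERM on
 * drop, 0 for stolen or any other verdict.
 */
int hook_walk_sketch(struct walk_entry *entry)
{
	unsigned int verdict;

	while (entry) {
repeat:
		verdict = entry->hook(entry->priv);
		switch (verdict) {
		case V_ACCEPT:
			entry = entry->next;
			break;
		case V_DROP:
			return -EPERM;
		case V_REPEAT:
			goto repeat;	/* call the same hook again */
		default:
			return 0;
		}
	}
	return 1;
}

/* Sample hooks for experimenting. */
unsigned int accept_hook(void *priv)
{
	(void)priv;
	return V_ACCEPT;
}

unsigned int drop_hook(void *priv)
{
	(void)priv;
	return V_DROP;
}

/* Asks for one repeat, then accepts; priv points at an int call counter. */
unsigned int repeat_once_hook(void *priv)
{
	int *calls = priv;

	return (*calls)++ == 0 ? V_REPEAT : V_ACCEPT;
}
```

Folding the switch into the loop means each verdict is handled where it is produced, so the separate iterate-then-dispatch step is no longer needed on the fast path.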

[PATCH nf-next,RFC 05/10] netfilter: x_tables: move hook state into xt_action_param structure

2016-10-13 Thread Pablo Neira Ayuso
Place pointer to hook state in xt_action_param structure instead of
copying the fields that we need. After this change xt_action_param fits
into one cacheline.

This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter/x_tables.h | 48 +++---
 include/net/netfilter/nf_tables.h  | 11 +++
 net/bridge/netfilter/ebt_arpreply.c|  3 +-
 net/bridge/netfilter/ebt_log.c | 11 +++
 net/bridge/netfilter/ebt_nflog.c   |  6 ++--
 net/bridge/netfilter/ebt_redirect.c|  6 ++--
 net/bridge/netfilter/ebtables.c|  6 +---
 net/ipv4/netfilter/arp_tables.c|  6 +---
 net/ipv4/netfilter/ip_tables.c |  6 +---
 net/ipv4/netfilter/ipt_MASQUERADE.c|  3 +-
 net/ipv4/netfilter/ipt_REJECT.c|  4 +--
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  4 +--
 net/ipv4/netfilter/ipt_rpfilter.c  |  2 +-
 net/ipv6/netfilter/ip6_tables.c|  6 +---
 net/ipv6/netfilter/ip6t_MASQUERADE.c   |  2 +-
 net/ipv6/netfilter/ip6t_REJECT.c   | 23 --
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  4 +--
 net/ipv6/netfilter/ip6t_rpfilter.c |  3 +-
 net/netfilter/ipset/ip_set_core.c  |  6 ++--
 net/netfilter/ipset/ip_set_hash_netiface.c |  2 +-
 net/netfilter/xt_AUDIT.c   | 10 +++
 net/netfilter/xt_LOG.c |  6 ++--
 net/netfilter/xt_NETMAP.c  | 20 ++---
 net/netfilter/xt_NFLOG.c   |  6 ++--
 net/netfilter/xt_NFQUEUE.c |  4 +--
 net/netfilter/xt_REDIRECT.c|  4 +--
 net/netfilter/xt_TCPMSS.c  |  4 +--
 net/netfilter/xt_TEE.c |  4 +--
 net/netfilter/xt_TPROXY.c  | 16 +-
 net/netfilter/xt_addrtype.c| 10 +++
 net/netfilter/xt_cluster.c |  2 +-
 net/netfilter/xt_connlimit.c   |  8 ++---
 net/netfilter/xt_conntrack.c   |  8 ++---
 net/netfilter/xt_devgroup.c|  4 +--
 net/netfilter/xt_dscp.c|  2 +-
 net/netfilter/xt_ipvs.c|  4 +--
 net/netfilter/xt_nfacct.c  |  2 +-
 net/netfilter/xt_osf.c | 10 +++
 net/netfilter/xt_owner.c   |  2 +-
 net/netfilter/xt_pkttype.c |  4 +--
 net/netfilter/xt_policy.c  |  4 +--
 net/netfilter/xt_recent.c  | 10 +++
 net/netfilter/xt_set.c | 26 
 net/netfilter/xt_socket.c  |  4 +--
 net/sched/act_ipt.c| 13 
 net/sched/em_ipset.c   | 18 ++-
 46 files changed, 198 insertions(+), 169 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 2ad1a2b289b5..cd4eaf8df445 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /* Test a struct->invflags and a boolean for inequality */
@@ -17,14 +18,9 @@
  * @target:     the target extension
  * @matchinfo:  per-match data
  * @targetinfo: per-target data
- * @net         network namespace through which the action was invoked
- * @in:         input netdevice
- * @out:        output netdevice
+ * @state:      pointer to hook state this packet came from
  * @fragoff:    packet is a fragment, this is the data offset
  * @thoff:      position of transport header relative to skb->data
- * @hook:       hook number given packet came from
- * @family:     Actual NFPROTO_* through which the function is invoked
- *              (helpful when match->family == NFPROTO_UNSPEC)
  *
  * Fields written to by extensions:
  *
@@ -38,15 +34,47 @@ struct xt_action_param {
union {
const void *matchinfo, *targinfo;
};
-   struct net *net;
-   const struct net_device *in, *out;
+   const struct nf_hook_state *state;
int fragoff;
unsigned int thoff;
-   unsigned int hooknum;
-   u_int8_t family;
bool hotdrop;
 };
 
+static inline struct net *xt_net(const struct xt_action_param *par)
+{
+   return par->state->net;
+}
+
+static inline struct net_device *xt_in(const struct xt_action_param *par)
+{
+   return par->state->in;
+}
+
+static inline const char *xt_inname(const struct xt_action_param *par)
+{
+   return par->state->in->name;
+}
+
+static inline struct net_device *xt_out(const struct xt_action_param *par)
+{
+   return par->state->out;
+}
+
+static inline const char *xt_outname(const struct xt_action_param *par)
+{
+   return par->state->out->name;
+}
+
+static inline unsigned int xt_hooknum(const struct xt_action_param *par)
+{
+   return 
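
(The archived diff is cut off at this point.) As a rough illustration of the accessor pattern the commit message describes, here is a minimal userspace sketch. It assumes simplified stand-in types: `struct hook_state`, `struct action_param` and the `param_*()` helpers are hypothetical names, not the kernel's definitions. The parameter block keeps one pointer to the hook state and reads fields through small inline wrappers instead of holding copies.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; the field sets are illustrative. */
struct net { int id; };
struct net_device { char name[16]; };

/* Hypothetical stand-in for struct nf_hook_state. */
struct hook_state {
	unsigned int hook;
	unsigned char pf;
	const struct net_device *in;
	const struct net_device *out;
	struct net *net;
};

/* Hypothetical stand-in for struct xt_action_param after the change:
 * one pointer replaces the copied net/in/out/hooknum/family fields.
 */
struct action_param {
	const struct hook_state *state;
	int fragoff;
	unsigned int thoff;
	int hotdrop;
};

static inline struct net *param_net(const struct action_param *par)
{
	assert(par && par->state);
	return par->state->net;
}

static inline unsigned int param_hooknum(const struct action_param *par)
{
	assert(par && par->state);
	return par->state->hook;
}

/* The input device may be absent on some hooks, so this sketch insists
 * that the caller only asks for the name when a device is set.
 */
static inline const char *param_inname(const struct action_param *par)
{
	assert(par && par->state && par->state->in);
	return par->state->in->name;
}
```

A caller would fill one `struct hook_state` per hook invocation and point every `struct action_param` at it, so the per-rule setup cost is a single pointer store.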

[PATCH nf-next,RFC 01/10] netfilter: get rid of useless debugging from core

2016-10-13 Thread Pablo Neira Ayuso
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fcb5d1df11e9..7b723bcd2522 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -323,15 +323,6 @@ unsigned int nf_iterate(struct sk_buff *skb,
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-#ifdef CONFIG_NETFILTER_DEBUG
-   if (unlikely((verdict & NF_VERDICT_MASK)
-   > NF_MAX_VERDICT)) {
-   NFDEBUG("Evil return from %p(%u).\n",
-   (*entryp)->ops.hook, state->hook);
-   *entryp = rcu_dereference((*entryp)->next);
-   continue;
-   }
-#endif
if (verdict != NF_REPEAT)
return verdict;
goto repeat;
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next,RFC 10/10] netfilter: inline nf_hook_slow() and rename it to nf_hook_iterate()

2016-10-13 Thread Pablo Neira Ayuso
Now that this function has become smaller, inline it and use a better
name to describe what this is doing.

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter.h | 41 +--
 include/linux/netfilter_ingress.h |  2 +-
 net/bridge/br_netfilter_hooks.c   |  4 ++--
 net/netfilter/core.c  | 39 -
 4 files changed, 42 insertions(+), 44 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index e0d000f6c9bf..d0beb6072e14 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -150,7 +150,44 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg);
 extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 #endif
 
-int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
+/* Returns 1 if okfn() needs to be executed by the caller,
+ * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock.
+ */
+static inline int nf_hook_iterate(struct sk_buff *skb,
+ struct nf_hook_state *state)
+{
+   struct nf_hook_entry *entry;
+   unsigned int verdict;
+   int ret;
+
+   entry = rcu_dereference(state->hook_entries);
+   while (entry) {
+   RCU_INIT_POINTER(state->hook_entries, entry);
+repeat:
+   verdict = entry->ops.hook(entry->ops.priv, skb, state);
+   switch (verdict) {
+   case NF_ACCEPT:
+   entry = rcu_dereference(entry->next);
+   break;
+   case NF_DROP:
+   kfree_skb(skb);
+   ret = NF_DROP_GETERR(verdict);
+   if (ret == 0)
+   ret = -EPERM;
+
+   return ret;
+   case NF_REPEAT:
+   goto repeat;
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any
+* other non conventional verdicts.
+*/
+   return 0;
+   }
+   }
+
+   return 1;
+}
 
 /**
  * nf_hook - call a netfilter hook
@@ -182,7 +219,7 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
nf_hook_state_init(&state, hook_head, hook, pf, indev, outdev,
   sk, net, okfn);
 
-   ret = nf_hook_slow(skb, &state);
+   ret = nf_hook_iterate(skb, &state);
}
rcu_read_unlock();
 
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index fd44e4131710..c7056a1f9d36 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -29,7 +29,7 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
nf_hook_state_init(&state, e, NF_NETDEV_INGRESS,
   NFPROTO_NETDEV, skb->dev, NULL, NULL,
   dev_net(skb->dev), NULL);
-   return nf_hook_slow(skb, &state);
+   return nf_hook_iterate(skb, &state);
 }
 
 static inline void nf_hook_ingress_init(struct net_device *dev)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 7e3645fa6339..d153925ec9ec 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -992,7 +992,7 @@ static struct notifier_block brnf_notifier __read_mostly = {
.notifier_call = brnf_device_event,
 };
 
-/* recursively invokes nf_hook_slow (again), skipping already-called
+/* recursively invokes nf_hook_iterate (again), skipping already-called
  * hooks (< NF_BR_PRI_BRNF).
  *
  * Called with rcu read lock held.
@@ -1021,7 +1021,7 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net,
nf_hook_state_init(&state, elem, hook, NFPROTO_BRIDGE, indev, outdev,
   sk, net, okfn);
 
-   ret = nf_hook_slow(skb, &state);
+   ret = nf_hook_iterate(skb, &state);
rcu_read_unlock();
if (ret == 1)
ret = okfn(net, sk, skb);
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index dceb5f92c6a2..5cf941571ecd 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -302,45 +302,6 @@ void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
 }
 EXPORT_SYMBOL(_nf_unregister_hooks);
 
-/* Returns 1 if okfn() needs to be executed by the caller,
- * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */
-int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
-{
-   struct nf_hook_entry *entry;
-   unsigned int verdict;
-   int ret;
-
-   entry = rcu_dereference(state->hook_entries);
-   while (entry) {
-   RCU_INIT_POINTER(state->hook_entries, entry);
-repeat:
-   verdict = entry->ops.hook(entry->ops.priv, skb, state);
-   switch (verdict) {
-   case NF_ACCEPT:
-   entry = rcu_dereference(entry->next);
-  
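
(The archived diff is cut off at this point.) To make the return convention of the inlined loop easier to follow, here is a small userspace sketch of the same control flow. It is only a model under stated assumptions: `struct pkt`, `struct hook_entry`, `hook_iterate()` and the sample hooks are hypothetical, RCU is left out entirely, and verdicts are plain values without any encoded errno in the upper bits. The `V_*` constants mirror the numeric values of the corresponding `NF_*` verdicts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same numeric values as NF_DROP, NF_ACCEPT, NF_STOLEN, NF_REPEAT. */
enum { V_DROP = 0, V_ACCEPT = 1, V_STOLEN = 2, V_REPEAT = 4 };

/* Hypothetical packet: records whether it was freed and how many times
 * the repeating sample hook still wants to be re-run.
 */
struct pkt {
	int freed;
	int repeats_left;
};

typedef unsigned int (*hook_fn)(struct pkt *p);

/* Hypothetical singly linked hook chain (no RCU in this model). */
struct hook_entry {
	hook_fn hook;
	struct hook_entry *next;
};

/* Returns 1 if the caller should run its okfn(), -EPERM on drop,
 * 0 if a hook consumed the packet.
 */
static int hook_iterate(struct pkt *p, struct hook_entry *entry)
{
	assert(p);
	while (entry) {
		unsigned int verdict;
repeat:
		verdict = entry->hook(p);
		switch (verdict) {
		case V_ACCEPT:
			entry = entry->next;	/* move on to the next hook */
			break;
		case V_DROP:
			p->freed = 1;		/* models kfree_skb() */
			return -EPERM;
		case V_REPEAT:
			goto repeat;		/* call the same hook again */
		default:
			return 0;		/* stolen or unknown verdict */
		}
	}
	return 1;
}

static unsigned int accept_hook(struct pkt *p) { (void)p; return V_ACCEPT; }
static unsigned int drop_hook(struct pkt *p) { (void)p; return V_DROP; }
static unsigned int steal_hook(struct pkt *p) { (void)p; return V_STOLEN; }

/* Asks to be repeated until its counter runs out, then accepts. */
static unsigned int repeat_hook(struct pkt *p)
{
	if (p->repeats_left > 0) {
		p->repeats_left--;
		return V_REPEAT;
	}
	return V_ACCEPT;
}
```

Chaining two `accept_hook` entries makes `hook_iterate()` return 1; swapping the second for `drop_hook` or `steal_hook` yields `-EPERM` or 0 respectively.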

[PATCH nf-next,RFC 03/10] netfilter: bridge: kill NF_HOOK_THRESH() and state->thresh

2016-10-13 Thread Pablo Neira Ayuso
Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
introduced br_nf_hook_thresh().

Replace NF_HOOK_THRESH() with br_nf_hook_thresh() in
br_nf_forward_finish(), so this macro has no callers left.

As a result, state->thresh and the explicit thresh parameter in the hook
state structure are no longer required.

We can also get rid of the fast-forward code in nf_iterate() in the core
path, which is only used by br_netfilter to search for the filter hook.

Suggested-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter.h | 50 +--
 include/linux/netfilter_ingress.h |  2 +-
 net/bridge/br_netfilter_hooks.c   |  8 +++---
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/netfilter/core.c  |  4 ---
 net/netfilter/nf_queue.c  |  1 -
 6 files changed, 19 insertions(+), 48 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index abc7fdcb9eb1..e0d000f6c9bf 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -49,7 +49,6 @@ struct sock;
 
 struct nf_hook_state {
unsigned int hook;
-   int thresh;
u_int8_t pf;
struct net_device *in;
struct net_device *out;
@@ -84,7 +83,7 @@ struct nf_hook_entry {
 static inline void nf_hook_state_init(struct nf_hook_state *p,
  struct nf_hook_entry *hook_entry,
  unsigned int hook,
- int thresh, u_int8_t pf,
+ u_int8_t pf,
  struct net_device *indev,
  struct net_device *outdev,
  struct sock *sk,
@@ -92,7 +91,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
  int (*okfn)(struct net *, struct sock *, 
struct sk_buff *))
 {
p->hook = hook;
-   p->thresh = thresh;
p->pf = pf;
p->in = indev;
p->out = outdev;
@@ -155,20 +153,16 @@ extern struct static_key 
nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
 
 /**
- * nf_hook_thresh - call a netfilter hook
+ * nf_hook - call a netfilter hook
  *
  * Returns 1 if the hook has allowed the packet to pass.  The function
  * okfn must be invoked by the caller in this case.  Any other return
  * value indicates the packet has been consumed by the hook.
  */
-static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
-struct net *net,
-struct sock *sk,
-struct sk_buff *skb,
-struct net_device *indev,
-struct net_device *outdev,
-int (*okfn)(struct net *, struct sock *, 
struct sk_buff *),
-int thresh)
+static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
+ struct sock *sk, struct sk_buff *skb,
+ struct net_device *indev, struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *, struct 
sk_buff *))
 {
struct nf_hook_entry *hook_head;
int ret = 1;
@@ -185,8 +179,8 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
if (hook_head) {
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, hook_head, hook, thresh,
-  pf, indev, outdev, sk, net, okfn);
+   nf_hook_state_init(&state, hook_head, hook, pf, indev, outdev,
+  sk, net, okfn);
 
ret = nf_hook_slow(skb, &state);
}
@@ -195,14 +189,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int 
hook,
return ret;
 }
 
-static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
- struct sock *sk, struct sk_buff *skb,
- struct net_device *indev, struct net_device *outdev,
- int (*okfn)(struct net *, struct sock *, struct 
sk_buff *))
-{
-   return nf_hook_thresh(pf, hook, net, sk, skb, indev, outdev, okfn, 
INT_MIN);
-}
-   
 /* Activate hook; either okfn or kfree_skb called, unless a hook
returns NF_STOLEN (in which case, it's up to the hook to deal with
the consequences).
@@ -221,19 +207,6 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, 
struct net *net,
 */
 
 static inline int
-NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
-  struct sk_buff *skb, struct net_device *in,
-  struct net_device *out,
-  int (*okfn)(struct net *, struct sock *, 

[PATCH nf-next,RFC 02/10] netfilter: remove comments that predate rcu days

2016-10-13 Thread Pablo Neira Ayuso
We cannot block or sleep in nf_iterate() because netfilter runs under the
RCU read lock these days, where blocking is illegal. So let's remove these
old comments.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 7b723bcd2522..b193bd46ac30 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -308,18 +308,11 @@ unsigned int nf_iterate(struct sk_buff *skb,
 {
unsigned int verdict;
 
-   /*
-* The caller must not block between calls to this
-* function because of risk of continuing from deleted element.
-*/
while (*entryp) {
if (state->thresh > (*entryp)->ops.priority) {
*entryp = rcu_dereference((*entryp)->next);
continue;
}
-
-   /* Optimization: we don't need to hold module
-  reference here, since function can't sleep. --RR */
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-- 
2.1.4


[PATCH nf-next,RFC 04/10] netfilter: deprecate NF_STOP

2016-10-13 Thread Pablo Neira Ayuso
NF_STOP is only used by br_netfilter these days, and it can be emulated
with a combination of NF_STOLEN plus an explicit call to the ->okfn()
function, as Florian suggests.

To retain binary compatibility with userspace nf_queue applications, we
have to keep NF_STOP around, so libnetfilter_queue userspace
applications still work if they use NF_STOP for some exotic reason.

Out-of-tree modules using NF_STOP would break; we don't care about
those.

Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter.h  | 2 +-
 net/bridge/br_netfilter_hooks.c | 6 --
 net/netfilter/core.c| 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index d93f949d1d9a..7550e9176a54 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -13,7 +13,7 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_STOP 5
+#define NF_STOP 5  /* Deprecated, for userspace nf_queue compatibility. */
 #define NF_MAX_VERDICT NF_STOP
 
 /* we overload the higher bits for encoding auxiliary data such as the queue
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index d0d66faebe90..7e3645fa6339 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -845,8 +845,10 @@ static unsigned int ip_sabotage_in(void *priv,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
-   if (skb->nf_bridge && !skb->nf_bridge->in_prerouting)
-   return NF_STOP;
+   if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) {
+   state->okfn(state->net, state->sk, skb);
+   return NF_STOLEN;
+   }
 
return NF_ACCEPT;
 }
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 6b09d9ed2646..2a6ed7d29c6c 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -333,7 +333,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
entry = rcu_dereference(state->hook_entries);
 next_hook:
verdict = nf_iterate(skb, state, &entry);
-   if (verdict == NF_ACCEPT || verdict == NF_STOP) {
+   if (verdict == NF_ACCEPT) {
ret = 1;
} else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
kfree_skb(skb);
-- 
2.1.4
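
The emulation described in this patch can be modelled in userspace as follows. This is a sketch under assumptions, not the bridge code: `struct pkt`, `struct state`, `run_hooks()`, `pass_packet()` and the two sample hooks are hypothetical. It shows why calling `okfn()` from inside the hook and returning the stolen verdict has the same visible effect as the old stop verdict: later hooks are skipped and the packet is delivered exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Same numeric values as NF_ACCEPT and NF_STOLEN. */
enum { V_ACCEPT = 1, V_STOLEN = 2 };

/* Hypothetical packet with counters so the effect can be observed. */
struct pkt {
	int bypass;		/* models the nf_bridge condition in the hook */
	int delivered;		/* how often okfn() ran for this packet */
	int seen_by_later_hook;
};

/* Hypothetical hook state carrying only the continuation function. */
struct state {
	int (*okfn)(struct pkt *p);
};

typedef unsigned int (*hook_fn)(struct pkt *p, const struct state *st);

static int deliver(struct pkt *p)
{
	p->delivered++;
	return 0;
}

/* Emulates the old "stop" verdict: deliver the packet right here and
 * report it as stolen so the core does not touch it again.
 */
static unsigned int sabotage_hook(struct pkt *p, const struct state *st)
{
	if (p->bypass) {
		st->okfn(p);
		return V_STOLEN;
	}
	return V_ACCEPT;
}

static unsigned int later_hook(struct pkt *p, const struct state *st)
{
	(void)st;
	p->seen_by_later_hook++;
	return V_ACCEPT;
}

/* Returns 1 if every hook accepted, 0 if some hook consumed the packet. */
static int run_hooks(struct pkt *p, const struct state *st)
{
	static const hook_fn hooks[] = { sabotage_hook, later_hook };
	size_t i;

	for (i = 0; i < sizeof(hooks) / sizeof(hooks[0]); i++)
		if (hooks[i](p, st) != V_ACCEPT)
			return 0;
	return 1;
}

/* Caller side: run okfn() only when the hooks let the packet through. */
static int pass_packet(struct pkt *p, const struct state *st)
{
	int ret;

	assert(p && st && st->okfn);
	ret = run_hooks(p, st);
	if (ret == 1)
		ret = st->okfn(p);
	return ret;
}
```

With `bypass` set, `deliver()` runs once from inside the hook and `later_hook()` never sees the packet; with it clear, both hooks run and the caller invokes `deliver()`.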


[PATCH nf-next,RFC 08/10] netfilter: move NF_QUEUE handling away from core

2016-10-13 Thread Pablo Neira Ayuso
Export a new nf_queue() function that translates the NF_QUEUE verdict
depending on the scenario:

1) Drop packet if queue is full.
2) Accept packet if bypass is enabled.
3) Return stolen if packet is enqueued.

We can call this function from xt_NFQUEUE and nft_queue. Thus, we
move packet queuing to userspace away from the core path.

We still have to handle the old QUEUE standard target for
{ip,ip6}_tables, which points to queue number zero, in case any user
still relies on this behaviour. There is no need to handle this for arp
and ebtables, since they never got a native queue target.

After this patch, we have to unconditionally set state->hook_entries
before calling the hook from nf_iterate(), since we need it to know
from which hook the packet is escaping to userspace in nf_queue().

From nft_verdict_init(), disallow NF_QUEUE as a verdict, since we always
use the nft_queue expression for this and no userspace code has used
this verdict since the beginning.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_queue.h |  3 +++
 net/ipv4/netfilter/arp_tables.c  |  1 +
 net/ipv4/netfilter/ip_tables.c   |  4 
 net/ipv6/netfilter/ip6_tables.c  |  4 
 net/netfilter/core.c | 14 ++-
 net/netfilter/nf_internals.h |  2 --
 net/netfilter/nf_queue.c | 51 ++--
 net/netfilter/nf_tables_api.c|  3 +--
 net/netfilter/nf_tables_core.c   |  3 +--
 net/netfilter/nft_queue.c|  6 ++---
 net/netfilter/xt_NFQUEUE.c   | 29 ---
 11 files changed, 67 insertions(+), 53 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 2280cfe86c56..807b9de72b43 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -29,6 +29,9 @@ struct nf_queue_handler {
 
 void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh);
 void nf_unregister_queue_handler(struct net *net);
+
+int nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
+unsigned int queuenum, bool bypass);
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
 
 void nf_queue_entry_get_refs(struct nf_queue_entry *entry);
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e76ab23a2deb..83d82f6be8dd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include 
 #include "../../netfilter/xt_repldata.h"
 
 MODULE_LICENSE("GPL");
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index de4fa03f46f3..7040842c34f4 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "../../netfilter/xt_repldata.h"
 
 MODULE_LICENSE("GPL");
@@ -329,6 +330,9 @@ ipt_do_table(struct sk_buff *skb,
/* Pop from stack? */
if (v != XT_RETURN) {
verdict = (unsigned int)(-v) - 1;
+   if (verdict == NF_QUEUE)
+   verdict = nf_queue(skb, state,
+  0, false);
break;
}
if (stackidx == 0) {
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 7eac01d5d621..7119daa19ba6 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "../../netfilter/xt_repldata.h"
 
 MODULE_LICENSE("GPL");
@@ -361,6 +362,9 @@ ip6t_do_table(struct sk_buff *skb,
/* Pop from stack? */
if (v != XT_RETURN) {
verdict = (unsigned int)(-v) - 1;
+   if (verdict == NF_QUEUE)
+   verdict = nf_queue(skb, state,
+  0, false);
break;
}
if (stackidx == 0)
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 2b3b2f8e39c4..9ae2febd86e3 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -309,6 +309,7 @@ unsigned int nf_iterate(struct sk_buff *skb,
unsigned int verdict;
 
while (*entryp) {
+   RCU_INIT_POINTER(state->hook_entries, *entryp);
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
@@ -331,9 +332,8 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
int ret;
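
(The archived diff is cut off at this point.) The three-way verdict translation listed in the commit message can be sketched in userspace like this. Everything here is an assumption made for illustration: `struct queue`, its `has_listener` flag and `queue_verdict()` are hypothetical, the bypass case is modelled as "no listener is bound to the queue", and the precedence between a full queue and bypass is a guess rather than something the patch text states.

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric values as NF_DROP, NF_ACCEPT, NF_STOLEN. */
enum { V_DROP = 0, V_ACCEPT = 1, V_STOLEN = 2 };

/* Hypothetical userspace queue: a length, a limit, and whether any
 * program is currently listening on it.
 */
struct queue {
	unsigned int len;
	unsigned int max;
	bool has_listener;
};

/* Translate a "queue this packet" request into a final verdict:
 *  - nobody listening: accept if bypass is enabled, otherwise drop
 *  - queue full:       drop
 *  - enqueued:         stolen, the queue now owns the packet
 */
static unsigned int queue_verdict(struct queue *q, bool bypass)
{
	assert(q);
	if (!q->has_listener)
		return bypass ? V_ACCEPT : V_DROP;
	if (q->len >= q->max)
		return V_DROP;
	q->len++;
	return V_STOLEN;
}
```

Because the translation happens where the verdict is produced, the caller only ever sees accept, drop or stolen, which is what lets the core loop stop special-casing the queue verdict.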
 

[PATCH nf-next,RFC 04/10] netfilter: deprecate NF_STOP

2016-10-13 Thread Pablo Neira Ayuso
NF_STOP is only used by br_netfilter these days, and it can be emulated
with a combination of NF_STOLEN plus explicit call to the ->okfn()
function as Florian suggests.

To retain binary compatibility with userspace nf_queue applications, we
have to keep NF_STOP around, so libnetfilter_queue userspace
applications still work if they use NF_STOP for some exotic reason.

Out-of-tree modules using NF_STOP would break; we don't care about
those.

Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter.h  | 2 +-
 net/bridge/br_netfilter_hooks.c | 6 --
 net/netfilter/core.c| 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index d93f949d1d9a..7550e9176a54 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -13,7 +13,7 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_STOP 5
+#define NF_STOP 5  /* Deprecated, for userspace nf_queue compatibility. */
 #define NF_MAX_VERDICT NF_STOP
 
 /* we overload the higher bits for encoding auxiliary data such as the queue
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index d0d66faebe90..7e3645fa6339 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -845,8 +845,10 @@ static unsigned int ip_sabotage_in(void *priv,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
-   if (skb->nf_bridge && !skb->nf_bridge->in_prerouting)
-   return NF_STOP;
+   if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) {
+   state->okfn(state->net, state->sk, skb);
+   return NF_STOLEN;
+   }
 
return NF_ACCEPT;
 }
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 6b09d9ed2646..2a6ed7d29c6c 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -333,7 +333,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
entry = rcu_dereference(state->hook_entries);
 next_hook:
verdict = nf_iterate(skb, state, &entry);
-   if (verdict == NF_ACCEPT || verdict == NF_STOP) {
+   if (verdict == NF_ACCEPT) {
ret = 1;
} else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
kfree_skb(skb);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
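
The NF_STOP emulation described in the patch above can be modelled in
userspace. This is only a minimal sketch: struct pkt, struct hook_state,
toy_okfn() and toy_hook_slow() are hypothetical stand-ins for the kernel's
sk_buff, nf_hook_state, ->okfn() and nf_hook_slow(); only the verdict
numbers are taken from include/uapi/linux/netfilter.h.

```c
#include <stddef.h>

/* Verdict values as listed in include/uapi/linux/netfilter.h. */
#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2

/* Hypothetical stand-ins for struct sk_buff and struct nf_hook_state. */
struct pkt { int delivered; };
struct hook_state { int (*okfn)(struct pkt *p); };

/* Toy okfn: records that the packet continued on its normal path. */
static int toy_okfn(struct pkt *p)
{
	p->delivered++;
	return 0;
}

/* Emulates the old NF_STOP: call okfn directly from the hook, then
 * report NF_STOLEN so the core does not touch the packet again. */
static unsigned int stop_emulation_hook(struct pkt *p,
					const struct hook_state *st,
					int skip_rest)
{
	if (skip_rest) {
		st->okfn(p);
		return NF_STOLEN;
	}
	return NF_ACCEPT;
}

/* Toy core: returns 1 when the caller should run okfn itself (accept),
 * 0 when the hook already took care of the packet (stolen). */
static int toy_hook_slow(struct pkt *p, const struct hook_state *st,
			 int skip_rest)
{
	unsigned int verdict = stop_emulation_hook(p, st, skip_rest);

	if (verdict == NF_ACCEPT)
		return 1;
	return 0;
}
```

In this model okfn runs exactly once on the "stop" path, and the core
sees a verdict it already knows how to handle.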


[PATCH nf-next,RFC 07/10] netfilter: use switch() to handle verdict cases from nf_hook_slow()

2016-10-13 Thread Pablo Neira Ayuso
Use switch() for verdict handling and add explicit handling for
NF_STOLEN and other non-conventional verdicts.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 2a6ed7d29c6c..2b3b2f8e39c4 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -328,29 +328,37 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
 {
struct nf_hook_entry *entry;
unsigned int verdict;
-   int ret = 0;
+   int ret;
 
entry = rcu_dereference(state->hook_entries);
 next_hook:
verdict = nf_iterate(skb, state, &entry);
-   if (verdict == NF_ACCEPT) {
+   switch (verdict & NF_VERDICT_MASK) {
+   case NF_ACCEPT:
ret = 1;
-   } else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
+   break;
+   case NF_DROP:
kfree_skb(skb);
ret = NF_DROP_GETERR(verdict);
if (ret == 0)
ret = -EPERM;
-   } else if ((verdict & NF_VERDICT_MASK) == NF_QUEUE) {
-   int err;
-
+   break;
+   case NF_QUEUE:
RCU_INIT_POINTER(state->hook_entries, entry);
-   err = nf_queue(skb, state, verdict >> NF_VERDICT_QBITS);
-   if (err < 0) {
-   if (err == -ESRCH &&
-  (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
+   ret = nf_queue(skb, state, verdict >> NF_VERDICT_QBITS);
+   if (ret < 0) {
+   if (ret == -ESRCH &&
+   (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
goto next_hook;
kfree_skb(skb);
}
+   /* Fall through. */
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any other non
+* conventional verdicts.
+*/
+   ret = 0;
+   break;
}
return ret;
 }
-- 
2.1.4
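
The verdict decoding done by the switch() above can be tried out in
userspace. What follows is a sketch under assumptions: the constants
mirror the values in include/uapi/linux/netfilter.h, drop_geterr() is a
local copy of what NF_DROP_GETERR() is understood to compute, and
classify_verdict() with its queue_err and resume parameters is a
hypothetical helper that only models the control flow (the "goto
next_hook" becomes *resume = 1).

```c
#include <errno.h>

#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2
#define NF_QUEUE  3

#define NF_VERDICT_MASK              0x000000ff
#define NF_VERDICT_FLAG_QUEUE_BYPASS 0x00008000
#define NF_VERDICT_QBITS             16

/* Local model of NF_DROP_GETERR(): errno is carried in the upper bits. */
static int drop_geterr(unsigned int verdict)
{
	return -(int)(verdict >> NF_VERDICT_QBITS);
}

/* queue_err is the (hypothetical) result of the queueing attempt:
 * 0 on success or a negative errno. *resume is set to 1 when the walk
 * should continue with the next hook (queue bypass). */
static int classify_verdict(unsigned int verdict, int queue_err, int *resume)
{
	int ret;

	*resume = 0;
	switch (verdict & NF_VERDICT_MASK) {
	case NF_ACCEPT:
		return 1;
	case NF_DROP:
		ret = drop_geterr(verdict);
		return ret ? ret : -EPERM;
	case NF_QUEUE:
		if (queue_err == -ESRCH &&
		    (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
			*resume = 1;
		return 0;
	default:
		/* NF_STOLEN and any other non-conventional verdict. */
		return 0;
	}
}
```

Masking with NF_VERDICT_MASK first is what lets a single switch() cover
verdicts that carry an errno or a queue number in their upper bits.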



[PATCH nf-next,RFC 02/10] netfilter: remove comments that predate rcu days

2016-10-13 Thread Pablo Neira Ayuso
We cannot block/sleep on nf_iterate because netfilter runs under rcu
read lock these days, where blocking is illegal. So let's remove these
old comments.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 7b723bcd2522..b193bd46ac30 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -308,18 +308,11 @@ unsigned int nf_iterate(struct sk_buff *skb,
 {
unsigned int verdict;
 
-   /*
-* The caller must not block between calls to this
-* function because of risk of continuing from deleted element.
-*/
while (*entryp) {
if (state->thresh > (*entryp)->ops.priority) {
*entryp = rcu_dereference((*entryp)->next);
continue;
}
-
-   /* Optimization: we don't need to hold module
-  reference here, since function can't sleep. --RR */
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-- 
2.1.4



[PATCH nf-next,RFC 01/10] netfilter: get rid of useless debugging from core

2016-10-13 Thread Pablo Neira Ayuso
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fcb5d1df11e9..7b723bcd2522 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -323,15 +323,6 @@ unsigned int nf_iterate(struct sk_buff *skb,
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-#ifdef CONFIG_NETFILTER_DEBUG
-   if (unlikely((verdict & NF_VERDICT_MASK)
-   > NF_MAX_VERDICT)) {
-   NFDEBUG("Evil return from %p(%u).\n",
-   (*entryp)->ops.hook, state->hook);
-   *entryp = rcu_dereference((*entryp)->next);
-   continue;
-   }
-#endif
if (verdict != NF_REPEAT)
return verdict;
goto repeat;
-- 
2.1.4



[PATCH nf-next,RFC 00/10] Netfilter core updates

2016-10-13 Thread Pablo Neira Ayuso
Hi,

This is a quick RFC that has passed only very basic testing here. This
patchset achieves what was discussed during NetDev 1.2:

1) Deprecate NF_STOP, as this is only used by br_netfilter.

2) Remove threshold handling, as this is also only used by
   br_netfilter.

3) Move NF_QUEUE handling away from the core. Adds a new nf_queue()
   function that must be explicitly called to queue packets to userspace.
   This function returns the verdict that is passed down to the core,
   basically NF_DROP if the queue is full, NF_ACCEPT if bypass is enabled
   and NF_STOLEN if the packet is successfully enqueued to userspace.

4) Merge nf_iterate() into nf_hook_slow(), then inline the resulting
   function and rename it to nf_hook_iterate().
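
The verdict mapping in item 3 could look roughly like the sketch below.
It is an assumption, not the code from the series:
queue_result_to_verdict() and its parameters are hypothetical, and
treating -ESRCH (no listener) as the case where bypass applies is
borrowed from the existing NF_QUEUE handling in nf_hook_slow().

```c
#include <errno.h>

#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2

/* err: 0 if the packet was enqueued, negative errno otherwise.
 * bypass: nonzero if the rule asked for queue bypass. */
static unsigned int queue_result_to_verdict(int err, int bypass)
{
	if (err == 0)
		return NF_STOLEN;	/* userspace owns the packet now */
	if (err == -ESRCH && bypass)
		return NF_ACCEPT;	/* nobody listening, let it pass */
	return NF_DROP;			/* e.g. the queue is full */
}
```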

This patchset also modifies the pktinfo and xt_action_param structures
(that keep context around while performing packet processing) to store
the netfilter hook state. This change was required by the new NF_QUEUE
handling. As a side effect, according to pahole, these two now fit into
a single cacheline after this update.

Thanks!

Pablo Neira Ayuso (10):
  netfilter: get rid of useless debugging from core
  netfilter: remove comments that predate rcu days
  netfilter: bridge: kill NF_HOOK_THRESH() and state->tresh
  netfilter: deprecate NF_STOP
  netfilter: x_tables: move hook state into xt_action_param structure
  netfilter: nf_tables: use hook state from xt_action_param structure
  netfilter: use switch() to handle verdict cases from nf_hook_slow()
  netfilter: move NF_QUEUE handling away from core
  netfilter: merge nf_iterate() into nf_hook_slow()
  netfilter: inline nf_hook_slow() and rename it to nf_hook_iterate()

 include/linux/netfilter.h  | 91 +-
 include/linux/netfilter/x_tables.h | 48 
 include/linux/netfilter_ingress.h  |  4 +-
 include/net/netfilter/nf_queue.h   |  3 +
 include/net/netfilter/nf_tables.h  | 36 
 include/uapi/linux/netfilter.h |  2 +-
 net/bridge/br_netfilter_hooks.c| 18 +++---
 net/bridge/netfilter/ebt_arpreply.c|  3 +-
 net/bridge/netfilter/ebt_log.c | 11 ++--
 net/bridge/netfilter/ebt_nflog.c   |  6 +-
 net/bridge/netfilter/ebt_redirect.c|  6 +-
 net/bridge/netfilter/ebtable_broute.c  |  2 +-
 net/bridge/netfilter/ebtables.c|  6 +-
 net/bridge/netfilter/nft_meta_bridge.c |  2 +-
 net/bridge/netfilter/nft_reject_bridge.c   | 30 ++
 net/ipv4/netfilter/arp_tables.c|  7 +--
 net/ipv4/netfilter/ip_tables.c | 10 ++--
 net/ipv4/netfilter/ipt_MASQUERADE.c|  3 +-
 net/ipv4/netfilter/ipt_REJECT.c|  4 +-
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  4 +-
 net/ipv4/netfilter/ipt_rpfilter.c  |  2 +-
 net/ipv4/netfilter/nft_dup_ipv4.c  |  2 +-
 net/ipv4/netfilter/nft_masq_ipv4.c |  4 +-
 net/ipv4/netfilter/nft_redir_ipv4.c|  3 +-
 net/ipv4/netfilter/nft_reject_ipv4.c   |  4 +-
 net/ipv6/netfilter/ip6_tables.c| 10 ++--
 net/ipv6/netfilter/ip6t_MASQUERADE.c   |  2 +-
 net/ipv6/netfilter/ip6t_REJECT.c   | 23 +---
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  4 +-
 net/ipv6/netfilter/ip6t_rpfilter.c |  3 +-
 net/ipv6/netfilter/nft_dup_ipv6.c  |  2 +-
 net/ipv6/netfilter/nft_masq_ipv6.c |  3 +-
 net/ipv6/netfilter/nft_redir_ipv6.c|  3 +-
 net/ipv6/netfilter/nft_reject_ipv6.c   |  6 +-
 net/netfilter/core.c   | 75 
 net/netfilter/ipset/ip_set_core.c  |  6 +-
 net/netfilter/ipset/ip_set_hash_netiface.c |  2 +-
 net/netfilter/nf_dup_netdev.c  |  2 +-
 net/netfilter/nf_internals.h   |  7 ---
 net/netfilter/nf_queue.c   | 72 +--
 net/netfilter/nf_tables_api.c  |  3 +-
 net/netfilter/nf_tables_core.c | 13 ++---
 net/netfilter/nf_tables_trace.c|  8 +--
 net/netfilter/nft_log.c|  5 +-
 net/netfilter/nft_lookup.c |  5 +-
 net/netfilter/nft_meta.c   |  6 +-
 net/netfilter/nft_queue.c  |  8 +--
 net/netfilter/nft_reject_inet.c| 18 +++---
 net/netfilter/xt_AUDIT.c   | 10 ++--
 net/netfilter/xt_LOG.c |  6 +-
 net/netfilter/xt_NETMAP.c  | 20 +++
 net/netfilter/xt_NFLOG.c   |  6 +-
 net/netfilter/xt_NFQUEUE.c | 33 +--
 net/netfilter/xt_REDIRECT.c|  4 +-
 net/netfilter/xt_TCPMSS.c  |  4 +-
 net/netfilter/xt_TEE.c |  4 +-
 net/netfilter/xt_TPROXY.c  | 16 +++---
 net/netfilter/xt_addrtype.c| 10 ++--
 net/netfilter/xt_cluster.c |  2 +-
 net/netfilter/xt_connlimit.c   |  8 +--
 net/netfilter/xt_conntrack.c   | 

[PATCH 1/2 nf] netfilter: nf_queue: don't re-enter same hook on packet reinjection

2016-10-13 Thread Pablo Neira Ayuso
Make sure we skip the current hook from where the packet was enqueued,
otherwise the packet gets enqueued over and over again.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso 
---
I managed to reproduce this with a simple test.

 # iptables -I OUTPUT -j QUEUE
 # cd libnetfilter_queue/utils/
 # ./nfqnl_test

Without my patch, netfilter munches packets that are reinjected.

@Aaron: Please, I'd appreciate it if you could have a look to confirm
this bug and the fix. Thanks.

 net/netfilter/nf_queue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 96964a0070e1..221d7a5c2fec 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -184,6 +184,7 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
verdict = NF_DROP;
}
 
+   hook_entry = rcu_dereference(hook_entry->next);
entry->state.thresh = INT_MIN;
 
if (verdict == NF_ACCEPT) {
-- 
2.1.4
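
Why the one-line fix matters can be seen in a small userspace model of
the singly linked hook list. This is a sketch with hypothetical names
(struct hook_entry here is not the kernel structure, and run_hooks() and
reinject() are illustration only): resuming at the hook that queued the
packet hits the same hook again, resuming at ->next does not.

```c
#include <stddef.h>

/* Hypothetical model of one entry in the singly linked hook list. */
struct hook_entry {
	int queues;			/* nonzero: this hook queues the packet */
	struct hook_entry *next;
};

/* Walk the list from 'start'. Returns the hook that queues the packet,
 * or NULL if every remaining hook accepts it. */
static struct hook_entry *run_hooks(struct hook_entry *start)
{
	struct hook_entry *e;

	for (e = start; e != NULL; e = e->next)
		if (e->queues)
			return e;
	return NULL;
}

/* Resume the walk after userspace accepted the packet. skip_current
 * models the fix: continue from the hook after the one that queued. */
static struct hook_entry *reinject(struct hook_entry *queued_at,
				   int skip_current)
{
	struct hook_entry *start = skip_current ? queued_at->next : queued_at;

	return run_hooks(start);
}
```

Without skip_current the packet ends up at the queueing hook again,
which matches the "enqueued over and over again" symptom.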



[PATCH 2/2 nf] netfilter: nft_range: validate operation netlink attribute

2016-10-13 Thread Pablo Neira Ayuso
Use nft_parse_u32_check() to make sure we don't get a value that does
not fit into an unsigned 8-bit integer. Moreover, make sure this value
is one of the two supported range comparison modes.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_range.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_range.c b/net/netfilter/nft_range.c
index c6d5358482d1..9bc4586c3006 100644
--- a/net/netfilter/nft_range.c
+++ b/net/netfilter/nft_range.c
@@ -59,6 +59,7 @@ static int nft_range_init(const struct nft_ctx *ctx, const struct nft_expr *expr
struct nft_range_expr *priv = nft_expr_priv(expr);
struct nft_data_desc desc_from, desc_to;
int err;
+   u32 op;
 
err = nft_data_init(NULL, &priv->data_from, sizeof(priv->data_from),
&desc_from, tb[NFTA_RANGE_FROM_DATA]);
@@ -80,7 +81,20 @@ static int nft_range_init(const struct nft_ctx *ctx, const struct nft_expr *expr
if (err < 0)
goto err2;
 
-   priv->op  = ntohl(nla_get_be32(tb[NFTA_RANGE_OP]));
+   err = nft_parse_u32_check(tb[NFTA_RANGE_OP], U8_MAX, &op);
+   if (err < 0)
+   goto err2;
+
+   switch (op) {
+   case NFT_RANGE_EQ:
+   case NFT_RANGE_NEQ:
+   break;
+   default:
+   err = -EINVAL;
+   goto err2;
+   }
+
+   priv->op  = op;
priv->len = desc_from.len;
return 0;
 err2:
-- 
2.1.4
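
The two-step validation in this patch (bound check first, then a
whitelist of operators) can be sketched in userspace as follows. The
names are hypothetical: parse_u32_check() only models the bound check
that nft_parse_u32_check() is used for here (returning -ERANGE on
overflow is an assumption), and enum range_ops stands in for enum
nft_range_ops.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for enum nft_range_ops. */
enum range_ops { RANGE_EQ, RANGE_NEQ };

/* Model of the bound check: reject values larger than 'max'. */
static int parse_u32_check(uint32_t val, uint32_t max, uint32_t *dest)
{
	if (val > max)
		return -ERANGE;
	*dest = val;
	return 0;
}

/* Accept 'raw' only if it fits in 8 bits and names a supported mode. */
static int validate_range_op(uint32_t raw, uint8_t *op_out)
{
	uint32_t op;
	int err;

	err = parse_u32_check(raw, UINT8_MAX, &op);
	if (err < 0)
		return err;

	switch (op) {
	case RANGE_EQ:
	case RANGE_NEQ:
		break;
	default:
		return -EINVAL;
	}

	*op_out = (uint8_t)op;
	return 0;
}
```

The bound check alone would still let through values such as 2..255
that fit in the 8-bit field but name no operator, hence the switch().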



Re: slab corruption with current -git

2016-10-13 Thread Markus Trippelsdorf
On 2016.10.13 at 08:02 +0200, Markus Trippelsdorf wrote:
> On 2016.10.11 at 04:57 -0400, David Miller wrote:
> > From: Linus Torvalds 
> > Date: Mon, 10 Oct 2016 22:47:50 -0700
> > 
> > > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
> > >  wrote:
> > >>
> > >> I guess I will have to double-check that the slub corruption is gone
> > >> still with that fixed.
> > > 
> > > So I'm not getting any warnings now from SLUB debugging. So the
> > > original bug seems to not have re-surfaced, and the registration bug
> > > is gone, so now the unregistration doesn't warn about anything either.
> > > 
> > > But I only rebooted three times.
> > 
> > Looks good to me, I applied it to my tree with your signoff and will
> > send you a pull request right now.
> 
> I'm still seeing:
> 
> nf_conntrack version 0.5.0 (4096 buckets, 16384 max)
> ctnetlink v0.93: registering with nfnetlink.
> ip_tables: (C) 2000-2006 Netfilter Core Team
> WARNING: kmemcheck: Caught 64-bit read from uninitialized memory 
> (88001e605480)
> 4055601e008890686d81
>  u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
>  ^
> RIP: 0010:[]  [] 
> nf_register_net_hook+0x51/0x160

This is nf_register_net_hook at net/netfilter/core.c:106

-- 
Markus


[PATCH nft] src: use new range expression for != [a,b] intervals

2016-10-13 Thread Pablo Neira Ayuso
Use the new range expression in the kernel to fix wrong bytecode
generation. This patch also adjusts the tests so we don't hit problems
there.

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter/nf_tables.h| 29 +
 src/netlink_delinearize.c  | 45 +
 src/netlink_linearize.c| 46 --
 tests/py/any/ct.t.payload  |  6 ++---
 tests/py/any/meta.t.payload| 21 ++--
 tests/py/arp/arp.t.payload |  9 +++
 tests/py/arp/arp.t.payload.netdev  |  9 +++
 tests/py/inet/ah.t.payload.inet| 12 +++--
 tests/py/inet/ah.t.payload.ip  | 12 +++--
 tests/py/inet/ah.t.payload.ip6 | 12 +++--
 tests/py/inet/ah.t.payload.netdev  | 12 +++--
 tests/py/inet/comp.t.payload.inet  |  6 ++---
 tests/py/inet/comp.t.payload.ip|  6 ++---
 tests/py/inet/comp.t.payload.ip6   |  6 ++---
 tests/py/inet/comp.t.payload.netdev|  6 ++---
 tests/py/inet/dccp.t.payload.inet  |  3 +--
 tests/py/inet/dccp.t.payload.ip|  3 +--
 tests/py/inet/dccp.t.payload.ip6   |  3 +--
 tests/py/inet/dccp.t.payload.netdev|  3 +--
 tests/py/inet/esp.t.payload.inet   |  6 ++---
 tests/py/inet/esp.t.payload.ip |  6 ++---
 tests/py/inet/esp.t.payload.ip6|  6 ++---
 tests/py/inet/esp.t.payload.netdev |  6 ++---
 tests/py/inet/sctp.t.payload.inet  | 12 +++--
 tests/py/inet/sctp.t.payload.ip| 12 +++--
 tests/py/inet/sctp.t.payload.ip6   | 12 +++--
 tests/py/inet/sctp.t.payload.netdev| 12 +++--
 tests/py/inet/tcp.t.payload.inet   | 21 ++--
 tests/py/inet/tcp.t.payload.ip | 21 ++--
 tests/py/inet/tcp.t.payload.ip6| 21 ++--
 tests/py/inet/tcp.t.payload.netdev | 21 ++--
 tests/py/inet/udp.t.payload.inet   | 12 +++--
 tests/py/inet/udp.t.payload.ip | 12 +++--
 tests/py/inet/udp.t.payload.ip6| 12 +++--
 tests/py/inet/udp.t.payload.netdev | 12 +++--
 tests/py/inet/udplite.t.payload.inet   |  9 +++
 tests/py/inet/udplite.t.payload.ip |  9 +++
 tests/py/inet/udplite.t.payload.ip6|  9 +++
 tests/py/inet/udplite.t.payload.netdev |  9 +++
 tests/py/ip/dnat.t.payload.ip  |  6 ++---
 tests/py/ip/icmp.t.payload.ip  | 18 +
 tests/py/ip/ip.t.payload   | 24 ++
 tests/py/ip/ip.t.payload.inet  | 24 ++
 tests/py/ip/ip.t.payload.netdev| 24 ++
 tests/py/ip/snat.t.payload |  6 ++---
 tests/py/ip6/dst.t.payload.inet|  6 ++---
 tests/py/ip6/dst.t.payload.ip6 |  6 ++---
 tests/py/ip6/frag.t.payload.inet   |  9 +++
 tests/py/ip6/frag.t.payload.ip6|  9 +++
 tests/py/ip6/hbh.t.payload.inet|  6 ++---
 tests/py/ip6/hbh.t.payload.ip6 |  6 ++---
 tests/py/ip6/icmpv6.t.payload.ip6  |  9 +++
 tests/py/ip6/ip6.t.payload.inet| 12 +++--
 tests/py/ip6/ip6.t.payload.ip6 | 12 +++--
 tests/py/ip6/mh.t.payload.inet | 12 +++--
 tests/py/ip6/mh.t.payload.ip6  | 12 +++--
 tests/py/ip6/rt.t.payload.inet | 12 +++--
 tests/py/ip6/rt.t.payload.ip6  | 12 +++--
 58 files changed, 293 insertions(+), 421 deletions(-)

diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index 1bec149b2200..b21a844cf5d5 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -546,6 +546,35 @@ enum nft_cmp_attributes {
 };
 #define NFTA_CMP_MAX   (__NFTA_CMP_MAX - 1)
 
+/**
+ * enum nft_range_ops - nf_tables range operator
+ *
+ * @NFT_RANGE_EQ: equal
+ * @NFT_RANGE_NEQ: not equal
+ */
+enum nft_range_ops {
+   NFT_RANGE_EQ,
+   NFT_RANGE_NEQ,
+};
+
+/**
+ * enum nft_range_attributes - nf_tables range expression netlink attributes
+ *
+ * @NFTA_RANGE_SREG: source register of data to compare (NLA_U32: nft_registers)
+ * @NFTA_RANGE_OP: cmp operation (NLA_U32: nft_cmp_ops)
+ * @NFTA_RANGE_FROM_DATA: data range from (NLA_NESTED: nft_data_attributes)
+ * @NFTA_RANGE_TO_DATA: data range to (NLA_NESTED: nft_data_attributes)
+ */
+enum nft_range_attributes {
+   NFTA_RANGE_UNSPEC,
+   NFTA_RANGE_SREG,
+   NFTA_RANGE_OP,
+   NFTA_RANGE_FROM_DATA,
+   NFTA_RANGE_TO_DATA,
+   __NFTA_RANGE_MAX
+};
+#define NFTA_RANGE_MAX (__NFTA_RANGE_MAX - 1)
+
 enum nft_lookup_flags {
NFT_LOOKUP_F_INV = (1 << 0),
 };
diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c
index 6bb27b6fa2c8..d8d1d7d7aaa7 100644
--- a/src/netlink_delinearize.c
+++ b/src/netlink_delinearize.c
@@ -186,6 +186,46 @@ static void netlink_parse_immediate(struct netlink_parse_ctx *ctx,
netlink_set_register(ctx, dreg, expr);
 }
 
+static 

[PATCH libnftnl] src: add range expression

2016-10-13 Thread Pablo Neira Ayuso
Add the range expression that is scheduled for Linux kernel 4.9.
This range expression allows us to check whether a given value placed
in a register is within or outside a specified interval.
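
The semantics can be illustrated with a small userspace sketch. It
assumes that register and interval bounds are compared bytewise with
memcmp(), that is, as big-endian values with inclusive bounds;
range_match() and enum range_op are hypothetical names, not part of
libnftnl or of the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for enum nft_range_ops. */
enum range_op { RANGE_EQ, RANGE_NEQ };

/* Returns 1 if 'reg' matches under 'op': RANGE_EQ matches values
 * inside [from, to], RANGE_NEQ matches values outside of it. */
static int range_match(const unsigned char *reg, const unsigned char *from,
		       const unsigned char *to, size_t len, enum range_op op)
{
	int d1 = memcmp(reg, from, len);	/* >= 0: reg is at or above from */
	int d2 = memcmp(reg, to, len);		/* <= 0: reg is at or below to */
	int inside = (d1 >= 0 && d2 <= 0);

	return op == RANGE_EQ ? inside : !inside;
}
```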

Signed-off-by: Pablo Neira Ayuso 
---
 include/libnftnl/expr.h |   7 +
 include/linux/netfilter/nf_tables.h |  29 
 src/Makefile.am |   1 +
 src/expr/range.c| 288 
 src/expr_ops.c  |   2 +
 tests/Makefile.am   |   4 +
 tests/nft-expr_range-test.c | 109 ++
 tests/test-script.sh|   1 +
 8 files changed, 441 insertions(+)
 create mode 100644 src/expr/range.c
 create mode 100644 tests/nft-expr_range-test.c

diff --git a/include/libnftnl/expr.h b/include/libnftnl/expr.h
index 4ce2592b1b50..edf86a966fc0 100644
--- a/include/libnftnl/expr.h
+++ b/include/libnftnl/expr.h
@@ -70,6 +70,13 @@ enum {
 };
 
 enum {
+   NFTNL_EXPR_RANGE_SREG   = NFTNL_EXPR_BASE,
+   NFTNL_EXPR_RANGE_OP,
+   NFTNL_EXPR_RANGE_FROM_DATA,
+   NFTNL_EXPR_RANGE_TO_DATA,
+};
+
+enum {
NFTNL_EXPR_IMM_DREG = NFTNL_EXPR_BASE,
NFTNL_EXPR_IMM_DATA,
NFTNL_EXPR_IMM_VERDICT,
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index 681082e68130..30e3b21418c5 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -546,6 +546,35 @@ enum nft_cmp_attributes {
 };
 #define NFTA_CMP_MAX   (__NFTA_CMP_MAX - 1)
 
+/**
+ * enum nft_range_ops - nf_tables range operator
+ *
+ * @NFT_RANGE_EQ: equal
+ * @NFT_RANGE_NEQ: not equal
+ */
+enum nft_range_ops {
+   NFT_RANGE_EQ,
+   NFT_RANGE_NEQ,
+};
+
+/**
+ * enum nft_range_attributes - nf_tables range expression netlink attributes
+ *
+ * @NFTA_RANGE_SREG: source register of data to compare (NLA_U32: nft_registers)
+ * @NFTA_RANGE_OP: cmp operation (NLA_U32: nft_cmp_ops)
+ * @NFTA_RANGE_FROM_DATA: data range from (NLA_NESTED: nft_data_attributes)
+ * @NFTA_RANGE_TO_DATA: data range to (NLA_NESTED: nft_data_attributes)
+ */
+enum nft_range_attributes {
+   NFTA_RANGE_UNSPEC,
+   NFTA_RANGE_SREG,
+   NFTA_RANGE_OP,
+   NFTA_RANGE_FROM_DATA,
+   NFTA_RANGE_TO_DATA,
+   __NFTA_RANGE_MAX
+};
+#define NFTA_RANGE_MAX (__NFTA_RANGE_MAX - 1)
+
 enum nft_lookup_flags {
NFT_LOOKUP_F_INV = (1 << 0),
 };
diff --git a/src/Makefile.am b/src/Makefile.am
index 4ab8fcaff5ce..eac7a5678009 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -24,6 +24,7 @@ libnftnl_la_SOURCES = utils.c \
  expr/bitwise.c\
  expr/byteorder.c  \
  expr/cmp.c\
+ expr/range.c  \
  expr/counter.c\
  expr/ct.c \
  expr/data_reg.c   \
diff --git a/src/expr/range.c b/src/expr/range.c
new file mode 100644
index ..1489d5849451
--- /dev/null
+++ b/src/expr/range.c
@@ -0,0 +1,288 @@
+/*
+ * (C) 2016 by Pablo Neira Ayuso 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include "internal.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+struct nftnl_expr_range {
+   union nftnl_data_reg data_from;
+   union nftnl_data_reg data_to;
+   enum nft_registers  sreg;
+   enum nft_range_ops  op;
+};
+
+static int nftnl_expr_range_set(struct nftnl_expr *e, uint16_t type,
+   const void *data, uint32_t data_len)
+{
+   struct nftnl_expr_range *range = nftnl_expr_data(e);
+
+   switch(type) {
+   case NFTNL_EXPR_RANGE_SREG:
+   range->sreg = *((uint32_t *)data);
+   break;
+   case NFTNL_EXPR_RANGE_OP:
+   range->op = *((uint32_t *)data);
+   break;
+   case NFTNL_EXPR_RANGE_FROM_DATA:
+   memcpy(&range->data_from.val, data, data_len);
+   range->data_from.len = data_len;
+   break;
+   case NFTNL_EXPR_RANGE_TO_DATA:
+   memcpy(&range->data_to.val, data, data_len);
+   range->data_to.len = data_len;
+   break;
+   default:
+   return -1;
+   }
+   return 0;
+}
+
+static const void *nftnl_expr_range_get(const struct nftnl_expr *e,
+   uint16_t type, uint32_t *data_len)
+{
+   struct nftnl_expr_range *range = nftnl_expr_data(e);
+
+   switch(type) {
+   case NFTNL_EXPR_RANGE_SREG:
+   *data_len = sizeof(range->sreg);
+   return &range->sreg;
+   case NFTNL_EXPR_RANGE_OP:
+

Re: slab corruption with current -git

2016-10-13 Thread Markus Trippelsdorf
On 2016.10.12 at 23:18 -0700, Linus Torvalds wrote:
> On Oct 12, 2016 23:07, "Markus Trippelsdorf"  wrote:
> >
> > This is nf_register_net_hook at net/netfilter/core.c:106
>
> The "*regs" access?

Yeah.

105 entry->orig_ops = reg;
106 entry->ops  = *reg;
107 entry->next = NULL;

--
Markus
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slab corruption with current -git

2016-10-13 Thread Markus Trippelsdorf
On 2016.10.11 at 04:57 -0400, David Miller wrote:
> From: Linus Torvalds 
> Date: Mon, 10 Oct 2016 22:47:50 -0700
> 
> > On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
> >  wrote:
> >>
> >> I guess I will have to double-check that the slub corruption is gone
> >> still with that fixed.
> > 
> > So I'm not getting any warnings now from SLUB debugging. So the
> > original bug seems to not have re-surfaced, and the registration bug
> > is gone, so now the unregistration doesn't warn about anything either.
> > 
> > But I only rebooted three times.
> 
> Looks good to me, I applied it to my tree with your signoff and will
> send you a pull request right now.

I'm still seeing:

nf_conntrack version 0.5.0 (4096 buckets, 16384 max)
ctnetlink v0.93: registering with nfnetlink.
ip_tables: (C) 2000-2006 Netfilter Core Team
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory 
(88001e605480)
4055601e008890686d81
 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
 ^
RIP: 0010:[]  [] 
nf_register_net_hook+0x51/0x160
RSP: 0018:c900bcc0  EFLAGS: 00010286
RAX: 88001e5af9c0 RBX: 88001e605480 RCX: 
RDX:  RSI: 0001 RDI: 88001e5b0a20
RBP: c900bcd8 R08: 1fd0e000 R09: 
R10: 88001e5b09c0 R11: 0067 R12: 88001e5af9c0
R13: 81c5c0c8 R14: 0003 R15: 88001e605480
FS:  () GS:88001fa0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 88001f45ca18 CR3: 01c07000 CR4: 06f0
 [] nf_register_net_hook+0x51/0x160
 [] nf_register_net_hooks+0x3f/0xa0
 [] ipt_register_table+0xe5/0x110
 [] iptable_filter_table_init.part.1+0x55/0x80
 [] iptable_filter_net_init+0x2b/0x30
 [] ops_init+0x47/0x150
 [] register_pernet_operations+0xd6/0x170
 [] register_pernet_subsys+0x27/0x40
 [] iptable_filter_init+0x33/0x4b
 [] do_one_initcall+0x8b/0x113
 [] kernel_init_freeable+0x119/0x1a1
 [] kernel_init+0x9/0x100
 [] ret_from_fork+0x22/0x30
 [] 0x
NET: Registered protocol family 17
9pnet: Installing 9P2000 support


-- 
Markus