Re: [PATCH 00/28] Reenable maybe-uninitialized warnings

2016-10-17 Thread Christoph Hellwig
On Tue, Oct 18, 2016 at 12:03:28AM +0200, Arnd Bergmann wrote:
> This is a set of patches that I hope to get into v4.9 in some form
> in order to turn on the -Wmaybe-uninitialized warnings again.

Hi Arnd,

I just complained to Geert that I was introducing way too many
bugs or pointless warnings for some compilers lately, but gcc didn't
warn me about them.  From a little research the lack of
-Wmaybe-uninitialized seems to be the reason for it, so I'm all
for re-enabling it.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nftables] Fix register allocation for EXPR_SET_ELEM

2016-10-17 Thread Anders K. Pedersen | Cohaesio
From: Anders K. Pedersen 

I noticed that while

 # nft add rule ip6 filter postrouting \
flow table acct_out \{ meta iif . ip6 saddr timeout 600s counter \}

works, the opposite order for the concatenated expressions fails:

 # nft add rule ip6 filter postrouting \
flow table acct_out \{ ip6 saddr . meta iif timeout 600s counter \}
 nft: netlink_linearize.c:634: netlink_gen_expr: Assertion `dreg < ctx->reg_low' failed.

I traced this down to get_register() and release_register(), where the
EXPR_CONCAT handling isn't hit, when it's embedded in EXPR_SET_ELEM, and
fixed it similarly to how EXPR_SET_ELEM is handled in netlink_gen_expr().
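To illustrate the idea behind the fix, here is a minimal userspace sketch. The types, the `reg_units_naive()`/`reg_units_needed()` helpers and the assumption that register space is counted in 32-bit units (with one 16-byte register for non-concatenated values) are hypothetical simplifications, not the real nftables data structures. The point is that the set element must be unwrapped to its key before the concatenation check, otherwise the key's full length is never seen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for nftables expressions. */
enum expr_type { EXPR_PAYLOAD, EXPR_META, EXPR_CONCAT, EXPR_SET_ELEM };

struct expr {
	enum expr_type type;
	unsigned int len;		/* length in bits */
	const struct expr *key;		/* only used by EXPR_SET_ELEM */
};

/* Assumption: space is counted in 32-bit units and a non-concatenated
 * value gets one 16-byte register, i.e. four units. */
#define UNIT_BITS	32u
#define DEFAULT_UNITS	4u

/* Before the fix: a set element wrapping a concatenation is not
 * recognised, so it only gets the default register size. */
static unsigned int reg_units_naive(const struct expr *expr)
{
	if (expr && expr->type == EXPR_CONCAT)
		return (expr->len + UNIT_BITS - 1) / UNIT_BITS;
	return DEFAULT_UNITS;
}

/* After the fix: unwrap the set element to its key first. */
static unsigned int reg_units_needed(const struct expr *expr)
{
	if (expr && expr->type == EXPR_SET_ELEM)
		return reg_units_needed(expr->key);
	return reg_units_naive(expr);
}
```

Assuming `ip6 saddr . meta iif` is 128 + 32 = 160 bits, the concatenation needs five units in this model, but once wrapped in a set element the naive version reserves only four.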

Signed-off-by: Anders K. Pedersen 
---
 src/netlink_linearize.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/netlink_linearize.c b/src/netlink_linearize.c
--- a/src/netlink_linearize.c
+++ b/src/netlink_linearize.c
@@ -73,6 +73,9 @@ static void __release_register(struct netlink_linearize_ctx *ctx,
 static enum nft_registers get_register(struct netlink_linearize_ctx *ctx,
 				       const struct expr *expr)
 {
+	if (expr && expr->ops->type == EXPR_SET_ELEM)
+		return get_register(ctx, expr->key);
+
 	if (expr && expr->ops->type == EXPR_CONCAT)
 		return __get_register(ctx, expr->len);
 	else
@@ -82,6 +85,9 @@ static enum nft_registers get_register(struct netlink_linearize_ctx *ctx,
 static void release_register(struct netlink_linearize_ctx *ctx,
 			     const struct expr *expr)
 {
+	if (expr && expr->ops->type == EXPR_SET_ELEM)
+		return release_register(ctx, expr->key);
+
 	if (expr && expr->ops->type == EXPR_CONCAT)
 		__release_register(ctx, expr->len);
 	else


[PATCH 28/28] Kbuild: bring back -Wmaybe-uninitialized warning

2016-10-17 Thread Arnd Bergmann
Traditionally, we have always had warnings about uninitialized variables
enabled, as this is part of -Wall, and generally a good idea [1], but it
also always produced false positives, mainly because this is a variation
of the halting problem and provably impossible to get right in all cases
[2].

Various people have identified cases that are particularly bad for false
positives, and in commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized
when building with -Os"), I turned off the warning for any build that
was done with CC_OPTIMIZE_FOR_SIZE.  This drastically reduced the number
of false positive warnings in the default build but unfortunately had
the side effect of turning the warning off completely in 'allmodconfig'
builds, which in turn let a lot of warnings (both actual bugs and
remaining false positives) go in unnoticed.

Commit 877417e6ffb9 ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") enabled the warning again for allmodconfig builds in v4.7,
and in v4.8-rc1 I had finally managed to address all warnings I got in
an ARM allmodconfig build and most other maybe-uninitialized warnings
for ARM randconfig builds.

However, commit 6e8d666e9253 ("Disable "maybe-uninitialized" warning
globally") was merged at the same time and disabled it completely for
all configurations, because of false-positive warnings on x86 that
I had not addressed until then. This caused a lot of actual bugs to
get merged into mainline, and I sent several dozen patches for these
during the v4.9 development cycle. Most of these are actual bugs,
some are for correct code that is safe because it is only called
under external constraints that make it impossible to run into
the case that gcc sees, and in a few cases gcc is just stupid and
finds something that can obviously never happen.
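As an illustration of the second kind (correct code that gcc cannot prove safe), consider the made-up function below: `val` is written and read only under the same condition, yet depending on the gcc version and optimization level, -Wmaybe-uninitialized may still flag the read. The second version shows one common way to restructure such code so that no path reads an unset variable. Neither function is taken from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Correct, but gcc may fail to prove that 'val' is set whenever it is
 * read, because the two 'if (have_val)' tests are separate branches. */
static int scaled_or_default(bool have_val, int input)
{
	int val;

	if (have_val)
		val = input * 2;

	/* ... unrelated work between the two tests ... */

	if (have_val)
		return val;
	return -1;
}

/* Restructured: the value is computed and used on the same path. */
static int scaled_or_default_fixed(bool have_val, int input)
{
	if (!have_val)
		return -1;
	return input * 2;
}
```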

I have now done a few thousand randconfig builds on x86 and collected
all patches that I needed to address every single warning I got
(I can provide the combined patch for the other warnings if anyone
is interested), so I hope we can get the warning back and let people
catch the actual bugs earlier.

Note that the majority of the patches I created are for the third kind
of problem (stupid false-positives), for one of two reasons:
- some of them only get triggered in certain combinations of config
  options, so we don't always run into them, and
- the actual bugs tend to get addressed much quicker as they also
  lead to incorrect runtime behavior.

These 27 patches address the warnings that either occur in one of the more
common configurations (defconfig, allmodconfig, or something built by the
kbuild robot or kernelci.org), or they are about a real bug. It would be
good to get these all into v4.9 if we want to turn on the warning again.
I have tested these extensively with gcc-4.9 and gcc-6 and done a bit
of testing with gcc-5, and all of these should now be fine. gcc-4.8
is much worse about the false-positive warnings and is also fairly old
now, so I'm leaving the warning disabled with that version. gcc-4.7 and
older don't understand the -Wno-maybe-uninitialized option and are not
affected by this patch either way.

I have another (smaller) series of patches for warnings that are both
harmless and not as easy to trigger, and I will send them for inclusion
in v4.10.

Link: https://rusty.ozlabs.org/?p=232 [1]
Link: https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings [2]
Signed-off-by: Arnd Bergmann 
---
 Makefile   | 10 ++
 arch/arc/Makefile  |  4 +++-
 scripts/Makefile.ubsan |  4 
 3 files changed, 13 insertions(+), 5 deletions(-)

Cc: x...@kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Mauro Carvalho Chehab 
Cc: Martin Schwidefsky 
Cc: linux-s...@vger.kernel.org
Cc: Ilya Dryomov 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-...@lists.infradead.org
Cc: Herbert Xu 
Cc: linux-cry...@vger.kernel.org
Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: ceph-de...@vger.kernel.org
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-e...@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org

diff --git a/Makefile b/Makefile
index 512e47a..43cd3d9 100644
--- a/Makefile
+++ b/Makefile
@@ -370,7 +370,7 @@ LDFLAGS_MODULE  =
 CFLAGS_KERNEL  =
 AFLAGS_KERNEL  =
 LDFLAGS_vmlinux =
-CFLAGS_GCOV= -fprofile-arcs -ftest-coverage -fno-tree-loop-im
+CFLAGS_GCOV= -fprofile-arcs -ftest-coverage -fno-tree-loop-im -Wno-maybe-uninitialized
 CFLAGS_KCOV:= $(call cc-option,-fsanitize-coverage=trace-pc,)
 
 
@@ -620,7 +620,6 @@ ARCH_CFLAGS :=
 include arch/$(SRCARCH)/Makefile
 
 KBUILD_CFLAGS  += $(call cc-option,-fno-delete-null-pointer-checks,)
-KBUILD_CFLAGS  += $(call cc-disable-warning,maybe-uninitialized,)
 KBUILD_CFLAGS  += $(call cc-disable-warning,frame-address,)
 
 ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
@@ -629,15 +628,18 @@ 

[PATCH 01/28] [v2] netfilter: nf_tables: avoid uninitialized variable warning

2016-10-17 Thread Arnd Bergmann
The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:

net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.

Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Link: http://patchwork.ozlabs.org/patch/677114/
Signed-off-by: Arnd Bergmann 
---
 net/netfilter/nft_range.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

Cc: Pablo Neira Ayuso 

diff --git a/net/netfilter/nft_range.c b/net/netfilter/nft_range.c
index c6d5358..2dd80f4 100644
--- a/net/netfilter/nft_range.c
+++ b/net/netfilter/nft_range.c
@@ -28,22 +28,20 @@ static void nft_range_eval(const struct nft_expr *expr,
 const struct nft_pktinfo *pkt)
 {
 	const struct nft_range_expr *priv = nft_expr_priv(expr);
-	bool mismatch;
 	int d1, d2;
 
 	d1 = memcmp(&regs->data[priv->sreg], &priv->data_from, priv->len);
 	d2 = memcmp(&regs->data[priv->sreg], &priv->data_to, priv->len);
 	switch (priv->op) {
 	case NFT_RANGE_EQ:
-		mismatch = (d1 < 0 || d2 > 0);
+		if (d1 < 0 || d2 > 0)
+			regs->verdict.code = NFT_BREAK;
 		break;
 	case NFT_RANGE_NEQ:
-		mismatch = (d1 >= 0 && d2 <= 0);
+		if (d1 >= 0 && d2 <= 0)
+			regs->verdict.code = NFT_BREAK;
 		break;
 	}
-
-	if (mismatch)
-		regs->verdict.code = NFT_BREAK;
 }
 
 static const struct nla_policy nft_range_policy[NFTA_RANGE_MAX + 1] = {
-- 
2.9.0
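A userspace sketch of the restructured logic, for illustration only: `range_break()` and `enum range_op` are hypothetical names, and returning `true` stands in for setting `regs->verdict.code = NFT_BREAK`. Each `case` decides on its own, so no variable can be read before it is assigned; an unexpected operator simply leaves the verdict alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum range_op { RANGE_EQ, RANGE_NEQ };

/* Compare 'data' against the inclusive range [from, to] byte-wise with
 * memcmp(), as the patch does.  Returns true when the rule should stop
 * matching. */
static bool range_break(enum range_op op, const void *data,
			const void *from, const void *to, size_t len)
{
	int d1 = memcmp(data, from, len);
	int d2 = memcmp(data, to, len);

	switch (op) {
	case RANGE_EQ:
		if (d1 < 0 || d2 > 0)
			return true;	/* outside the range */
		break;
	case RANGE_NEQ:
		if (d1 >= 0 && d2 <= 0)
			return true;	/* inside the range */
		break;
	}
	/* Unknown op: nothing was left uninitialized, just no break. */
	return false;
}
```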



[PATCH 00/28] Reenable maybe-uninitialized warnings

2016-10-17 Thread Arnd Bergmann
This is a set of patches that I hope to get into v4.9 in some form
in order to turn on the -Wmaybe-uninitialized warnings again.

After talking to Linus in person at Linaro Connect about this, I
spent some time on finding all the remaining warnings, and this
is the resulting patch series. More details are in the description
of the last patch that actually enables the warning.

Let me know if there are other warnings that I missed, and whether
you think these are still appropriate for v4.9 or not.
A couple of patches are non-obvious, and could use some more
detailed review.

Arnd

Arnd Bergmann (28):
  [v2] netfilter: nf_tables: avoid uninitialized variable warning
  [v2] mtd: mtk: avoid warning in mtk_ecc_encode
  [v2] infiniband: shut up a maybe-uninitialized warning
  f2fs: replace a build-time warning with runtime WARN_ON
  ext2: avoid bogus -Wmaybe-uninitialized warning
  NFSv4.1: work around -Wmaybe-uninitialized warning
  ceph: avoid false positive maybe-uninitialized warning
  staging: lustre: restore initialization of return code
  staging: lustre: remove broken dead code in
cfs_cpt_table_create_pattern
  UBI: fix uninitialized access of vid_hdr pointer
  block: rdb: false-postive gcc-4.9 -Wmaybe-uninitialized
  [media] rc: print correct variable for z8f0811
  [media] dib0700: fix uninitialized data on 'repeat' event
  iio: accel: sca3000_core: avoid potentially uninitialized variable
  crypto: aesni: avoid -Wmaybe-uninitialized warning
  pcmcia: fix return value of soc_pcmcia_regulator_set
  spi: fsl-espi: avoid processing uninitalized data on error
  drm: avoid uninitialized timestamp use in wait_vblank
  brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap
  net: bcm63xx: avoid referencing uninitialized variable
  net/hyperv: avoid uninitialized variable
  x86: apm: avoid uninitialized data
  x86: mark target address as output in 'insb' asm
  x86: math-emu: possible uninitialized variable use
  s390: pci: don't print uninitialized data for debugging
  nios2: fix timer initcall return value
  rocker: fix maybe-uninitialized warning
  Kbuild: bring back -Wmaybe-uninitialized warning

 Makefile   |  10 +-
 arch/arc/Makefile  |   4 +-
 arch/nios2/kernel/time.c   |   1 +
 arch/s390/pci/pci_dma.c|   2 +-
 arch/x86/crypto/aesni-intel_glue.c | 121 +
 arch/x86/include/asm/io.h  |   4 +-
 arch/x86/kernel/apm_32.c   |   5 +-
 arch/x86/math-emu/Makefile |   4 +-
 arch/x86/math-emu/reg_compare.c|  16 +--
 drivers/block/rbd.c|   1 +
 drivers/gpu/drm/drm_irq.c  |   4 +-
 drivers/infiniband/core/cma.c  |  56 +-
 drivers/media/i2c/ir-kbd-i2c.c |   2 +-
 drivers/media/usb/dvb-usb/dib0700_core.c   |  10 +-
 drivers/mtd/nand/mtk_ecc.c |  19 ++--
 drivers/mtd/ubi/eba.c  |   2 +-
 drivers/net/ethernet/broadcom/bcm63xx_enet.c   |   3 +-
 drivers/net/ethernet/rocker/rocker_ofdpa.c |   4 +-
 drivers/net/hyperv/netvsc_drv.c|   2 +-
 .../broadcom/brcm80211/brcmfmac/cfg80211.c |   2 +-
 drivers/pcmcia/soc_common.c|   2 +-
 drivers/spi/spi-fsl-espi.c |   2 +-
 drivers/staging/iio/accel/sca3000_core.c   |   2 +
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   |   7 --
 drivers/staging/lustre/lustre/lov/lov_pack.c   |   2 +
 fs/ceph/super.c|   3 +-
 fs/ext2/inode.c|   7 +-
 fs/f2fs/data.c |   7 ++
 fs/nfs/nfs4session.c   |  10 +-
 net/netfilter/nft_range.c  |  10 +-
 scripts/Makefile.ubsan |   4 +
 31 files changed, 187 insertions(+), 141 deletions(-)

-- 
Cc: x...@kernel.org
Cc: linux-me...@vger.kernel.org
Cc: Mauro Carvalho Chehab 
Cc: Martin Schwidefsky 
Cc: linux-s...@vger.kernel.org
Cc: Ilya Dryomov 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-...@lists.infradead.org
Cc: Herbert Xu 
Cc: linux-cry...@vger.kernel.org
Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: ceph-de...@vger.kernel.org
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-e...@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
2.9.0



[PATCH nf] netfilter: x_tables: suppress kmemcheck warning

2016-10-17 Thread Florian Westphal
Markus Trippelsdorf reports:

WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (88001e605480)
4055601e008890686d81
 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
 ^
|RIP: 0010:[]  [] nf_register_net_hook+0x51/0x160
[..]
 [] nf_register_net_hook+0x51/0x160
 [] nf_register_net_hooks+0x3f/0xa0
 [] ipt_register_table+0xe5/0x110
[..]

This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but that's -next material.

Reported-by: Markus Trippelsdorf 
Suggested-by: Al Viro 
Signed-off-by: Florian Westphal 
---
 net/netfilter/x_tables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index e0aa7c1d0224..fc4977456c30 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1513,7 +1513,7 @@ xt_hook_ops_alloc(const struct xt_table *table, nf_hookfn *fn)
 	if (!num_hooks)
 		return ERR_PTR(-EINVAL);
 
-	ops = kmalloc(sizeof(*ops) * num_hooks, GFP_KERNEL);
+	ops = kcalloc(num_hooks, sizeof(*ops), GFP_KERNEL);
 	if (ops == NULL)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.7.3
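The same idea can be shown in userspace with calloc(), the libc counterpart of kcalloc(): zeroing the whole array means fields the allocator never sets (here the hypothetical `priv` pointer) hold defined values rather than garbage, and calloc() in common C libraries, like kcalloc() in the kernel, also checks the `n * size` multiplication for overflow. `struct hook_ops` and `hook_ops_alloc()` are made-up stand-ins, not the kernel's `struct nf_hook_ops` or `xt_hook_ops_alloc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an array of hook ops. */
struct hook_ops {
	void *hook;
	int hooknum;
	int priority;
	void *priv;	/* never set by hook_ops_alloc() */
};

/* Allocate and partially fill 'n' ops; returns NULL on error. */
static struct hook_ops *hook_ops_alloc(size_t n, int priority)
{
	struct hook_ops *ops;
	size_t i;

	if (n == 0)
		return NULL;

	/* calloc() zeroes every byte, unlike malloc(n * sizeof(*ops)). */
	ops = calloc(n, sizeof(*ops));
	if (ops == NULL)
		return NULL;

	for (i = 0; i < n; i++) {
		ops[i].hooknum = (int)i;
		ops[i].priority = priority;
	}
	return ops;
}
```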



Re: [PATCH nf,v2] netfilter: nf_queue: don't re-enter same hook on packet reinjection

2016-10-17 Thread Aaron Conole
Pablo Neira Ayuso  writes:

> On Mon, Oct 17, 2016 at 11:23:01AM -0400, Aaron Conole wrote:
>> Pablo Neira Ayuso  writes:
>>
>> > Make sure we skip the current hook from where the packet was enqueued,
>> > otherwise the packet gets enqueued over and over again.
>> >
>> > Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
>> > Signed-off-by: Pablo Neira Ayuso 
>> > ---
>> > v2: Make sure next hook is non-null, otherwise we are at the end of the
>> >hook list and we can skip nf_iterate().
>> >
>> >  net/netfilter/nf_queue.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
>> > index 96964a0070e1..691e713d70f5 100644
>> > --- a/net/netfilter/nf_queue.c
>> > +++ b/net/netfilter/nf_queue.c
>> > @@ -185,8 +185,9 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
>> >  	}
>> >  
>> >  	entry->state.thresh = INT_MIN;
>> > +	hook_entry = rcu_dereference(hook_entry->next);
>> >  
>> > -	if (verdict == NF_ACCEPT) {
>> > +	if (hook_entry && verdict == NF_ACCEPT) {
>> >  	next_hook:
>> >  		verdict = nf_iterate(skb, &entry->state, &hook_entry);
>> >  	}
>>
>> ACK. I thought switch case below could have a problem, but re-checked
>> the first nf_queue leg, and it seems okay.
>
> Argh, still not right. If we get a NF_QUEUE verdict to re-enqueue
> again, then hook_entry may become NULL.
>
>   switch (verdict & NF_VERDICT_MASK) {
>   case NF_ACCEPT:
>   case NF_STOP:
>   local_bh_disable();
>   entry->state.okfn(entry->state.net, entry->state.sk, skb);
>   local_bh_enable();
>   break;
>   case NF_QUEUE:
>   RCU_INIT_POINTER(entry->state.hook_entries, hook_entry); <--
>
> Attaching new patch.
>
> From c1a731c68791bcd504a7fe5d28f5f0fd59d66118 Mon Sep 17 00:00:00 2001
> From: Pablo Neira Ayuso 
> Date: Thu, 13 Oct 2016 08:14:03 +0200
> Subject: [PATCH nf,v3] netfilter: nf_queue: don't re-enter same hook on packet
>  reinjection
>
> If the packet is accepted, we have to skip the current hook from where
> the packet was enqueued. Thus, we can emulate the previous
> list_for_each_entry_continue() behaviour happening from nf_reinject(),
> otherwise the packet gets enqueued over and over again.
>
> Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
> Signed-off-by: Pablo Neira Ayuso 
> ---
>  net/netfilter/nf_queue.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> index 96964a0070e1..0b5ac3c9c2bc 100644
> --- a/net/netfilter/nf_queue.c
> +++ b/net/netfilter/nf_queue.c
> @@ -187,8 +187,10 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
>  	entry->state.thresh = INT_MIN;
>  
>  	if (verdict == NF_ACCEPT) {
> -	next_hook:
> -		verdict = nf_iterate(skb, &entry->state, &hook_entry);
> +		hook_entry = rcu_dereference(hook_entry->next);
> +		if (hook_entry)
> +next_hook:

Should the above two lines be transposed to this?

 next_hook:
if (hook_entry)

Sorry if I'm misunderstanding it.  Too many special cases for my tiny
brain...

-Aaron
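Aaron's question comes down to C semantics: a `goto` to a label that sits on the body of an `if` jumps straight into that body and never evaluates the condition, whereas a label placed before the `if` re-runs the test. The toy functions below make the difference visible; they are hypothetical, with `p` standing in for `hook_entry`, the counter for calls to `nf_iterate()`, and `bypass` for a later jump back to `next_hook`. They illustrate the language rule only, not what the kernel does on each path.

```c
#include <assert.h>
#include <stddef.h>

/* Label on the body of the if: the goto skips the NULL test. */
static int label_inside_if(const int *p, int bypass)
{
	int runs = 0;

	if (p)
next_hook:
		runs++;		/* stands in for nf_iterate() */

	if (bypass-- > 0)
		goto next_hook;
	return runs;
}

/* Label before the if: every entry re-checks p. */
static int label_before_if(const int *p, int bypass)
{
	int runs = 0;

next_hook:
	if (p)
		runs++;

	if (bypass-- > 0)
		goto next_hook;
	return runs;
}
```

With `p == NULL` and one jump back, the first version still "runs" once, which in real code would mean using a NULL entry; the second version runs zero times.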


Re: [PATCH nf-next 0/2] netfilter: autoload NAT support for non-builtin L4 protocols

2016-10-17 Thread Pablo Neira Ayuso
On Thu, Oct 06, 2016 at 07:09:27PM +0200, Davide Caratti wrote:
> this series fixes SNAT/DNAT rules where port number translation is
> explicitly configured, but only the L3 address is translated:
> 
> # iptables -t nat -A POSTROUTING -o eth1 -p sctp -j SNAT --to-source 10.0.0.1:61000
> # tcpdump -s46 -tni eth1 sctp
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 46 bytes
> IP 10.0.0.1.37788 > 10.0.0.2.2000: sctp
> ^
> IP 10.0.0.2.2000 > 10.0.0.1.37788: sctp
> IP 10.0.0.1.37788 > 10.0.0.2.2000: sctp
> IP 10.0.0.2.2000 > 10.0.0.1.37788: sctp
> IP 10.0.0.2.2000 > 10.0.0.1.37788: sctp
> IP 10.0.0.1.37788 > 10.0.0.2.2000: sctp
> IP 10.0.0.2.2000 > 10.0.0.1.37788: sctp
> 
> This happens for all protocols that don't have L4 NAT support built into
> nf_nat.ko, such as DCCP, SCTP and UDPLite: unless the user modprobes
> nf_nat_proto_{dccp,sctp,udplite}.ko, port translation as specified in the
> above rule will not be done. 
> The first patch provides persistent and generic aliases for the above
> modules; the second patch autoloads nf_nat_proto_{dccp,sctp,udplite} when a
> SNAT/DNAT rule matching one of the above protocols is created.

I would really like to see DCCP, SCTP and UDPlite built-in, just like
other protocol trackers (TCP, UDP...). This may require a bit of
review work on your/our side, but it would be greatly appreciated.

We discussed this during the last Netfilter Workshop, the current
situation is not good, we're in some way responsible for breaking the
deployment of new protocols on the Internet.

Many vendors rely on default configurations, not even looking into
modprobing things, so these protocols are hopeless in the current
situation since routers running Netfilter will likely not support
them. This is made worse by nf_conntrack dropping packets for protocols
like SCTP and DCCP, since the generic protocol tracker can no longer be
used for them.

Once these protocols are supported built-in, users who explicitly don't
want to allow them can drop them from our control plane, i.e.
iptables/nft. At least then we would no longer be responsible for the
current situation.

Moreover, following this approach, we would also avoid the new
attribute in nft_nat to indicate the layer 4 protocol that you have
mentioned already.

Thanks!


Re: [PATCH nf,v2] netfilter: nf_queue: don't re-enter same hook on packet reinjection

2016-10-17 Thread Pablo Neira Ayuso
On Mon, Oct 17, 2016 at 11:23:01AM -0400, Aaron Conole wrote:
> Pablo Neira Ayuso  writes:
>
> > Make sure we skip the current hook from where the packet was enqueued,
> > otherwise the packet gets enqueued over and over again.
> >
> > Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
> > Signed-off-by: Pablo Neira Ayuso 
> > ---
> > v2: Make sure next hook is non-null, otherwise we are at the end of the
> > hook list and we can skip nf_iterate().
> >
> >  net/netfilter/nf_queue.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> > index 96964a0070e1..691e713d70f5 100644
> > --- a/net/netfilter/nf_queue.c
> > +++ b/net/netfilter/nf_queue.c
> > @@ -185,8 +185,9 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
> >  	}
> >  
> >  	entry->state.thresh = INT_MIN;
> > +	hook_entry = rcu_dereference(hook_entry->next);
> >  
> > -	if (verdict == NF_ACCEPT) {
> > +	if (hook_entry && verdict == NF_ACCEPT) {
> >  	next_hook:
> >  		verdict = nf_iterate(skb, &entry->state, &hook_entry);
> >  	}
>
> ACK.  I thought switch case below could have a problem, but re-checked
> the first nf_queue leg, and it seems okay.

Argh, still not right. If we get a NF_QUEUE verdict to re-enqueue
again, then hook_entry may become NULL.

switch (verdict & NF_VERDICT_MASK) {
case NF_ACCEPT:
case NF_STOP:
local_bh_disable();
entry->state.okfn(entry->state.net, entry->state.sk, skb);
local_bh_enable();
break;
case NF_QUEUE:
RCU_INIT_POINTER(entry->state.hook_entries, hook_entry); <--

Attaching new patch.

From c1a731c68791bcd504a7fe5d28f5f0fd59d66118 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso 
Date: Thu, 13 Oct 2016 08:14:03 +0200
Subject: [PATCH nf,v3] netfilter: nf_queue: don't re-enter same hook on packet
 reinjection

If the packet is accepted, we have to skip the current hook from where
the packet was enqueued. Thus, we can emulate the previous
list_for_each_entry_continue() behaviour happening from nf_reinject(),
otherwise the packet gets enqueued over and over again.

Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_queue.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 96964a0070e1..0b5ac3c9c2bc 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -187,8 +187,10 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 	entry->state.thresh = INT_MIN;
 
 	if (verdict == NF_ACCEPT) {
-	next_hook:
-		verdict = nf_iterate(skb, &entry->state, &hook_entry);
+		hook_entry = rcu_dereference(hook_entry->next);
+		if (hook_entry)
+next_hook:
+			verdict = nf_iterate(skb, &entry->state, &hook_entry);
 	}
 
 	switch (verdict & NF_VERDICT_MASK) {
-- 
2.1.4
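The intent of the v3 change can be modelled with a plain singly linked list. In this hypothetical sketch (`struct hook`, `iterate()` and `reinject()` are made-up names, and verdicts are reduced to ACCEPT and QUEUE), an ACCEPT from userspace resumes at the hook after the one that queued the packet, and does nothing more if that hook was the last one. Restarting at the queueing hook would queue the packet once more, which is the loop the patch addresses.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { V_ACCEPT, V_QUEUE };

struct hook {
	enum verdict (*fn)(void);
	struct hook *next;
};

static enum verdict always_accept(void) { return V_ACCEPT; }
static enum verdict always_queue(void) { return V_QUEUE; }

/* Run hooks starting at 'h' until one returns something other than
 * ACCEPT; '*stop' is set to that hook, or NULL if all accepted. */
static enum verdict iterate(struct hook *h, struct hook **stop)
{
	for (; h != NULL; h = h->next) {
		enum verdict v = h->fn();

		if (v != V_ACCEPT) {
			*stop = h;
			return v;
		}
	}
	*stop = NULL;
	return V_ACCEPT;
}

/* 'queued_at' is the hook that returned QUEUE; 'verdict' is the answer
 * from userspace.  On ACCEPT, continue with the next hook only. */
static enum verdict reinject(struct hook *queued_at, enum verdict verdict,
			     struct hook **stop)
{
	*stop = queued_at;
	if (verdict == V_ACCEPT) {
		struct hook *next = queued_at->next;

		*stop = NULL;
		if (next != NULL)
			verdict = iterate(next, stop);
	}
	return verdict;
}
```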



Re: [PATCH 00/10, nf-next] Netfilter core updates

2016-10-17 Thread Pablo Neira Ayuso
On Mon, Oct 17, 2016 at 09:52:14AM -0400, Aaron Conole wrote:
> Florian Westphal  writes:
> 
> > Pablo Neira Ayuso  wrote:
> >> Let me know if you have any comment, otherwise I'll place this in the
> >> nf-next tree so we can follow up working on top of these.
> >
> > Please do, thanks!
> 
> +1.  Some of this work was in my back burner, so thanks Pablo :)

Thanks. I still need the nf_queue fix to propagate to David's net
tree. I will ask him to pull net into net-next, which may take a
little while.


Re: [PATCH nft] src: support ct l3proto/protocol without direction syntax

2016-10-17 Thread Pablo Neira Ayuso
On Thu, Sep 22, 2016 at 10:34:52PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> Actually, ct l3proto and ct protocol are unrelated to direction, so
> it's unnecessary to specify a direction when using them.
> 
> Add support for matching ct l3proto/protocol without a direction:
>   # nft add rule filter input ct l3proto ipv4
>   # nft add rule filter output ct protocol 17
> 
> Note: existing syntax is still preserved, so "ct reply l3proto ipv6"
> is still fine.

Applied, thanks.

Sorry, it seems I accidentally left this patch behind.


Re: [libnftnl PATCH] libnftnl: update Arturo Borrero Gonzalez email

2016-10-17 Thread Pablo Neira Ayuso
On Mon, Oct 10, 2016 at 12:26:34PM +0200, Arturo Borrero Gonzalez wrote:
> Update Arturo Borrero Gonzalez email address.

Applied, thanks Arturo.


[PATCH libnftnl] set_elem: don't add NFTA_SET_ELEM_LIST_ELEMENTS attribute if set is empty

2016-10-17 Thread Pablo Neira Ayuso
If the set is empty, don't send an NFTA_SET_ELEM_LIST_ELEMENTS
netlink attribute that contains no elements.

Signed-off-by: Pablo Neira Ayuso 
---
 src/set_elem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/set_elem.c b/src/set_elem.c
index 46fb7c6e424b..4d2b4f6074b7 100644
--- a/src/set_elem.c
+++ b/src/set_elem.c
@@ -304,6 +304,9 @@ void nftnl_set_elems_nlmsg_build_payload(struct nlmsghdr *nlh, struct nftnl_set
 
 	nftnl_set_elem_nlmsg_build_def(nlh, s);
 
+	if (list_empty(&s->element_list))
+		return;
+
 	nest1 = mnl_attr_nest_start(nlh, NFTA_SET_ELEM_LIST_ELEMENTS);
 	list_for_each_entry(elem, &s->element_list, head)
 		nftnl_set_elem_build(nlh, elem, ++i);
-- 
2.1.4
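The effect of the early return can be illustrated with a toy attribute writer. Everything here (`struct buf`, `attr_start()`, `attr_end()`, `put_u32()`, the attribute type numbers and the unpadded 4-byte header) is hypothetical and only loosely modelled on netlink; it is not the libmnl API. With no elements, `build_elems()` writes nothing instead of an empty nest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy attribute stream: 2-byte total length, 2-byte type, payload.
 * No padding and no bounds checking; illustration only. */
struct buf {
	uint8_t data[256];
	size_t len;
};

static size_t attr_start(struct buf *b, uint16_t type)
{
	size_t off = b->len;

	memcpy(b->data + off + 2, &type, sizeof(type));
	b->len += 4;
	return off;
}

/* Patch the length field once the payload has been written. */
static void attr_end(struct buf *b, size_t off)
{
	uint16_t len = (uint16_t)(b->len - off);

	memcpy(b->data + off, &len, sizeof(len));
}

static void put_u32(struct buf *b, uint16_t type, uint32_t v)
{
	size_t off = attr_start(b, type);

	memcpy(b->data + b->len, &v, sizeof(v));
	b->len += sizeof(v);
	attr_end(b, off);
}

#define ATTR_ELEMENTS 3	/* hypothetical nest type */

static void build_elems(struct buf *b, const uint32_t *elems, size_t n)
{
	size_t nest, i;

	if (n == 0)
		return;	/* same idea as the list_empty() check */

	nest = attr_start(b, ATTR_ELEMENTS);
	for (i = 0; i < n; i++)
		put_u32(b, (uint16_t)(i + 1), elems[i]);
	attr_end(b, nest);
}
```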



Re: [PATCH ulogd2] ulogd: fix crash when ipv4 packet is truncated

2016-10-17 Thread Pablo Neira Ayuso
On Tue, Oct 11, 2016 at 10:22:27PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> If an IPv4 packet is truncated, we should not try to dereference the
> iph pointer. Otherwise, if the user adds an iptables rule such as
> "-j NFLOG --nflog-size 0", we will dereference a NULL pointer and
> may crash.

With Eric's permission, I'm applying this. Thanks.
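The general defensive pattern is to validate the captured length before touching the IP header. `ipv4_protocol()` below is a hypothetical helper, not ulogd code: it returns -1 for a missing or truncated payload (such as one logged with `--nflog-size 0`) and reads the protocol byte only after checking the version and header length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the IPv4 protocol byte, or -1 if the payload is missing or
 * too short to hold a complete IPv4 header. */
static int ipv4_protocol(const uint8_t *pkt, size_t len)
{
	size_t ihl;

	if (pkt == NULL || len < 20)	/* 20 = minimum IPv4 header */
		return -1;
	if ((pkt[0] >> 4) != 4)		/* version field */
		return -1;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl < 20 || len < ihl)
		return -1;
	return pkt[9];			/* protocol field offset */
}
```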


Re: [patch v2] netfilter: nf_tables: underflow in nft_parse_u32_check()

2016-10-17 Thread Pablo Neira Ayuso
On Wed, Oct 12, 2016 at 12:14:29PM +0300, Dan Carpenter wrote:
> We don't want to allow negatives here.

Applied, thanks.


Re: [patch] netfilter: nft_exthdr: fix error handling in nft_exthdr_init()

2016-10-17 Thread Pablo Neira Ayuso
On Wed, Oct 12, 2016 at 09:09:12AM +0300, Dan Carpenter wrote:
> "err" needs to be signed for the error handling to work.

Applied, thanks Dan.
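The bug class is easy to reproduce in plain C: a negative error code stored in an unsigned variable can never compare `< 0`. `parse_offset()`, `init_unsigned()` and `init_signed()` are made-up functions that only demonstrate the pattern; they are not the nft_exthdr code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical parser: negative errno on failure, value otherwise. */
static int parse_offset(int raw)
{
	if (raw < 0)
		return -EINVAL;
	return raw;
}

/* Buggy: 'err' is unsigned, so the error check is always false and a
 * failure is treated as success (gcc's -Wtype-limits can flag this). */
static int init_unsigned(int raw)
{
	unsigned int err = parse_offset(raw);

	if (err < 0)
		return (int)err;
	return 0;
}

/* Fixed: a signed 'err' lets the error propagate. */
static int init_signed(int raw)
{
	int err = parse_offset(raw);

	if (err < 0)
		return err;
	return 0;
}
```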


Re: [PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)

2016-10-17 Thread Pablo Neira Ayuso
On Mon, Oct 10, 2016 at 03:57:37PM +0200, Florian Westphal wrote:
> Nicolas Dichtel  wrote:
> > This entry has been removed in commit 9500507c6138.
> > 
> > Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension")
> > Signed-off-by: Nicolas Dichtel 
> 
> Acked-by: Florian Westphal 

Applied, thanks Nicolas.


Re: [PATCH nf] netfilter: xt_NFLOG: fix unexpected truncated packet

2016-10-17 Thread Pablo Neira Ayuso
On Tue, Oct 11, 2016 at 10:26:27PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> Justin and Chris spotted that the iptables NFLOG target was broken when
> they upgraded the kernel to 4.8: "ulogd-2.0.5- IPs are no longer logged"
> or "results in segfaults in ulogd-2.0.5".
> 
> This happens because "struct nf_loginfo li;" is a local variable, so its
> flags field is filled with a garbage value instead of being initialized
> to zero. If it contains 0x1, packets will no longer be logged to
> userspace.

Applied and enqueued for -stable, thanks.
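The underlying C pitfall is that automatic (stack) variables are not zero-initialized. This sketch uses a hypothetical `struct loginfo` and flag `LOG_F_COPY_LEN` (value 0x1, echoing the report above); the helper fills in some fields but never touches `flags`, so the caller zeroes the whole structure first. It shows the general technique and is not necessarily how the actual patch initializes the kernel structure.

```c
#include <assert.h>
#include <string.h>

#define LOG_F_COPY_LEN 0x1	/* hypothetical flag bit */

struct loginfo {
	unsigned int type;
	unsigned int copy_len;
	unsigned int flags;
};

/* Sets some fields but, as in the reported bug, never 'flags'. */
static void fill_loginfo(struct loginfo *li, unsigned int copy_len)
{
	li->type = 1;
	li->copy_len = copy_len;
}

static struct loginfo make_loginfo(unsigned int copy_len)
{
	struct loginfo li;

	/* Without this, 'flags' would hold whatever was on the stack. */
	memset(&li, 0, sizeof(li));
	fill_loginfo(&li, copy_len);
	return li;
}
```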


Re: [PATCH nf] netfilter: xt_ipcomp: add "ip[6]t_ipcomp" module alias name

2016-10-17 Thread Pablo Neira Ayuso
On Wed, Oct 12, 2016 at 09:09:22PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> Otherwise, users cannot add related rules unless xt_ipcomp.ko is already loaded:
>   # iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1
>   iptables: No chain/target/match by that name.

Applied, thanks.


Re: [PATCH nf] netfilter: nft_hash: add missing NFTA_HASH_OFFSET's nla_policy

2016-10-17 Thread Pablo Neira Ayuso
On Wed, Oct 12, 2016 at 09:10:45PM +0800, Liping Zhang wrote:
> From: Liping Zhang 
> 
> Without the nla_policy entry, the kernel also skips the validation
> check for this attribute.

Also applied, thanks Liping.
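Why a missing entry matters can be seen in a toy validator: attributes whose policy slot is left at the zero value are accepted without any length check. The enums, `struct attr` and `validate()` below are hypothetical and only loosely inspired by netlink's `nla_policy` tables; real kernel attribute validation handles many more cases.

```c
#include <assert.h>
#include <stddef.h>

enum policy_type { POLICY_UNSPEC, POLICY_U32 };

enum { ATTR_UNSPEC, ATTR_SREG, ATTR_OFFSET, ATTR_MAX = ATTR_OFFSET };

struct attr {
	int type;
	size_t len;
};

/* Returns 0 if the attribute is accepted, -1 otherwise. */
static int validate(const enum policy_type *policy, const struct attr *a)
{
	if (a->type <= ATTR_UNSPEC || a->type > ATTR_MAX)
		return -1;
	if (policy[a->type] == POLICY_U32 && a->len != 4)
		return -1;
	return 0;	/* POLICY_UNSPEC: no length check at all */
}

static const enum policy_type policy_incomplete[ATTR_MAX + 1] = {
	[ATTR_SREG] = POLICY_U32,
	/* ATTR_OFFSET missing */
};

static const enum policy_type policy_complete[ATTR_MAX + 1] = {
	[ATTR_SREG]   = POLICY_U32,
	[ATTR_OFFSET] = POLICY_U32,
};
```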


Re: [PATCH nf,v2] netfilter: nf_queue: don't re-enter same hook on packet reinjection

2016-10-17 Thread Aaron Conole
Pablo Neira Ayuso  writes:

> Make sure we skip the current hook from where the packet was enqueued,
> otherwise the packet gets enqueued over and over again.
>
> Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list")
> Signed-off-by: Pablo Neira Ayuso 
> ---
> v2: Make sure next hook is non-null, otherwise we are at the end of the
> hook list and we can skip nf_iterate().
>
>  net/netfilter/nf_queue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
> index 96964a0070e1..691e713d70f5 100644
> --- a/net/netfilter/nf_queue.c
> +++ b/net/netfilter/nf_queue.c
> @@ -185,8 +185,9 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
>   }
>  
>   entry->state.thresh = INT_MIN;
> + hook_entry = rcu_dereference(hook_entry->next);
>  
> - if (verdict == NF_ACCEPT) {
> + if (hook_entry && verdict == NF_ACCEPT) {
>   next_hook:
>   verdict = nf_iterate(skb, >state, _entry);
>   }

ACK.  I thought the switch case below could have a problem, but I
re-checked the first nf_queue leg, and it seems okay.

-Aaron
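The fix can be modelled outside the kernel: a packet queued by hook N must resume at hook N+1 on reinjection, and if hook N was the last one there is nothing left to iterate. The code below is a simplified userspace sketch, not nf_reinject() itself; the verdict enum, the hook signature and the example hooks are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum verdict_sketch { V_ACCEPT, V_DROP, V_QUEUE };

/* Singly linked hook list, walked through ->next as in the patch. */
struct hook_sketch {
	enum verdict_sketch (*fn)(int *hits);
	struct hook_sketch *next;
};

/* Run hooks from *entryp onward; stop at the first non-ACCEPT verdict
 * and leave *entryp pointing at the hook that returned it. */
static enum verdict_sketch iterate(struct hook_sketch **entryp, int *hits)
{
	while (*entryp) {
		enum verdict_sketch v = (*entryp)->fn(hits);

		if (v != V_ACCEPT)
			return v;
		*entryp = (*entryp)->next;
	}
	return V_ACCEPT;
}

/* Reinject a packet that the hook 'queued_at' sent to userspace. */
static enum verdict_sketch reinject(struct hook_sketch *queued_at,
				    enum verdict_sketch verdict, int *hits)
{
	/* Resume at the next hook, never at the one that queued. */
	struct hook_sketch *entry = queued_at->next;

	/* NULL means end of list: nothing left to iterate. */
	if (entry && verdict == V_ACCEPT)
		verdict = iterate(&entry, hits);
	return verdict;
}

/* Example hooks: one queues every packet, one only counts. */
static enum verdict_sketch queue_hook(int *hits)
{
	*hits += 1;
	return V_QUEUE;
}

static enum verdict_sketch count_hook(int *hits)
{
	*hits += 10;
	return V_ACCEPT;
}
```

Starting the walk at queued_at instead of queued_at->next would hit queue_hook again and return V_QUEUE, which is the loop the patch removes.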


[ANNOUNCE] ipset 6.30 released

2016-10-17 Thread Jozsef Kadlecsik
Hi,

I'm happy to announce ipset 6.30, which introduces a new set type,
hash:ip,mac, and brings a couple of small corrections and backports from
the most recent kernel tree.

Userspace changes:
  - Drop extra comma from error message (Neutron Soutmun)
  - Fix the incorrect dynamic/static modules list (Neutron Soutmun)
  - Correct tests to check the number of entries too
  - hash:ipmac type support added to ipset, userspace part (Tomasz Chilinski)
Kernel part changes:
  - netfilter: ipset: hash: fix boolreturn.cocci warnings (Fengguang Wu)
  - Fix the nla_put_net64() API changes backport
  - netfilter: ipset: Fixing unnamed union init (Elad Raz)
  - netfilter: x_tables: Use par->net instead of computing from the passed net devices (Eric W. Biederman)
  - Correct the reported memory size for bitmap:* types
  - Fix coding styles reported by checkpatch.pl, already in kernel
  - netfilter: x_tables: Pass struct net in xt_action_param (Eric W. Biederman)
  - net: sched: fix skb->protocol use in case of accelerated vlan path (Jiri Pirko)
  - Check IPSET_ATTR_ETHER netlink attribute length in hash:ipmac too
  - netfilter: fix include files for compilation (Mikko Rapeli)
  - ipset: Backports for the nla_put_net64() API changes (Neutron Soutmun)
  - netfilter: ipset: use setup_timer() and mod_timer(). (Muhammad Falak R Wani)
  - hash:ipmac type support added to ipset (Tomasz Chilinski)

You can download the source code of ipset from:
http://ipset.netfilter.org
ftp://ftp.netfilter.org/pub/ipset/
git://git.netfilter.org/ipset.git

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH 00/10, nf-next] Netfilter core updates

2016-10-17 Thread Aaron Conole
Florian Westphal  writes:

> Pablo Neira Ayuso  wrote:
>> Let me know if you have any comment, otherwise I'll place this in the
>> nf-next tree so we can follow up working on top of these.
>
> Please do, thanks!

+1.  Some of this work was on my back burner, so thanks Pablo :)



[PATCH 18/22] netfilter: ipset: hash:ipmac type support added to ipset

2016-10-17 Thread Jozsef Kadlecsik
From: Tomasz Chilinski 

Signed-off-by: Tomasz Chiliński 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/Kconfig |   9 +
 net/netfilter/ipset/Makefile|   1 +
 net/netfilter/ipset/ip_set_hash_ipmac.c | 315 
 3 files changed, 325 insertions(+)
 create mode 100644 net/netfilter/ipset/ip_set_hash_ipmac.c

diff --git a/net/netfilter/ipset/Kconfig b/net/netfilter/ipset/Kconfig
index 234a8ec..4083a80 100644
--- a/net/netfilter/ipset/Kconfig
+++ b/net/netfilter/ipset/Kconfig
@@ -99,6 +99,15 @@ config IP_SET_HASH_IPPORTNET
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_SET_HASH_IPMAC
+   tristate "hash:ip,mac set support"
+   depends on IP_SET
+   help
+ This option adds the hash:ip,mac set type support, by which
+ one can store IPv4/IPv6 address and MAC (ethernet address) pairs in a set.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config IP_SET_HASH_MAC
tristate "hash:mac set support"
depends on IP_SET
diff --git a/net/netfilter/ipset/Makefile b/net/netfilter/ipset/Makefile
index 3dbd5e9..28ec148 100644
--- a/net/netfilter/ipset/Makefile
+++ b/net/netfilter/ipset/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_IP_SET_BITMAP_PORT) += ip_set_bitmap_port.o
 
 # hash types
 obj-$(CONFIG_IP_SET_HASH_IP) += ip_set_hash_ip.o
+obj-$(CONFIG_IP_SET_HASH_IPMAC) += ip_set_hash_ipmac.o
 obj-$(CONFIG_IP_SET_HASH_IPMARK) += ip_set_hash_ipmark.o
 obj-$(CONFIG_IP_SET_HASH_IPPORT) += ip_set_hash_ipport.o
 obj-$(CONFIG_IP_SET_HASH_IPPORTIP) += ip_set_hash_ipportip.o
diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
new file mode 100644
index 000..d9eb144
--- /dev/null
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -0,0 +1,315 @@
+/* Copyright (C) 2016 Tomasz Chilinski 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/* Kernel module implementing an IP set type: the hash:ip,mac type */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define IPSET_TYPE_REV_MIN 0
+#define IPSET_TYPE_REV_MAX 0
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Tomasz Chilinski ");
+IP_SET_MODULE_DESC("hash:ip,mac", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX);
+MODULE_ALIAS("ip_set_hash:ip,mac");
+
+/* Type specific function prefix */
+#define HTYPE  hash_ipmac
+
+/* Zero valued element is not supported */
+static const unsigned char invalid_ether[ETH_ALEN] = { 0 };
+
+/* IPv4 variant */
+
+/* Member elements */
+struct hash_ipmac4_elem {
+   /* Zero valued IP addresses cannot be stored */
+   __be32 ip;
+   union {
+   unsigned char ether[ETH_ALEN];
+   __be32 foo[2];
+   };
+};
+
+/* Common functions */
+
+static inline bool
+hash_ipmac4_data_equal(const struct hash_ipmac4_elem *e1,
+  const struct hash_ipmac4_elem *e2,
+  u32 *multi)
+{
+   return e1->ip == e2->ip && ether_addr_equal(e1->ether, e2->ether);
+}
+
+static bool
+hash_ipmac4_data_list(struct sk_buff *skb, const struct hash_ipmac4_elem *e)
+{
+   if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, e->ip) ||
+   nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether))
+   goto nla_put_failure;
+   return 0;
+
+nla_put_failure:
+   return 1;
+}
+
+static inline void
+hash_ipmac4_data_next(struct hash_ipmac4_elem *next,
+ const struct hash_ipmac4_elem *e)
+{
+   next->ip = e->ip;
+}
+
+#define MTYPE  hash_ipmac4
+#define PF 4
+#define HOST_MASK  32
+#define HKEY_DATALEN   sizeof(struct hash_ipmac4_elem)
+#include "ip_set_hash_gen.h"
+
+static int
+hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
+const struct xt_action_param *par,
+enum ipset_adt adt, struct ip_set_adt_opt *opt)
+{
+   ipset_adtfn adtfn = set->variant->adt[adt];
+   struct hash_ipmac4_elem e = { .ip = 0, { .foo[0] = 0, .foo[1] = 0 } };
+   struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
+
+/* MAC can be src only */
+   if (!(opt->flags & IPSET_DIM_TWO_SRC))
+   return 0;
+
+   if (skb_mac_header(skb) < skb->head ||
+   (skb_mac_header(skb) + ETH_HLEN) > skb->data)
+   return -EINVAL;
+
+   memcpy(e.ether, eth_hdr(skb)->h_source, ETH_ALEN);
+   if (ether_addr_equal(e.ether, invalid_ether))
+   return -EINVAL;
+
+   ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, );
+
+   return adtfn(set, , , 

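A userspace sketch of the element handling in the hash:ip,mac type above: two elements are equal when both the IP address and the MAC match, and an all-zero MAC is rejected because zero-valued elements are not supported. The struct mirrors hash_ipmac4_elem, memcmp() stands in for the kernel's ether_addr_equal(), and the helper names are hypothetical. Zeroing the whole element first keeps the two spare bytes of the union deterministic, which is presumably why the patch initializes foo[0] and foo[1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Mirror of the IPv4 element layout: the union pads the 6-byte MAC
 * out to two 32-bit words. */
struct ipmac4_elem_sketch {
	uint32_t ip;	/* network byte order in the kernel */
	union {
		unsigned char ether[ETH_ALEN_SKETCH];
		uint32_t foo[2];
	};
};

static const unsigned char invalid_ether_sketch[ETH_ALEN_SKETCH] = { 0 };

static bool elem_equal(const struct ipmac4_elem_sketch *a,
		       const struct ipmac4_elem_sketch *b)
{
	return a->ip == b->ip &&
	       memcmp(a->ether, b->ether, ETH_ALEN_SKETCH) == 0;
}

/* Returns 0 and fills *e, or -1 if the MAC is all zeroes. */
static int elem_init(struct ipmac4_elem_sketch *e, uint32_t ip,
		     const unsigned char mac[ETH_ALEN_SKETCH])
{
	memset(e, 0, sizeof(*e));	/* also clears the union's spare bytes */
	if (memcmp(mac, invalid_ether_sketch, ETH_ALEN_SKETCH) == 0)
		return -1;
	e->ip = ip;
	memcpy(e->ether, mac, ETH_ALEN_SKETCH);
	return 0;
}
```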
[PATCH 07/22] netfilter: ipset: Regroup ip_set_put_extensions and add extern

2016-10-17 Thread Jozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index b5bd0fb3..7a218eb 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -331,6 +331,8 @@ extern size_t ip_set_elem_len(struct ip_set *set, struct nlattr *tb[],
  size_t len, size_t align);
 extern int ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[],
 struct ip_set_ext *ext);
+extern int ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
+const void *e, bool active);
 
 static inline int
 ip_set_get_hostipaddr4(struct nlattr *nla, u32 *ipaddr)
@@ -449,10 +451,6 @@ static inline int nla_put_ipaddr6(struct sk_buff *skb, int type,
 #include 
 #include 
 
-int
-ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
- const void *e, bool active);
-
 #define IP_SET_INIT_KEXT(skb, opt, set)\
{ .bytes = (skb)->len, .packets = 1,\
  .timeout = ip_set_adt_opt_timeout(opt, set) }
-- 
1.8.5.1



[PATCH 10/22] netfilter: ipset: Count non-static extension memory for userspace

2016-10-17 Thread Jozsef Kadlecsik
The non-static (i.e. comment) extension was not counted in the memory
size. A new internal counter is introduced for this. For the hash
types, the sizes of the hash bucket arrays are counted there as well, so
that we can avoid scanning the whole set when just the header data
is requested.

Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h |  8 ++--
 include/linux/netfilter/ipset/ip_set_comment.h |  7 +--
 net/netfilter/ipset/ip_set_bitmap_gen.h|  5 +++--
 net/netfilter/ipset/ip_set_core.c  |  2 +-
 net/netfilter/ipset/ip_set_hash_gen.h  | 26 ++
 net/netfilter/ipset/ip_set_list_set.c  |  5 +++--
 6 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 4671d74..8e42253 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -79,10 +79,12 @@ enum ip_set_ext_id {
IPSET_EXT_ID_MAX,
 };
 
+struct ip_set;
+
 /* Extension type */
 struct ip_set_ext_type {
/* Destroy extension private data (can be NULL) */
-   void (*destroy)(void *ext);
+   void (*destroy)(struct ip_set *set, void *ext);
enum ip_set_extension type;
enum ipset_cadt_flags flag;
/* Size and minimal alignment */
@@ -252,6 +254,8 @@ struct ip_set {
u32 timeout;
/* Number of elements (vs timeout) */
u32 elements;
+   /* Size of the dynamic extensions (vs timeout) */
+   size_t ext_size;
/* Element data size */
size_t dsize;
/* Offsets to extensions in elements */
@@ -268,7 +272,7 @@ struct ip_set {
 */
if (SET_WITH_COMMENT(set))
ip_set_extensions[IPSET_EXT_ID_COMMENT].destroy(
-   ext_comment(data, set));
+   set, ext_comment(data, set));
 }
 
 static inline int
diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
index 5444b1b..8e2bab1 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -20,13 +20,14 @@
  * The kadt functions don't use the comment extensions in any way.
  */
 static inline void
-ip_set_init_comment(struct ip_set_comment *comment,
+ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
const struct ip_set_ext *ext)
 {
struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
size_t len = ext->comment ? strlen(ext->comment) : 0;
 
if (unlikely(c)) {
+   set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
kfree_rcu(c, rcu);
rcu_assign_pointer(comment->c, NULL);
}
@@ -38,6 +39,7 @@
if (unlikely(!c))
return;
strlcpy(c->str, ext->comment, len + 1);
+   set->ext_size += sizeof(*c) + strlen(c->str) + 1;
rcu_assign_pointer(comment->c, c);
 }
 
@@ -58,13 +60,14 @@
  * of the set data anymore.
  */
 static inline void
-ip_set_comment_free(struct ip_set_comment *comment)
+ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
 {
struct ip_set_comment_rcu *c;
 
c = rcu_dereference_protected(comment->c, 1);
if (unlikely(!c))
return;
+   set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
kfree_rcu(c, rcu);
rcu_assign_pointer(comment->c, NULL);
 }
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 13a7021..5a9fa61 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -84,6 +84,7 @@
mtype_ext_cleanup(set);
memset(map->members, 0, map->memsize);
set->elements = 0;
+   set->ext_size = 0;
 }
 
 /* Calculate the actual memory size of the set data */
@@ -101,7 +102,7 @@
 {
const struct mtype *map = set->data;
struct nlattr *nested;
-   size_t memsize = mtype_memsize(map, set->dsize);
+   size_t memsize = mtype_memsize(map, set->dsize) + set->ext_size;
 
nested = ipset_nest_start(skb, IPSET_ATTR_DATA);
if (!nested)
@@ -175,7 +176,7 @@
if (SET_WITH_COUNTER(set))
ip_set_init_counter(ext_counter(x, set), ext);
if (SET_WITH_COMMENT(set))
-   ip_set_init_comment(ext_comment(x, set), ext);
+   ip_set_init_comment(set, ext_comment(x, set), ext);
if (SET_WITH_SKBINFO(set))
ip_set_init_skbinfo(ext_skbinfo(x, set), ext);
 
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 3bca341..cd8961e 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -324,7 +324,7 @@ static inline struct ip_set_net *ip_set_pernet(struct net *net)
 }
 

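The bookkeeping in the comment helpers above can be sketched in userspace: each allocation charges sizeof(*c) + strlen(c->str) + 1 to set->ext_size and each free subtracts the same amount, so the counter returns to zero once all comments are gone. The struct names are hypothetical, and malloc()/free() stand in for the kernel's kmalloc() and kfree_rcu().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct set_sketch {
	size_t ext_size;	/* bytes held by dynamic extensions */
};

/* Stand-in for struct ip_set_comment_rcu: a header plus the string. */
struct comment_sketch {
	size_t len;	/* string length; the kernel has an rcu_head here */
	char str[];
};

/* Bytes charged for one comment: header + string + NUL. */
static size_t comment_cost(const struct comment_sketch *c)
{
	return sizeof(*c) + strlen(c->str) + 1;
}

static void comment_free(struct set_sketch *set, struct comment_sketch **slot)
{
	if (!*slot)
		return;
	set->ext_size -= comment_cost(*slot);
	free(*slot);
	*slot = NULL;
}

static void comment_init(struct set_sketch *set, struct comment_sketch **slot,
			 const char *text)
{
	size_t len = text ? strlen(text) : 0;
	struct comment_sketch *c;

	comment_free(set, slot);	/* replacing releases the old bytes */
	if (!len)
		return;
	c = malloc(sizeof(*c) + len + 1);
	if (!c)
		return;
	c->len = len;
	memcpy(c->str, text, len + 1);
	set->ext_size += comment_cost(c);
	*slot = c;
}
```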
[PATCH 02/22] netfilter: ipset: Headers file cleanup

2016-10-17 Thread Jozsef Kadlecsik
Remove extra whitespace, group the counter helpers together. Mark some of
the helpers' arguments as const.

Ported from a patch proposed by Sergey Popovich .

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 57 +-
 include/linux/netfilter/ipset/ip_set_comment.h |  2 +-
 include/linux/netfilter/ipset/ip_set_timeout.h |  4 +-
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 83b9a2e..1ea28e3 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -334,18 +334,40 @@ struct ip_set {
}
 }
 
+static inline bool
+ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
+{
+   return nla_put_net64(skb, IPSET_ATTR_BYTES,
+cpu_to_be64(ip_set_get_bytes(counter)),
+IPSET_ATTR_PAD) ||
+  nla_put_net64(skb, IPSET_ATTR_PACKETS,
+cpu_to_be64(ip_set_get_packets(counter)),
+IPSET_ATTR_PAD);
+}
+
+static inline void
+ip_set_init_counter(struct ip_set_counter *counter,
+   const struct ip_set_ext *ext)
+{
+   if (ext->bytes != ULLONG_MAX)
+   atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
+   if (ext->packets != ULLONG_MAX)
+   atomic64_set(&(counter)->packets, (long long)(ext->packets));
+}
+
 static inline void
 ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
- const struct ip_set_ext *ext,
- struct ip_set_ext *mext, u32 flags)
+  const struct ip_set_ext *ext,
+  struct ip_set_ext *mext, u32 flags)
 {
-   mext->skbmark = skbinfo->skbmark;
-   mext->skbmarkmask = skbinfo->skbmarkmask;
-   mext->skbprio = skbinfo->skbprio;
-   mext->skbqueue = skbinfo->skbqueue;
+   mext->skbmark = skbinfo->skbmark;
+   mext->skbmarkmask = skbinfo->skbmarkmask;
+   mext->skbprio = skbinfo->skbprio;
+   mext->skbqueue = skbinfo->skbqueue;
 }
+
 static inline bool
-ip_set_put_skbinfo(struct sk_buff *skb, struct ip_set_skbinfo *skbinfo)
+ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
 {
/* Send nonzero parameters only */
return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
@@ -371,27 +393,6 @@ struct ip_set {
skbinfo->skbqueue = ext->skbqueue;
 }
 
-static inline bool
-ip_set_put_counter(struct sk_buff *skb, struct ip_set_counter *counter)
-{
-   return nla_put_net64(skb, IPSET_ATTR_BYTES,
-cpu_to_be64(ip_set_get_bytes(counter)),
-IPSET_ATTR_PAD) ||
-  nla_put_net64(skb, IPSET_ATTR_PACKETS,
-cpu_to_be64(ip_set_get_packets(counter)),
-IPSET_ATTR_PAD);
-}
-
-static inline void
-ip_set_init_counter(struct ip_set_counter *counter,
-   const struct ip_set_ext *ext)
-{
-   if (ext->bytes != ULLONG_MAX)
-   atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
-   if (ext->packets != ULLONG_MAX)
-   atomic64_set(&(counter)->packets, (long long)(ext->packets));
-}
-
 /* Netlink CB args */
 enum {
IPSET_CB_NET = 0,   /* net namespace */
diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
index 8d02485..bae5c76 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -43,7 +43,7 @@
 
 /* Used only when dumping a set, protected by rcu_read_lock_bh() */
 static inline int
-ip_set_put_comment(struct sk_buff *skb, struct ip_set_comment *comment)
+ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
 {
struct ip_set_comment_rcu *c = rcu_dereference_bh(comment->c);
 
diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h
index 1d6a935..bfb3531 100644
--- a/include/linux/netfilter/ipset/ip_set_timeout.h
+++ b/include/linux/netfilter/ipset/ip_set_timeout.h
@@ -40,7 +40,7 @@
 }
 
 static inline bool
-ip_set_timeout_expired(unsigned long *t)
+ip_set_timeout_expired(const unsigned long *t)
 {
return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
 }
@@ -63,7 +63,7 @@
 }
 
 static inline u32
-ip_set_timeout_get(unsigned long *timeout)
+ip_set_timeout_get(const unsigned long *timeout)
 {
return *timeout == IPSET_ELEM_PERMANENT ? 0 :
jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
-- 
1.8.5.1

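The moved ip_set_init_counter() treats ULLONG_MAX as "value not supplied" and leaves the corresponding counter untouched. A minimal userspace sketch of that convention, with C11 atomics standing in for atomic64_set() and hypothetical struct names:

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

struct counter_sketch {
	atomic_llong bytes;
	atomic_llong packets;
};

struct ext_sketch {
	unsigned long long bytes;	/* ULLONG_MAX: not supplied */
	unsigned long long packets;	/* ULLONG_MAX: not supplied */
};

/* Overwrite only the counters for which a value was supplied. */
static void init_counter(struct counter_sketch *c, const struct ext_sketch *ext)
{
	if (ext->bytes != ULLONG_MAX)
		atomic_store(&c->bytes, (long long)ext->bytes);
	if (ext->packets != ULLONG_MAX)
		atomic_store(&c->packets, (long long)ext->packets);
}
```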

[PATCH 09/22] netfilter: ipset: Add element count to all set types header

2016-10-17 Thread Jozsef Kadlecsik
It is better to report the element count for all set types, so that the
header information is uniform. Element counts are therefore added
to the bitmap and list types.

Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h|  2 ++
 include/linux/netfilter/ipset/ip_set_bitmap.h |  2 +-
 net/netfilter/ipset/ip_set_bitmap_gen.h   | 10 +-
 net/netfilter/ipset/ip_set_hash_gen.h | 21 ++---
 net/netfilter/ipset/ip_set_list_set.c |  6 +-
 5 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 7a218eb..4671d74 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -250,6 +250,8 @@ struct ip_set {
u8 flags;
/* Default timeout value, if enabled */
u32 timeout;
+   /* Number of elements (vs timeout) */
+   u32 elements;
/* Element data size */
size_t dsize;
/* Offsets to extensions in elements */
diff --git a/include/linux/netfilter/ipset/ip_set_bitmap.h b/include/linux/netfilter/ipset/ip_set_bitmap.h
index 5e4662a..366d6c0 100644
--- a/include/linux/netfilter/ipset/ip_set_bitmap.h
+++ b/include/linux/netfilter/ipset/ip_set_bitmap.h
@@ -6,8 +6,8 @@
 #define IPSET_BITMAP_MAX_RANGE 0x
 
 enum {
+   IPSET_ADD_STORE_PLAIN_TIMEOUT = -1,
IPSET_ADD_FAILED = 1,
-   IPSET_ADD_STORE_PLAIN_TIMEOUT,
IPSET_ADD_START_STORED_TIMEOUT,
 };
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h
index c22cdde..13a7021 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -83,6 +83,7 @@
if (set->extensions & IPSET_EXT_DESTROY)
mtype_ext_cleanup(set);
memset(map->members, 0, map->memsize);
+   set->elements = 0;
 }
 
 /* Calculate the actual memory size of the set data */
@@ -107,7 +108,8 @@
goto nla_put_failure;
if (mtype_do_head(skb, map) ||
nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
-   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
+   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) ||
+   nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(set->elements)))
goto nla_put_failure;
if (unlikely(ip_set_put_flags(skb, set)))
goto nla_put_failure;
@@ -151,6 +153,7 @@
if (ret == IPSET_ADD_FAILED) {
if (SET_WITH_TIMEOUT(set) &&
ip_set_timeout_expired(ext_timeout(x, set))) {
+   set->elements--;
ret = 0;
} else if (!(flags & IPSET_FLAG_EXIST)) {
set_bit(e->id, map->members);
@@ -159,6 +162,8 @@
/* Element is re-added, cleanup extensions */
ip_set_ext_destroy(set, x);
}
+   if (ret > 0)
+   set->elements--;
 
if (SET_WITH_TIMEOUT(set))
 #ifdef IP_SET_BITMAP_STORED_TIMEOUT
@@ -176,6 +181,7 @@
 
/* Activate element */
set_bit(e->id, map->members);
+   set->elements++;
 
return 0;
 }
@@ -192,6 +198,7 @@
return -IPSET_ERR_EXIST;
 
ip_set_ext_destroy(set, x);
+   set->elements--;
if (SET_WITH_TIMEOUT(set) &&
ip_set_timeout_expired(ext_timeout(x, set)))
return -IPSET_ERR_EXIST;
@@ -287,6 +294,7 @@
if (ip_set_timeout_expired(ext_timeout(x, set))) {
clear_bit(id, map->members);
ip_set_ext_destroy(set, x);
+   set->elements--;
}
}
spin_unlock_bh(>lock);
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 66a55a5..09465d1 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -277,7 +277,6 @@ struct net_prefixes {
 struct htype {
struct htable __rcu *table; /* the hash table */
u32 maxelem;/* max elements in the hash */
-   u32 elements;   /* current element (vs timeout) */
u32 initval;/* random jhash init value */
 #ifdef IP_SET_HASH_WITH_MARKMASK
u32 markmask;   /* markmask value for mark mask to store */
@@ -402,7 +401,7 @@ struct htype {
 #ifdef IP_SET_HASH_WITH_NETS
memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN(set->family));
 #endif
-   h->elements = 0;
+   set->elements = 0;
 }
 
 /* Destroy the hashtable part of the set */
@@ -508,7 +507,7 @@ struct htype {
nets_length, k);
 #endif
ip_set_ext_destroy(set, data);
-   h->elements--;
+  

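The idea behind keeping set->elements up to date can be sketched with a tiny bitmap set: the counter changes only when a bit actually flips and is reset on flush, so the header dump can report it without scanning the members. This is a simplified userspace model with hypothetical names; unlike the patch, it ignores timeouts and extensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BITMAP_BITS_SKETCH 64

struct bitmap_set_sketch {
	uint64_t members;	/* one bit per element id, 0..63 */
	uint32_t elements;	/* plays the role of set->elements */
};

static bool member_test(const struct bitmap_set_sketch *s, unsigned int id)
{
	return (s->members >> id) & 1u;
}

/* 0: added, 1: already present (count unchanged), -1: out of range. */
static int bitmap_add(struct bitmap_set_sketch *s, unsigned int id)
{
	if (id >= BITMAP_BITS_SKETCH)
		return -1;
	if (member_test(s, id))
		return 1;
	s->members |= UINT64_C(1) << id;
	s->elements++;
	return 0;
}

/* 0: removed, -1: not present or out of range. */
static int bitmap_del(struct bitmap_set_sketch *s, unsigned int id)
{
	if (id >= BITMAP_BITS_SKETCH || !member_test(s, id))
		return -1;
	s->members &= ~(UINT64_C(1) << id);
	s->elements--;
	return 0;
}

static void bitmap_flush(struct bitmap_set_sketch *s)
{
	s->members = 0;
	s->elements = 0;	/* mirrors the flush hunk in the patch */
}
```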
[PATCH 00/22] ipset patches for nf-next

2016-10-17 Thread Jozsef Kadlecsik
Hi Pablo,

Please consider applying the next bunch of patches for ipset.
They include a new set type (hash:ip,mac), element counts reported
to userspace in the set header data, and a couple of small cleanups
and improvements:

* rcu_dereference_bh_nfnl() redefined to accept netfilter subsys id.
* Header files cleanup: counter helper functions are grouped together,
  some args are changed to const.
* struct ip_set_skbinfo is introduced instead of open coded fields
  in the skbinfo get/init helper functions.
* In the comment extension, allocate the area with kmalloc() rather than kzalloc().
* Split all extensions into separate files.
* Separate memsize calculation into dedicated functions.
* ip_set_put_extensions() is regrouped and extern is added.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output.
* Count non-static extension memory into memsize calculation for
  userspace.
* Simplify mtype_expire() for hash types by removing redundant
  parameters which can be derived from other ones.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
  by Muhammad Falak R Wani, individually for the set type families.
* hash: fix boolreturn.cocci warnings about bool functions that should
  use true/false, by Fengguang Wu.
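On the "element data size is a multiple of u32" item: the hash types appear to hash the element key as an array of u32 words (via jhash2()), so a key whose length is not a multiple of four would have its trailing bytes ignored. A hypothetical userspace sketch of such a compile-time guard follows, with a toy FNV-style word hash standing in for jhash2(); the key struct and helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: 4-byte IPv4 address + 6-byte MAC is 10 bytes;
 * explicit padding brings it to 12, a multiple of u32. */
struct ipmac4_key_sketch {
	uint32_t ip;
	unsigned char ether[6];
	unsigned char pad[2];	/* must be zeroed so equal keys hash equally */
};

/* Compile-time guard in the spirit of the kernel's BUILD_BUG_ON(). */
_Static_assert(sizeof(struct ipmac4_key_sketch) % sizeof(uint32_t) == 0,
	       "key size must be a multiple of u32");

#define KEY_WORDS (sizeof(struct ipmac4_key_sketch) / sizeof(uint32_t))

/* Toy word-wise hash (FNV-1a style mixing); a stand-in, not jhash2(). */
static uint32_t hash_key(const struct ipmac4_key_sketch *key, uint32_t seed)
{
	uint32_t words[KEY_WORDS];
	uint32_t h = seed ^ 2166136261u;
	size_t i;

	memcpy(words, key, sizeof(words));	/* avoids aliasing casts */
	for (i = 0; i < KEY_WORDS; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}
```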

The following changes since commit 1b830996c1603225a96e233c3b09bf2b12607d78:

  Merge branch 's390-net' (2016-10-12 01:56:10 -0400)

are available in the git repository at:

  git://blackhole.kfki.hu/nf-next master

for you to fetch changes up to 214ee1d9a5e73f13a126849c69fdb29dfe2bdb3f:

  netfilter: ipset: hash: fix boolreturn.cocci warnings (2016-10-15 14:51:59 +0200)


Eric B Munson (1):
  netfilter: ipset: Add element count to hash headers

Jozsef Kadlecsik (16):
  netfilter: ipset: Correct rcu_dereference_bh_nfnl() usage
  netfilter: ipset: Headers file cleanup
  netfilter: ipset: Improve skbinfo get/init helpers
  netfilter: ipset: Improve comment extension helpers
  netfilter: ipset: Split extensions into separate files
  netfilter: ipset: Separate memsize calculation code into dedicated function
  netfilter: ipset: Regroup ip_set_put_extensions and add extern
  netfilter: ipset: Add element count to all set types header
  netfilter: ipset: Count non-static extension memory for userspace
  netfilter: ipset: Simplify mtype_expire() for hash types
  netfilter: ipset: Make NLEN compile time constant for hash types
  netfilter: ipset: Make sure element data size is a multiple of u32
  netfilter: ipset: Optimize hash creation routine
  netfilter: ipset: Make struct htype per ipset family
  netfilter: ipset: Collapse same condition body to a single one
  netfilter: ipset: Fix reported memory size for hash:* types

Muhammad Falak R Wani (3):
  netfilter: ipset: use setup_timer() and mod_timer().
  netfilter: ipset: use setup_timer() and mod_timer().
  netfilter: ipset: use setup_timer() and mod_timer().

Tomasz Chilinski (1):
  netfilter: ipset: hash:ipmac type support added to ipset

kbuild test robot (1):
  netfilter: ipset: hash: fix boolreturn.cocci warnings

 include/linux/netfilter/ipset/ip_set.h | 136 ++-
 include/linux/netfilter/ipset/ip_set_bitmap.h  |   2 +-
 include/linux/netfilter/ipset/ip_set_comment.h |  11 +-
 include/linux/netfilter/ipset/ip_set_counter.h |  75 ++
 include/linux/netfilter/ipset/ip_set_skbinfo.h |  46 
 include/linux/netfilter/ipset/ip_set_timeout.h |   4 +-
 net/netfilter/ipset/Kconfig|   9 +
 net/netfilter/ipset/Makefile   |   1 +
 net/netfilter/ipset/ip_set_bitmap_gen.h|  33 ++-
 net/netfilter/ipset/ip_set_core.c  |  14 +-
 net/netfilter/ipset/ip_set_hash_gen.h  | 264 ++---
 net/netfilter/ipset/ip_set_hash_ip.c   |  10 +-
 net/netfilter/ipset/ip_set_hash_ipmac.c| 315 +
 net/netfilter/ipset/ip_set_hash_ipmark.c   |  10 +-
 net/netfilter/ipset/ip_set_hash_ipport.c   |   6 +-
 net/netfilter/ipset/ip_set_hash_ipportip.c |   6 +-
 net/netfilter/ipset/ip_set_hash_ipportnet.c|  10 +-
 net/netfilter/ipset/ip_set_hash_net.c  |   8 +-
 net/netfilter/ipset/ip_set_hash_netiface.c |   8 +-
 net/netfilter/ipset/ip_set_hash_netnet.c   |   8 +-
 net/netfilter/ipset/ip_set_hash_netport.c  |  10 +-
 net/netfilter/ipset/ip_set_hash_netportnet.c   |  10 +-
 net/netfilter/ipset/ip_set_list_set.c  |  37 ++-
 net/netfilter/xt_set.c |  12 +-

[PATCH 17/22] netfilter: ipset: Fix reported memory size for hash:* types

2016-10-17 Thread Jozsef Kadlecsik
The calculation of the full allocated memory did not take
into account the size of the base hash bucket structure in some
places.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index f4b30b6..295ad84 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -87,6 +87,8 @@ struct htable {
 };
 
 #define hbucket(h, i)  ((h)->bucket[i])
+#define ext_size(n, dsize) \
+   (sizeof(struct hbucket) + (n) * (dsize))
 
 #ifndef IPSET_NET_COUNT
 #define IPSET_NET_COUNT1
@@ -521,7 +523,7 @@ struct htype {
d++;
}
tmp->pos = d;
-   set->ext_size -= AHASH_INIT_SIZE * dsize;
+   set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize);
rcu_assign_pointer(hbucket(t, i), tmp);
kfree_rcu(n, rcu);
}
@@ -627,7 +629,7 @@ struct htype {
goto cleanup;
}
m->size = AHASH_INIT_SIZE;
-   extsize = sizeof(*m) + AHASH_INIT_SIZE * dsize;
+   extsize = ext_size(AHASH_INIT_SIZE, dsize);
RCU_INIT_POINTER(hbucket(t, key), m);
} else if (m->pos >= m->size) {
struct hbucket *ht;
@@ -647,7 +649,7 @@ struct htype {
memcpy(ht, m, sizeof(struct hbucket) +
  m->size * dsize);
ht->size = m->size + AHASH_INIT_SIZE;
-   extsize += AHASH_INIT_SIZE * dsize;
+   extsize += ext_size(AHASH_INIT_SIZE, dsize);
kfree(m);
m = ht;
RCU_INIT_POINTER(hbucket(t, key), ht);
@@ -729,7 +731,7 @@ struct htype {
if (!n)
return -ENOMEM;
n->size = AHASH_INIT_SIZE;
-   set->ext_size += sizeof(*n) + AHASH_INIT_SIZE * set->dsize;
+   set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize);
goto copy_elem;
}
for (i = 0; i < n->pos; i++) {
@@ -793,7 +795,7 @@ struct htype {
memcpy(n, old, sizeof(struct hbucket) +
   old->size * set->dsize);
n->size = old->size + AHASH_INIT_SIZE;
-   set->ext_size += AHASH_INIT_SIZE * set->dsize;
+   set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize);
}
 
 copy_elem:
@@ -885,7 +887,7 @@ struct htype {
k++;
}
if (n->pos == 0 && k == 0) {
-   set->ext_size -= sizeof(*n) + n->size * dsize;
+   set->ext_size -= ext_size(n->size, dsize);
rcu_assign_pointer(hbucket(t, key), NULL);
kfree_rcu(n, rcu);
} else if (k >= AHASH_INIT_SIZE) {
@@ -904,7 +906,7 @@ struct htype {
k++;
}
tmp->pos = k;
-   set->ext_size -= AHASH_INIT_SIZE * dsize;
+   set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize);
rcu_assign_pointer(hbucket(t, key), tmp);
kfree_rcu(n, rcu);
}
-- 
1.8.5.1

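The point of the ext_size() macro is that a bucket's memory is its header plus n element slots. A minimal userspace sketch of symmetric accounting with that formula follows; struct hbucket_sketch and the helpers are hypothetical, and calloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bucket header; the kernel's struct hbucket also holds an
 * rcu_head and a bitmap of used slots ahead of the element array. */
struct hbucket_sketch {
	unsigned long used;	/* stand-in for the used-slot bitmap */
	uint8_t size;		/* allocated element slots */
	uint8_t pos;		/* next free slot */
	unsigned char value[];	/* size * dsize bytes of element data */
};

/* Same shape as the patch's ext_size(): header plus n elements. */
#define EXT_SIZE_SKETCH(n, dsize) \
	(sizeof(struct hbucket_sketch) + (size_t)(n) * (dsize))

static struct hbucket_sketch *bucket_alloc(size_t *ext_size, uint8_t n,
					   size_t dsize)
{
	struct hbucket_sketch *b = calloc(1, EXT_SIZE_SKETCH(n, dsize));

	if (!b)
		return NULL;
	b->size = n;
	*ext_size += EXT_SIZE_SKETCH(n, dsize);	/* header is counted too */
	return b;
}

static void bucket_free(size_t *ext_size, struct hbucket_sketch *b,
			size_t dsize)
{
	*ext_size -= EXT_SIZE_SKETCH(b->size, dsize);
	free(b);
}
```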


[PATCH 03/22] netfilter: ipset: Improve skbinfo get/init helpers

2016-10-17 Thread Jozsef Kadlecsik
Use struct ip_set_skbinfo in struct ip_set_ext instead of open
coded fields and assign structure members in get/init helpers
instead of copying members one by one.
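As a userspace illustration of this change: with the fields grouped in one struct, the get/init helpers become a single struct assignment, and the SET target's mark update from the xt_set.c hunk of this patch, (mark & ~skbmarkmask) ^ skbmark, reads directly from the embedded struct. Field names follow struct ip_set_skbinfo; the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field names follow struct ip_set_skbinfo in the patch. */
struct skbinfo_sketch {
	uint32_t skbmark;
	uint32_t skbmarkmask;
	uint32_t skbprio;
	uint16_t skbqueue;
	uint16_t pad;	/* explicit padding, like __pad in the patch */
};

struct ext_sketch {
	struct skbinfo_sketch skbinfo;	/* embedded, not open coded */
	uint64_t packets;
	uint64_t bytes;
};

static void init_skbinfo(struct skbinfo_sketch *dst,
			 const struct ext_sketch *ext)
{
	*dst = ext->skbinfo;	/* one assignment replaces four member copies */
}

/* Clear the masked bits of the packet mark, then XOR in the stored mark. */
static uint32_t apply_mark(uint32_t mark, const struct skbinfo_sketch *info)
{
	return (mark & ~info->skbmarkmask) ^ info->skbmark;
}
```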

Ported from a patch proposed by Sergey Popovich .

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 30 +++---
 net/netfilter/ipset/ip_set_core.c  | 12 ++--
 net/netfilter/xt_set.c | 12 +++-
 3 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 1ea28e3..7802621 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -92,17 +92,6 @@ struct ip_set_ext_type {
 
 extern const struct ip_set_ext_type ip_set_extensions[];
 
-struct ip_set_ext {
-   u64 packets;
-   u64 bytes;
-   u32 timeout;
-   u32 skbmark;
-   u32 skbmarkmask;
-   u32 skbprio;
-   u16 skbqueue;
-   char *comment;
-};
-
 struct ip_set_counter {
atomic64_t bytes;
atomic64_t packets;
@@ -122,6 +111,15 @@ struct ip_set_skbinfo {
u32 skbmarkmask;
u32 skbprio;
u16 skbqueue;
+   u16 __pad;
+};
+
+struct ip_set_ext {
+   struct ip_set_skbinfo skbinfo;
+   u64 packets;
+   u64 bytes;
+   char *comment;
+   u32 timeout;
 };
 
 struct ip_set;
@@ -360,10 +358,7 @@ struct ip_set {
   const struct ip_set_ext *ext,
   struct ip_set_ext *mext, u32 flags)
 {
-   mext->skbmark = skbinfo->skbmark;
-   mext->skbmarkmask = skbinfo->skbmarkmask;
-   mext->skbprio = skbinfo->skbprio;
-   mext->skbqueue = skbinfo->skbqueue;
+   mext->skbinfo = *skbinfo;
 }
 
 static inline bool
@@ -387,10 +382,7 @@ struct ip_set {
 ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
const struct ip_set_ext *ext)
 {
-   skbinfo->skbmark = ext->skbmark;
-   skbinfo->skbmarkmask = ext->skbmarkmask;
-   skbinfo->skbprio = ext->skbprio;
-   skbinfo->skbqueue = ext->skbqueue;
+   *skbinfo = ext->skbinfo;
 }
 
 /* Netlink CB args */
diff --git a/net/netfilter/ipset/ip_set_core.c 
b/net/netfilter/ipset/ip_set_core.c
index a748b0c..3bca341 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -426,20 +426,20 @@ static inline struct ip_set_net *ip_set_pernet(struct net 
*net)
if (!SET_WITH_SKBINFO(set))
return -IPSET_ERR_SKBINFO;
fullmark = be64_to_cpu(nla_get_be64(tb[IPSET_ATTR_SKBMARK]));
-   ext->skbmark = fullmark >> 32;
-   ext->skbmarkmask = fullmark & 0xffffffff;
+   ext->skbinfo.skbmark = fullmark >> 32;
+   ext->skbinfo.skbmarkmask = fullmark & 0xffffffff;
}
if (tb[IPSET_ATTR_SKBPRIO]) {
if (!SET_WITH_SKBINFO(set))
return -IPSET_ERR_SKBINFO;
-   ext->skbprio = be32_to_cpu(nla_get_be32(
-   tb[IPSET_ATTR_SKBPRIO]));
+   ext->skbinfo.skbprio =
+   be32_to_cpu(nla_get_be32(tb[IPSET_ATTR_SKBPRIO]));
}
if (tb[IPSET_ATTR_SKBQUEUE]) {
if (!SET_WITH_SKBINFO(set))
return -IPSET_ERR_SKBINFO;
-   ext->skbqueue = be16_to_cpu(nla_get_be16(
-   tb[IPSET_ATTR_SKBQUEUE]));
+   ext->skbinfo.skbqueue =
+   be16_to_cpu(nla_get_be16(tb[IPSET_ATTR_SKBQUEUE]));
}
return 0;
 }
diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
index 5669e5b..e6a8232 100644
--- a/net/netfilter/xt_set.c
+++ b/net/netfilter/xt_set.c
@@ -423,6 +423,8 @@ struct ip_set_adt_opt n = { \
 
 /* Revision 3 target */
 
+#define MOPT(opt, member)  ((opt).ext.skbinfo.member)
+
 static unsigned int
 set_target_v3(struct sk_buff *skb, const struct xt_action_param *par)
 {
@@ -453,14 +455,14 @@ struct ip_set_adt_opt n = {   \
if (!ret)
return XT_CONTINUE;
if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBMARK)
-   skb->mark = (skb->mark & ~(map_opt.ext.skbmarkmask))
-   ^ (map_opt.ext.skbmark);
+   skb->mark = (skb->mark & ~MOPT(map_opt, skbmarkmask))
+   ^ MOPT(map_opt, skbmark);
if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBPRIO)
-   skb->priority = map_opt.ext.skbprio;
+   skb->priority = MOPT(map_opt, skbprio);
if ((map_opt.cmdflags & IPSET_FLAG_MAP_SKBQUEUE) &&
skb->dev &&
-   skb->dev->real_num_tx_queues > map_opt.ext.skbqueue)
-  

[PATCH 22/22] netfilter: ipset: hash: fix boolreturn.cocci warnings

2016-10-17 Thread Jozsef Kadlecsik
From: kbuild test robot 

net/netfilter/ipset/ip_set_hash_ipmac.c:70:8-9: WARNING: return of 0/1 in 
function 'hash_ipmac4_data_list' with return type bool
net/netfilter/ipset/ip_set_hash_ipmac.c:178:8-9: WARNING: return of 0/1 in 
function 'hash_ipmac6_data_list' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
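For context, these *_data_list() helpers return bool with false meaning "attributes written" and true meaning "a put failed". A small hypothetical sketch of the same goto-on-failure shape, with a toy put() standing in for nla_put() and a fixed buffer standing in for the skb (both are assumptions for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy message buffer; a hypothetical stand-in for an skb. */
struct buf {
	char data[16];
	size_t len;
};

/* Stand-in for nla_put(): returns nonzero when the payload does not fit. */
static int put(struct buf *b, const void *p, size_t n)
{
	assert(b != NULL && p != NULL);
	if (b->len + n > sizeof(b->data))
		return -1;
	memcpy(b->data + b->len, p, n);
	b->len += n;
	return 0;
}

/* Same shape as hash_ipmac4_data_list(): false on success, true on
 * failure, spelled with true/false as boolreturn.cocci asks. */
static bool data_list(struct buf *b, const void *ip, size_t iplen,
		      const void *ether, size_t etherlen)
{
	if (put(b, ip, iplen) ||
	    put(b, ether, etherlen))
		goto put_failure;
	return false;

put_failure:
	return true;
}
```

The generated code is unchanged by the fix. Spelling the values as true/false only makes the bool return type explicit at each return site.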

CC: Tomasz Chilinski 
Signed-off-by: Fengguang Wu 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_ipmac.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c 
b/net/netfilter/ipset/ip_set_hash_ipmac.c
index d9eb144..1ab5ed2 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -67,10 +67,10 @@ struct hash_ipmac4_elem {
if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, e->ip) ||
nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether))
goto nla_put_failure;
-   return 0;
+   return false;
 
 nla_put_failure:
-   return 1;
+   return true;
 }
 
 static inline void
@@ -175,10 +175,10 @@ struct hash_ipmac6_elem {
if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &e->ip.in6) ||
nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether))
goto nla_put_failure;
-   return 0;
+   return false;
 
 nla_put_failure:
-   return 1;
+   return true;
 }
 
 static inline void
-- 
1.8.5.1



[PATCH 01/22] netfilter: ipset: Correct rcu_dereference_bh_nfnl() usage

2016-10-17 Thread Jozsef Kadlecsik
When the rcu_dereference_bh_nfnl() macro is defined on the target
system, it accepts a pointer and a subsystem ID.

Check whether rcu_dereference_bh_nfnl() is already defined, and if it
is not, define it so that it accepts two arguments.
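The fallback pattern can be sketched in plain C. Here deref_check() is a hypothetical stand-in for rcu_dereference_bh_check(), and the NFNL_SUBSYS_IPSET value is assumed for illustration. The two-argument macro is only defined when the environment has not already provided one, so every call site can pass the subsystem ID unconditionally:

```c
#include <assert.h>

/* Hypothetical stand-in for rcu_dereference_bh_check(p, c): checks the
 * condition and yields the pointer. */
#define deref_check(p, c) (assert(c), (p))

/* Provide the two-argument form only if nobody defined it yet; the
 * subsystem id is accepted but ignored, as in the patch. */
#ifndef rcu_dereference_bh_nfnl
#define rcu_dereference_bh_nfnl(p, ss) deref_check(p, 1)
#endif

#define NFNL_SUBSYS_IPSET 6	/* value assumed for illustration */

struct table {
	int htable_bits;
};

static int table_bits(struct table *t)
{
	struct table *orig = rcu_dereference_bh_nfnl(t, NFNL_SUBSYS_IPSET);

	return orig->htable_bits;
}
```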

Ported from a patch proposed by Sergey Popovich .

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index d32fd6b..bc54be4 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -17,7 +17,9 @@
 #define ipset_dereference_protected(p, set) \
__ipset_dereference_protected(p, spin_is_locked(&(set)->lock))
 
-#define rcu_dereference_bh_nfnl(p) rcu_dereference_bh_check(p, 1)
+#ifndef rcu_dereference_bh_nfnl
+#define rcu_dereference_bh_nfnl(p, ss) rcu_dereference_bh_check(p, 1)
+#endif
 
 /* Hashing which uses arrays to resolve clashing. The hash table is resized
  * (doubled) when searching becomes too long.
@@ -580,7 +582,7 @@ struct htype {
return -ENOMEM;
 #endif
rcu_read_lock_bh();
-   orig = rcu_dereference_bh_nfnl(h->table);
+   orig = rcu_dereference_bh_nfnl(h->table, NFNL_SUBSYS_IPSET);
htable_bits = orig->htable_bits;
rcu_read_unlock_bh();
 
@@ -1061,7 +1063,7 @@ struct htype {
u8 htable_bits;
 
rcu_read_lock_bh();
-   t = rcu_dereference_bh_nfnl(h->table);
+   t = rcu_dereference_bh_nfnl(h->table, NFNL_SUBSYS_IPSET);
memsize = mtype_ahash_memsize(h, t, NLEN(set->family), set->dsize);
htable_bits = t->htable_bits;
rcu_read_unlock_bh();
@@ -1103,7 +1105,7 @@ struct htype {
 
if (start) {
rcu_read_lock_bh();
-   t = rcu_dereference_bh_nfnl(h->table);
+   t = rcu_dereference_bh_nfnl(h->table, NFNL_SUBSYS_IPSET);
atomic_inc(&t->uref);
cb->args[IPSET_CB_PRIVATE] = (unsigned long)t;
rcu_read_unlock_bh();
-- 
1.8.5.1



[PATCH 11/22] netfilter: ipset: Simplify mtype_expire() for hash types

2016-10-17 Thread Jozsef Kadlecsik
Remove the redundant parameters nets_length and dsize:
they can be derived from the other parameters.

Remove one level of indentation by using continue while
iterating over the elements in a bucket.

Ported from a patch proposed by Sergey Popovich .

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 37afa68..79e158d 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -467,14 +467,15 @@ struct htype {
 
 /* Delete expired elements from the hashtable */
 static void
-mtype_expire(struct ip_set *set, struct htype *h, u8 nets_length, size_t dsize)
+mtype_expire(struct ip_set *set, struct htype *h)
 {
struct htable *t;
struct hbucket *n, *tmp;
struct mtype_elem *data;
u32 i, j, d;
+   size_t dsize = set->dsize;
 #ifdef IP_SET_HASH_WITH_NETS
-   u8 k;
+   u8 k, nets_length = NLEN(set->family);
 #endif
 
t = ipset_dereference_protected(h->table, set);
@@ -488,21 +489,20 @@ struct htype {
continue;
}
data = ahash_data(n, j, dsize);
-   if (ip_set_timeout_expired(ext_timeout(data, set))) {
-   pr_debug("expired %u/%u\n", i, j);
-   clear_bit(j, n->used);
-   smp_mb__after_atomic();
+   if (!ip_set_timeout_expired(ext_timeout(data, set)))
+   continue;
+   pr_debug("expired %u/%u\n", i, j);
+   clear_bit(j, n->used);
+   smp_mb__after_atomic();
 #ifdef IP_SET_HASH_WITH_NETS
-   for (k = 0; k < IPSET_NET_COUNT; k++)
-   mtype_del_cidr(h,
-   NCIDR_PUT(DCIDR_GET(data->cidr,
-   k)),
-   nets_length, k);
+   for (k = 0; k < IPSET_NET_COUNT; k++)
+   mtype_del_cidr(h,
+   NCIDR_PUT(DCIDR_GET(data->cidr, k)),
+   nets_length, k);
 #endif
-   ip_set_ext_destroy(set, data);
-   set->elements--;
-   d++;
-   }
+   ip_set_ext_destroy(set, data);
+   set->elements--;
+   d++;
}
if (d >= AHASH_INIT_SIZE) {
if (d >= n->size) {
@@ -541,7 +541,7 @@ struct htype {
 
pr_debug("called\n");
spin_lock_bh(&set->lock);
-   mtype_expire(set, h, NLEN(set->family), set->dsize);
+   mtype_expire(set, h);
spin_unlock_bh(&set->lock);
 
h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
@@ -717,7 +717,7 @@ struct htype {
if (set->elements >= h->maxelem) {
if (SET_WITH_TIMEOUT(set))
/* FIXME: when set is full, we slow down here */
-   mtype_expire(set, h, NLEN(set->family), set->dsize);
+   mtype_expire(set, h);
if (set->elements >= h->maxelem && SET_WITH_FORCEADD(set))
forceadd = true;
}
-- 
1.8.5.1



[PATCH 19/22] netfilter: ipset: use setup_timer() and mod_timer().

2016-10-17 Thread Jozsef Kadlecsik
From: Muhammad Falak R Wani 

Use setup_timer() instead of init_timer(), as it is the preferred way
of setting up a timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to set up and arm a timer, making the
code compact and easier to read.

Signed-off-by: Muhammad Falak R Wani 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_bitmap_gen.h | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 5a9fa61..77dd415 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -41,11 +41,8 @@
 {
struct mtype *map = set->data;
 
-   init_timer(&map->gc);
-   map->gc.data = (unsigned long)set;
-   map->gc.function = gc;
-   map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
-   add_timer(&map->gc);
+   setup_timer(&map->gc, gc, (unsigned long)set);
+   mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 }
 
 static void
-- 
1.8.5.1



[PATCH 12/22] netfilter: ipset: Make NLEN compile time constant for hash types

2016-10-17 Thread Jozsef Kadlecsik
Hash types define HOST_MASK before including ip_set_hash_gen.h,
and the only place where NLEN needed to be calculated at runtime
is the *_create() method.
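A sketch of why NLEN can become a plain constant: each hash type defines HOST_MASK before the template is included, so the preprocessor resolves NLEN per translation unit. The macro names mirror the patch, but this is a standalone userspace illustration, and IPv4 with /0 support (HOST_MASK 32, IP_SET_HASH_WITH_NET0) is an example choice:

```c
#include <assert.h>

/* What an IPv4 hash type would set up before including the template. */
#define HOST_MASK 32
#define IP_SET_HASH_WITH_NET0	/* this type also accepts /0 */

#ifdef IP_SET_HASH_WITH_NET0
/* cidr from 0 to HOST_MASK and c = cidr + 1 */
#define NLEN		(HOST_MASK + 1)
#define CIDR_POS(c)	((c) - 1)
#else
/* cidr from 1 to HOST_MASK and c = cidr + 1 */
#define NLEN		HOST_MASK
#define CIDR_POS(c)	((c) - 2)
#endif

/* NLEN is an integer constant expression, so it can size arrays and
 * be checked at compile time. */
static unsigned char nets[NLEN];
static_assert(NLEN == 33, "IPv4 with /0 support tracks 33 prefixes");
```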

Ported from a patch proposed by Sergey Popovich .

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 51 ---
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 79e158d..ab5b57c 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -152,20 +152,18 @@ struct net_prefixes {
 #define INIT_CIDR(cidr, host_mask) \
DCIDR_PUT(((cidr) ? NCIDR_GET(cidr) : host_mask))
 
-#define SET_HOST_MASK(family)  (family == AF_INET ? 32 : 128)
-
 #ifdef IP_SET_HASH_WITH_NET0
-/* cidr from 0 to SET_HOST_MASK() value and c = cidr + 1 */
-#define NLEN(family)   (SET_HOST_MASK(family) + 1)
+/* cidr from 0 to HOST_MASK value and c = cidr + 1 */
+#define NLEN   (HOST_MASK + 1)
#define CIDR_POS(c)   ((c) - 1)
 #else
-/* cidr from 1 to SET_HOST_MASK() value and c = cidr + 1 */
-#define NLEN(family)   SET_HOST_MASK(family)
+/* cidr from 1 to HOST_MASK value and c = cidr + 1 */
+#define NLEN   HOST_MASK
#define CIDR_POS(c)   ((c) - 2)
 #endif
 
 #else
-#define NLEN(family)   0
+#define NLEN   0
 #endif /* IP_SET_HASH_WITH_NETS */
 
 #endif /* _IP_SET_HASH_GEN_H */
@@ -300,12 +298,12 @@ struct htype {
  * sized networks. cidr == real cidr + 1 to support /0.
  */
 static void
-mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+mtype_add_cidr(struct htype *h, u8 cidr, u8 n)
 {
int i, j;
 
/* Add in increasing prefix order, so larger cidr first */
-   for (i = 0, j = -1; i < nets_length && h->nets[i].cidr[n]; i++) {
+   for (i = 0, j = -1; i < NLEN && h->nets[i].cidr[n]; i++) {
if (j != -1) {
continue;
} else if (h->nets[i].cidr[n] < cidr) {
@@ -324,11 +322,11 @@ struct htype {
 }
 
 static void
-mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+mtype_del_cidr(struct htype *h, u8 cidr, u8 n)
 {
-   u8 i, j, net_end = nets_length - 1;
+   u8 i, j, net_end = NLEN - 1;
 
-   for (i = 0; i < nets_length; i++) {
+   for (i = 0; i < NLEN; i++) {
if (h->nets[i].cidr[n] != cidr)
continue;
h->nets[CIDR_POS(cidr)].nets[n]--;
@@ -344,13 +342,12 @@ struct htype {
 
 /* Calculate the actual memory size of the set data */
 static size_t
-mtype_ahash_memsize(const struct htype *h, const struct htable *t,
-   u8 nets_length)
+mtype_ahash_memsize(const struct htype *h, const struct htable *t)
 {
size_t memsize = sizeof(*h) + sizeof(*t);
 
 #ifdef IP_SET_HASH_WITH_NETS
-   memsize += sizeof(struct net_prefixes) * nets_length;
+   memsize += sizeof(struct net_prefixes) * NLEN;
 #endif
 
return memsize;
@@ -391,7 +388,7 @@ struct htype {
kfree_rcu(n, rcu);
}
 #ifdef IP_SET_HASH_WITH_NETS
-   memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN(set->family));
+   memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN);
 #endif
set->elements = 0;
set->ext_size = 0;
@@ -475,7 +472,7 @@ struct htype {
u32 i, j, d;
size_t dsize = set->dsize;
 #ifdef IP_SET_HASH_WITH_NETS
-   u8 k, nets_length = NLEN(set->family);
+   u8 k;
 #endif
 
t = ipset_dereference_protected(h->table, set);
@@ -498,7 +495,7 @@ struct htype {
for (k = 0; k < IPSET_NET_COUNT; k++)
mtype_del_cidr(h,
NCIDR_PUT(DCIDR_GET(data->cidr, k)),
-   nets_length, k);
+   k);
 #endif
ip_set_ext_destroy(set, data);
set->elements--;
@@ -778,7 +775,7 @@ struct htype {
for (i = 0; i < IPSET_NET_COUNT; i++)
mtype_del_cidr(h,
NCIDR_PUT(DCIDR_GET(data->cidr, i)),
-   NLEN(set->family), i);
+   i);
 #endif
ip_set_ext_destroy(set, data);
set->elements--;
@@ -814,8 +811,7 @@ struct htype {
set->elements++;
 #ifdef IP_SET_HASH_WITH_NETS
for (i = 0; i < IPSET_NET_COUNT; i++)
-   mtype_add_cidr(h, NCIDR_PUT(DCIDR_GET(d->cidr, i)),
-  NLEN(set->family), i);
+   mtype_add_cidr(h, NCIDR_PUT(DCIDR_GET(d->cidr, i)), i);
 #endif
memcpy(data, d, sizeof(struct mtype_elem));
 overwrite_extensions:
@@ -888,7 +884,7 @@ struct htype {
 

[PATCH 15/22] netfilter: ipset: Make struct htype per ipset family

2016-10-17 Thread Jozsef Kadlecsik
Before this patch, struct htype was defined at the first inclusion
of ip_set_hash_gen.h and was shared by both the IPv4 and IPv6
set variants.

Make struct htype per ipset family, and use NLEN to give the
nets array a fixed size, which simplifies struct htype allocation.
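With a fixed-size nets[] array, sizeof(struct htype) already includes the prefix book-keeping. That is why mtype_ahash_memsize() collapses to sizeof(*h) + sizeof(*t) and the flush path can use sizeof(h->nets). A userspace sketch with hypothetical, simplified member types (the real struct net_prefixes differs, and IPSET_NET_COUNT is assumed to be 1):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NLEN 33			/* e.g. IPv4 with /0 support */
#define IPSET_NET_COUNT 1	/* assumed: one network per element */

/* Simplified stand-in for struct net_prefixes. */
struct net_prefixes {
	uint32_t nets[IPSET_NET_COUNT];
	uint8_t cidr[IPSET_NET_COUNT];
};

struct htype {
	void *table;
	uint32_t maxelem;
	uint32_t initval;
	struct net_prefixes nets[NLEN];	/* fixed size, no nets[0] tail */
};

static size_t htype_memsize(const struct htype *h)
{
	return sizeof(*h);	/* nets[] is already counted */
}

static struct htype *htype_alloc(void)
{
	/* One fixed-size allocation; no "+ NLEN * sizeof(...)" term. */
	return calloc(1, sizeof(struct htype));
}

static void htype_flush(struct htype *h)
{
	assert(h != NULL);
	memset(h->nets, 0, sizeof(h->nets));
}
```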

Ported from a patch proposed by Sergey Popovich .

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h| 51 +++-
 net/netfilter/ipset/ip_set_hash_ip.c | 10 +++---
 net/netfilter/ipset/ip_set_hash_ipmark.c | 10 +++---
 net/netfilter/ipset/ip_set_hash_ipport.c |  6 ++--
 net/netfilter/ipset/ip_set_hash_ipportip.c   |  6 ++--
 net/netfilter/ipset/ip_set_hash_ipportnet.c  | 10 +++---
 net/netfilter/ipset/ip_set_hash_net.c|  8 ++---
 net/netfilter/ipset/ip_set_hash_netiface.c   |  8 ++---
 net/netfilter/ipset/ip_set_hash_netnet.c |  8 ++---
 net/netfilter/ipset/ip_set_hash_netport.c| 10 +++---
 net/netfilter/ipset/ip_set_hash_netportnet.c | 10 +++---
 11 files changed, 63 insertions(+), 74 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index cc9208b..0082ccf 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -168,6 +168,18 @@ struct net_prefixes {
 
 #endif /* _IP_SET_HASH_GEN_H */
 
+#ifndef MTYPE
+#error "MTYPE is not defined!"
+#endif
+
+#ifndef HTYPE
+#error "HTYPE is not defined!"
+#endif
+
+#ifndef HOST_MASK
+#error "HOST_MASK is not defined!"
+#endif
+
 /* Family dependent templates */
 
 #undef ahash_data
@@ -191,7 +203,6 @@ struct net_prefixes {
 #undef mtype_same_set
 #undef mtype_kadt
 #undef mtype_uadt
-#undef mtype
 
 #undef mtype_add
 #undef mtype_del
@@ -207,6 +218,7 @@ struct net_prefixes {
 #undef mtype_variant
 #undef mtype_data_match
 
+#undef htype
 #undef HKEY
 
 #define mtype_data_equal   IPSET_TOKEN(MTYPE, _data_equal)
@@ -233,7 +245,6 @@ struct net_prefixes {
 #define mtype_same_set IPSET_TOKEN(MTYPE, _same_set)
 #define mtype_kadt IPSET_TOKEN(MTYPE, _kadt)
 #define mtype_uadt IPSET_TOKEN(MTYPE, _uadt)
-#define mtype  MTYPE
 
 #define mtype_add  IPSET_TOKEN(MTYPE, _add)
 #define mtype_del  IPSET_TOKEN(MTYPE, _del)
@@ -249,18 +260,12 @@ struct net_prefixes {
 #define mtype_variant  IPSET_TOKEN(MTYPE, _variant)
 #define mtype_data_match   IPSET_TOKEN(MTYPE, _data_match)
 
-#ifndef MTYPE
-#error "MTYPE is not defined!"
-#endif
-
-#ifndef HOST_MASK
-#error "HOST_MASK is not defined!"
-#endif
-
 #ifndef HKEY_DATALEN
 #define HKEY_DATALEN   sizeof(struct mtype_elem)
 #endif
 
+#define htype  MTYPE
+
 #define HKEY(data, initval, htable_bits)   \
 ({ \
const u32 *__k = (const u32 *)data; \
@@ -271,33 +276,26 @@ struct net_prefixes {
jhash2(__k, __l, initval) & jhash_mask(htable_bits);\
 })
 
-#ifndef htype
-#ifndef HTYPE
-#error "HTYPE is not defined!"
-#endif /* HTYPE */
-#define htype  HTYPE
-
 /* The generic hash structure */
 struct htype {
struct htable __rcu *table; /* the hash table */
+   struct timer_list gc;   /* garbage collection when timeout enabled */
u32 maxelem;/* max elements in the hash */
u32 initval;/* random jhash init value */
 #ifdef IP_SET_HASH_WITH_MARKMASK
u32 markmask;   /* markmask value for mark mask to store */
 #endif
-   struct timer_list gc;   /* garbage collection when timeout enabled */
-   struct mtype_elem next; /* temporary storage for uadd */
 #ifdef IP_SET_HASH_WITH_MULTI
u8 ahash_max;   /* max elements in an array block */
 #endif
 #ifdef IP_SET_HASH_WITH_NETMASK
u8 netmask; /* netmask value for subnets to store */
 #endif
+   struct mtype_elem next; /* temporary storage for uadd */
 #ifdef IP_SET_HASH_WITH_NETS
-   struct net_prefixes nets[0]; /* book-keeping of prefixes */
+   struct net_prefixes nets[NLEN]; /* book-keeping of prefixes */
 #endif
 };
-#endif /* htype */
 
 #ifdef IP_SET_HASH_WITH_NETS
 /* Network cidr size book keeping when the hash stores different
@@ -350,13 +348,7 @@ struct htype {
 static size_t
 mtype_ahash_memsize(const struct htype *h, const struct htable *t)
 {
-   size_t memsize = sizeof(*h) + sizeof(*t);
-
-#ifdef IP_SET_HASH_WITH_NETS
-   memsize += sizeof(struct net_prefixes) * NLEN;
-#endif
-
-   return memsize;
+   return sizeof(*h) + sizeof(*t);
 }
 
 /* Get the ith element from the array block n */
@@ -394,7 +386,7 @@ struct htype {
kfree_rcu(n, rcu);
}
 #ifdef IP_SET_HASH_WITH_NETS
-   memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN);
+   memset(h->nets, 0, sizeof(h->nets));
 #endif
set->elements = 0;
   

[PATCH 04/22] netfilter: ipset: Improve comment extension helpers

2016-10-17 Thread Jozsef Kadlecsik
Allocate memory with kmalloc() rather than kzalloc().
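One reading of why zeroing is unnecessary here: the string payload is filled by strlcpy(), which always NUL-terminates within the len + 1 bytes it is given, so every byte of the copied comment is written before use. A userspace sketch under that assumption, with malloc() in place of kmalloc(), a hypothetical bounded_copy() in place of strlcpy(), and MAX_COMMENT as an assumed limit:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COMMENT 255		/* assumed limit, like IPSET_MAX_COMMENT_SIZE */

/* Hypothetical stand-in for struct ip_set_comment_rcu. */
struct comment {
	size_t len;
	char str[];		/* flexible array member */
};

/* strlcpy()-like copy: copies at most size - 1 bytes and always
 * NUL-terminates when size > 0. */
static void bounded_copy(char *dst, const char *src, size_t size)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return;
	n = strlen(src);
	if (n >= size)
		n = size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

static struct comment *comment_new(const char *text)
{
	size_t len = strlen(text);
	struct comment *c;

	if (len > MAX_COMMENT)
		len = MAX_COMMENT;
	/* malloc(), not calloc(): bounded_copy() writes every byte of
	 * str[] up to and including the terminator. */
	c = malloc(sizeof(*c) + len + 1);
	if (!c)
		return NULL;
	c->len = len;
	bounded_copy(c->str, text, len + 1);
	return c;
}
```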

Ported from a patch proposed by Sergey Popovich .

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set_comment.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netfilter/ipset/ip_set_comment.h 
b/include/linux/netfilter/ipset/ip_set_comment.h
index bae5c76..5444b1b 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -34,7 +34,7 @@
return;
if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
len = IPSET_MAX_COMMENT_SIZE;
-   c = kzalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
+   c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
if (unlikely(!c))
return;
strlcpy(c->str, ext->comment, len + 1);
-- 
1.8.5.1



[PATCH 05/22] netfilter: ipset: Split extensions into separate files

2016-10-17 Thread Jozsef Kadlecsik
Ported from a patch proposed by Sergey Popovich .

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 95 +-
 include/linux/netfilter/ipset/ip_set_counter.h | 75 
 include/linux/netfilter/ipset/ip_set_skbinfo.h | 46 +
 3 files changed, 123 insertions(+), 93 deletions(-)
 create mode 100644 include/linux/netfilter/ipset/ip_set_counter.h
 create mode 100644 include/linux/netfilter/ipset/ip_set_skbinfo.h

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 7802621..b5bd0fb3 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -292,99 +292,6 @@ struct ip_set {
return nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(cadt_flags));
 }
 
-static inline void
-ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
-{
-   atomic64_add((long long)bytes, &(counter)->bytes);
-}
-
-static inline void
-ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
-{
-   atomic64_add((long long)packets, &(counter)->packets);
-}
-
-static inline u64
-ip_set_get_bytes(const struct ip_set_counter *counter)
-{
-   return (u64)atomic64_read(&(counter)->bytes);
-}
-
-static inline u64
-ip_set_get_packets(const struct ip_set_counter *counter)
-{
-   return (u64)atomic64_read(&(counter)->packets);
-}
-
-static inline void
-ip_set_update_counter(struct ip_set_counter *counter,
- const struct ip_set_ext *ext,
- struct ip_set_ext *mext, u32 flags)
-{
-   if (ext->packets != ULLONG_MAX &&
-   !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
-   ip_set_add_bytes(ext->bytes, counter);
-   ip_set_add_packets(ext->packets, counter);
-   }
-   if (flags & IPSET_FLAG_MATCH_COUNTERS) {
-   mext->packets = ip_set_get_packets(counter);
-   mext->bytes = ip_set_get_bytes(counter);
-   }
-}
-
-static inline bool
-ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
-{
-   return nla_put_net64(skb, IPSET_ATTR_BYTES,
-cpu_to_be64(ip_set_get_bytes(counter)),
-IPSET_ATTR_PAD) ||
-  nla_put_net64(skb, IPSET_ATTR_PACKETS,
-cpu_to_be64(ip_set_get_packets(counter)),
-IPSET_ATTR_PAD);
-}
-
-static inline void
-ip_set_init_counter(struct ip_set_counter *counter,
-   const struct ip_set_ext *ext)
-{
-   if (ext->bytes != ULLONG_MAX)
-   atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
-   if (ext->packets != ULLONG_MAX)
-   atomic64_set(&(counter)->packets, (long long)(ext->packets));
-}
-
-static inline void
-ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
-  const struct ip_set_ext *ext,
-  struct ip_set_ext *mext, u32 flags)
-{
-   mext->skbinfo = *skbinfo;
-}
-
-static inline bool
-ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
-{
-   /* Send nonzero parameters only */
-   return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
-   nla_put_net64(skb, IPSET_ATTR_SKBMARK,
- cpu_to_be64((u64)skbinfo->skbmark << 32 |
- skbinfo->skbmarkmask),
- IPSET_ATTR_PAD)) ||
-  (skbinfo->skbprio &&
-   nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
- cpu_to_be32(skbinfo->skbprio))) ||
-  (skbinfo->skbqueue &&
-   nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
-cpu_to_be16(skbinfo->skbqueue)));
-}
-
-static inline void
-ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
-   const struct ip_set_ext *ext)
-{
-   *skbinfo = ext->skbinfo;
-}
-
 /* Netlink CB args */
 enum {
IPSET_CB_NET = 0,   /* net namespace */
@@ -539,6 +446,8 @@ static inline int nla_put_ipaddr6(struct sk_buff *skb, int 
type,
 
 #include 
 #include 
+#include 
+#include 
 
 int
 ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
diff --git a/include/linux/netfilter/ipset/ip_set_counter.h 
b/include/linux/netfilter/ipset/ip_set_counter.h
new file mode 100644
index 000..2b5e784
--- /dev/null
+++ b/include/linux/netfilter/ipset/ip_set_counter.h
@@ -0,0 +1,75 @@
+#ifndef _IP_SET_COUNTER_H
+#define _IP_SET_COUNTER_H
+
+/* Copyright (C) 2015 Sergey Popovich 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifdef __KERNEL__
+
+static inline void
+ip_set_add_bytes(u64 bytes, struct 

[PATCH 13/22] netfilter: ipset: Make sure element data size is a multiple of u32

2016-10-17 Thread Jozsef Kadlecsik
Data for hashing is required to be an array of u32. Make sure that
the element data size is always a multiple of u32.
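jhash2() consumes its key as an array of u32 words, so the key length must divide evenly by sizeof(u32). The patch enforces this with BUILD_BUG_ON() inside HKEY(); a portable C11 sketch of the same compile-time guard uses static_assert. The element layout and the toy word-mixing hash below are hypothetical and are not jhash2():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IPv4+MAC element: 4 + 6 + 2 bytes of explicit padding. */
struct elem {
	uint32_t ip;
	uint8_t ether[6];
	uint8_t pad[2];
};

#define HKEY_DATALEN sizeof(struct elem)

/* Compile-time guard, same intent as the BUILD_BUG_ON() in HKEY(). */
static_assert(HKEY_DATALEN % sizeof(uint32_t) == 0,
	      "element data must be a multiple of u32");

/* Toy word-wise hash standing in for jhash2(k, length, initval);
 * only the "array of u32" interface matters here. */
static uint32_t hash_words(const uint32_t *k, size_t words, uint32_t initval)
{
	uint32_t h = initval;
	size_t i;

	for (i = 0; i < words; i++)
		h = h * 31u + k[i];
	return h;
}

static uint32_t hkey(const struct elem *e, uint32_t initval, unsigned int bits)
{
	uint32_t k[HKEY_DATALEN / sizeof(uint32_t)];

	assert(bits < 32);
	/* Copy instead of casting, to stay within C aliasing rules. */
	memcpy(k, e, sizeof(k));
	return hash_words(k, sizeof(k) / sizeof(k[0]), initval) &
	       ((1u << bits) - 1);	/* like jhash_mask(bits) */
}
```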

Ported from a patch proposed by Sergey Popovich .

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index ab5b57c..e2f4925 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -262,8 +262,14 @@ struct net_prefixes {
 #endif
 
 #define HKEY(data, initval, htable_bits)   \
-(jhash2((u32 *)(data), HKEY_DATALEN / sizeof(u32), initval)\
-   & jhash_mask(htable_bits))
+({ \
+   const u32 *__k = (const u32 *)data; \
+   u32 __l = HKEY_DATALEN / sizeof(u32);   \
+   \
+   BUILD_BUG_ON(HKEY_DATALEN % sizeof(u32) != 0);  \
+   \
+   jhash2(__k, __l, initval) & jhash_mask(htable_bits);\
+})
 
 #ifndef htype
 #ifndef HTYPE
-- 
1.8.5.1



[PATCH 21/22] netfilter: ipset: use setup_timer() and mod_timer().

2016-10-17 Thread Jozsef Kadlecsik
From: Muhammad Falak R Wani 

Use setup_timer() instead of init_timer(), as it is the preferred way
of setting up a timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to set up and arm a timer, making the
code compact and easier to read.

Signed-off-by: Muhammad Falak R Wani 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_list_set.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index dede343..51077c5 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -586,11 +586,8 @@ struct list_set {
 {
struct list_set *map = set->data;
 
-   init_timer(&map->gc);
-   map->gc.data = (unsigned long)set;
-   map->gc.function = gc;
-   map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
-   add_timer(&map->gc);
+   setup_timer(&map->gc, gc, (unsigned long)set);
+   mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 }
 
 /* Create list:set type of sets */
-- 
1.8.5.1



[PATCH 20/22] netfilter: ipset: use setup_timer() and mod_timer().

2016-10-17 Thread Jozsef Kadlecsik
From: Muhammad Falak R Wani 

Use setup_timer() instead of init_timer(), as it is the preferred way
of setting up a timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to set up and arm a timer, making the
code compact and easier to read.

Signed-off-by: Muhammad Falak R Wani 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 295ad84..0d5f83e 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -435,11 +435,8 @@ struct htype {
 {
struct htype *h = set->data;
 
-   init_timer(&h->gc);
-   h->gc.data = (unsigned long)set;
-   h->gc.function = gc;
-   h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
-   add_timer(&h->gc);
+   setup_timer(&h->gc, gc, (unsigned long)set);
+   mod_timer(&h->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
pr_debug("gc initialized, run in every %u\n",
 IPSET_GC_PERIOD(set->timeout));
 }
-- 
1.8.5.1



[PATCH 06/22] netfilter: ipset: Separate memsize calculation code into dedicated function

2016-10-17 Thread Jozsef Kadlecsik
Hash types already have their memsize calculation code in separate
functions. Do the same for *bitmap* and *list* sets.
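Both new helpers reduce to simple arithmetic. A bitmap set reports its header, its members bitmap, and dsize bytes of extension data per element slot. A list set counts its members and adds n * dsize. A sketch with hypothetical, simplified structures; the real helpers read these fields from struct mtype and walk an RCU-protected list rather than a plain linked list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bitmap set header. */
struct bitmap_map {
	void *members;		/* bitmap of set members */
	size_t memsize;		/* bytes in the members bitmap */
	uint32_t elements;	/* number of element slots */
};

static size_t bitmap_memsize(const struct bitmap_map *map, size_t dsize)
{
	assert(map != NULL);
	return sizeof(*map) + map->memsize + map->elements * dsize;
}

/* Hypothetical list set: a plain singly linked list. */
struct set_elem {
	struct set_elem *next;
};

struct list_map {
	struct set_elem *head;
	uint32_t size;
};

static size_t list_memsize(const struct list_map *map, size_t dsize)
{
	const struct set_elem *e;
	uint32_t n = 0;

	assert(map != NULL);
	for (e = map->head; e; e = e->next)
		n++;
	return sizeof(*map) + n * dsize;
}
```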

Ported from a patch proposed by Sergey Popovich .

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_bitmap_gen.h | 13 -
 net/netfilter/ipset/ip_set_list_set.c   | 23 +--
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 2e8e7e5..c22cdde 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -22,6 +22,7 @@
 #define mtype_kadt IPSET_TOKEN(MTYPE, _kadt)
 #define mtype_uadt IPSET_TOKEN(MTYPE, _uadt)
 #define mtype_destroy  IPSET_TOKEN(MTYPE, _destroy)
+#define mtype_memsize  IPSET_TOKEN(MTYPE, _memsize)
#define mtype_flush    IPSET_TOKEN(MTYPE, _flush)
 #define mtype_head IPSET_TOKEN(MTYPE, _head)
 #define mtype_same_set IPSET_TOKEN(MTYPE, _same_set)
@@ -84,12 +85,22 @@
memset(map->members, 0, map->memsize);
 }
 
+/* Calculate the actual memory size of the set data */
+static size_t
+mtype_memsize(const struct mtype *map, size_t dsize)
+{
+   size_t memsize = sizeof(*map) +
+map->memsize +
+map->elements * dsize;
+   return memsize;
+}
+
 static int
 mtype_head(struct ip_set *set, struct sk_buff *skb)
 {
const struct mtype *map = set->data;
struct nlattr *nested;
-   size_t memsize = sizeof(*map) + map->memsize;
+   size_t memsize = mtype_memsize(map, set->dsize);
 
nested = ipset_nest_start(skb, IPSET_ATTR_DATA);
if (!nested)
diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index a2a89e4..462b0b1 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -441,12 +441,12 @@ struct list_set {
set->data = NULL;
 }
 
-static int
-list_set_head(struct ip_set *set, struct sk_buff *skb)
+/* Calculate the actual memory size of the set data */
+static size_t
+list_set_memsize(const struct list_set *map, size_t dsize)
 {
-   const struct list_set *map = set->data;
-   struct nlattr *nested;
struct set_elem *e;
+   size_t memsize;
u32 n = 0;
 
rcu_read_lock();
@@ -454,13 +454,24 @@ struct list_set {
n++;
rcu_read_unlock();
 
+   memsize = sizeof(*map) + n * dsize;
+
+   return memsize;
+}
+
+static int
+list_set_head(struct ip_set *set, struct sk_buff *skb)
+{
+   const struct list_set *map = set->data;
+   struct nlattr *nested;
+   size_t memsize = list_set_memsize(map, set->dsize);
+
nested = ipset_nest_start(skb, IPSET_ATTR_DATA);
if (!nested)
goto nla_put_failure;
if (nla_put_net32(skb, IPSET_ATTR_SIZE, htonl(map->size)) ||
nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
-   nla_put_net32(skb, IPSET_ATTR_MEMSIZE,
- htonl(sizeof(*map) + n * set->dsize)))
+   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
goto nla_put_failure;
if (unlikely(ip_set_put_flags(skb, set)))
goto nla_put_failure;
-- 
1.8.5.1

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10, nf-next] Netfilter core updates

2016-10-17 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> Let me know if you have any comment, otherwise I'll place this in the
> nf-next tree so we can follow up working on top of these.

Please do, thanks!


[PATCH 00/10, nf-next] Netfilter core updates

2016-10-17 Thread Pablo Neira Ayuso
This is the second round of patches to improve Netfilter hook performance,
following several of the ideas that we discussed during NetDev 1.2. This
patchset implements the following:

1) Deprecate NF_STOP, as it is only used by br_netfilter.

2) Remove threshold handling, which is also only used by br_netfilter.

3) Place the nf_hook_state pointer into the xt_action_param structure,
   so that this structure fits into a single cacheline according to
   pahole. This implicitly affects nftables too, since it also relies
   on the xt_action_param structure.

4) Move state->hook_entries into the nf_queue entry. The hook_entries
   pointer is only required by nf_queue(), so we can store it in the
   queue entry instead.

5) Handle the queue bypass flag from nf_queue(), to keep this small
   piece of nf_queue-specific handling away from the core path.

6) Merge nf_iterate() into nf_hook_slow(), which results in a much
   simpler and more readable function.

I have kept back the patches that move NF_QUEUE handling away from the
core and that inline nf_hook_slow(); I would like to explore other
options before following this path.

Using this simple drop-all packets ruleset from ingress:

nft add table netdev x
nft add chain netdev x y { type filter hook ingress device eth0 
priority 0\; }
nft add rule netdev x y drop

I generated traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using the -i
option. perf report shows nf_tables functions in its top 10:

17.30%  kpktgend_0   [nf_tables][k] nft_do_chain
15.75%  kpktgend_0   [kernel.vmlinux]   [k] __netif_receive_skb_core
10.39%  kpktgend_0   [nf_tables_netdev] [k] nft_do_chain_netdev

With this patchset I'm measuring a performance improvement of ~15%, that
is, roughly +2.5Mpps. I used my old laptop, an Intel(R) Core(TM) i5-3320M
CPU @ 2.60GHz (4 cores).

Let me know if you have any comment, otherwise I'll place this in the
nf-next tree so we can follow up working on top of these.

Thanks!

Pablo Neira Ayuso (10):
  netfilter: get rid of useless debugging from core
  netfilter: remove comments that predate rcu days
  netfilter: kill NF_HOOK_THRESH() and state->tresh
  netfilter: deprecate NF_STOP
  netfilter: x_tables: move hook state into xt_action_param structure
  netfilter: nf_tables: use hook state from xt_action_param structure
  netfilter: use switch() to handle verdict cases from nf_hook_slow()
  netfilter: remove hook_entries field from nf_hook_state
  netfilter: handle queue bypass flag from nf_queue
  netfilter: merge nf_iterate() into nf_hook_slow()

 include/linux/netfilter.h  | 58 ++-
 include/linux/netfilter/x_tables.h | 48 
 include/linux/netfilter_ingress.h  |  4 +-
 include/net/netfilter/nf_queue.h   |  1 +
 include/net/netfilter/nf_tables.h  | 36 
 include/uapi/linux/netfilter.h |  2 +-
 net/bridge/br_netfilter_hooks.c| 16 +++---
 net/bridge/netfilter/ebt_arpreply.c|  3 +-
 net/bridge/netfilter/ebt_log.c | 11 ++--
 net/bridge/netfilter/ebt_nflog.c   |  6 +-
 net/bridge/netfilter/ebt_redirect.c|  6 +-
 net/bridge/netfilter/ebtable_broute.c  |  2 +-
 net/bridge/netfilter/ebtables.c|  6 +-
 net/bridge/netfilter/nft_meta_bridge.c |  2 +-
 net/bridge/netfilter/nft_reject_bridge.c   | 30 ++
 net/ipv4/netfilter/arp_tables.c|  6 +-
 net/ipv4/netfilter/ip_tables.c |  6 +-
 net/ipv4/netfilter/ipt_MASQUERADE.c|  3 +-
 net/ipv4/netfilter/ipt_REJECT.c|  4 +-
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  4 +-
 net/ipv4/netfilter/ipt_rpfilter.c  |  2 +-
 net/ipv4/netfilter/nft_dup_ipv4.c  |  2 +-
 net/ipv4/netfilter/nft_masq_ipv4.c |  4 +-
 net/ipv4/netfilter/nft_redir_ipv4.c|  3 +-
 net/ipv4/netfilter/nft_reject_ipv4.c   |  4 +-
 net/ipv6/netfilter/ip6_tables.c|  6 +-
 net/ipv6/netfilter/ip6t_MASQUERADE.c   |  2 +-
 net/ipv6/netfilter/ip6t_REJECT.c   | 23 +---
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  4 +-
 net/ipv6/netfilter/ip6t_rpfilter.c |  3 +-
 net/ipv6/netfilter/nft_dup_ipv6.c  |  2 +-
 net/ipv6/netfilter/nft_masq_ipv6.c |  3 +-
 net/ipv6/netfilter/nft_redir_ipv6.c|  3 +-
 net/ipv6/netfilter/nft_reject_ipv6.c   |  6 +-
 net/netfilter/core.c   | 92 ++
 net/netfilter/ipset/ip_set_core.c  |  6 +-
 net/netfilter/ipset/ip_set_hash_netiface.c |  2 +-
 net/netfilter/nf_dup_netdev.c  |  2 +-
 net/netfilter/nf_internals.h   |  9 +--
 net/netfilter/nf_queue.c   | 70 +++
 net/netfilter/nf_tables_core.c | 10 ++--
 net/netfilter/nf_tables_trace.c|  8 +--
 net/netfilter/nfnetlink_queue.c|  2 +-
 net/netfilter/nft_log.c  

[PATCH 10/10] netfilter: merge nf_iterate() into nf_hook_slow()

2016-10-17 Thread Pablo Neira Ayuso
nf_iterate() has become rather simple, so we can integrate this code into
nf_hook_slow() to reduce the amount of LOC in the core path.

However, we still need nf_iterate() for nf_queue packet handling, so
move this function to nf_queue.c, the only place where it is still
needed. I think it should be possible to refactor the nf_queue code to
get rid of it entirely, but given this is the slow path anyway, let's
look at this later.
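The control flow of the merged loop can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: `struct hook_entry`, `run_hooks()` and the example hooks are hypothetical stand-ins, RCU is omitted, NF_QUEUE is folded into the default branch, and a drop returns -1 instead of the errno from NF_DROP_GETERR(). The verdict values mirror include/uapi/linux/netfilter.h. Note that `continue` inside the switch jumps to the `while (entry)` test without advancing `entry`, which is how NF_REPEAT re-runs the current hook.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict values, mirroring include/uapi/linux/netfilter.h. */
#define NF_DROP   0
#define NF_ACCEPT 1
#define NF_STOLEN 2
#define NF_QUEUE  3
#define NF_REPEAT 4
#define NF_VERDICT_MASK 0x000000ffu

/* Hypothetical, RCU-free stand-in for struct nf_hook_entry. */
struct hook_entry {
	unsigned int (*hook)(void *priv);
	void *priv;
	struct hook_entry *next;
};

/* Returns 1 if every hook accepted, -1 on drop, 0 otherwise
 * (stolen, queued, or any other verdict). */
int run_hooks(struct hook_entry *entry)
{
	unsigned int verdict;

	/* The caller always passes a valid first entry, so do/while fits. */
	assert(entry != NULL);

	do {
		verdict = entry->hook(entry->priv);
		switch (verdict & NF_VERDICT_MASK) {
		case NF_ACCEPT:
			entry = entry->next;	/* advance to the next hook */
			break;
		case NF_DROP:
			return -1;
		case NF_REPEAT:
			continue;		/* re-run the same hook */
		default:
			return 0;		/* NF_STOLEN, NF_QUEUE, ... */
		}
	} while (entry);

	return 1;
}

/* Example hooks used to exercise run_hooks(). */
unsigned int accept_hook(void *priv) { (void)priv; return NF_ACCEPT; }
unsigned int drop_hook(void *priv) { (void)priv; return NF_DROP; }

/* Returns NF_REPEAT while the int counter behind priv is positive. */
unsigned int repeat_hook(void *priv)
{
	int *left = priv;

	if (*left > 0) {
		(*left)--;
		return NF_REPEAT;
	}
	return NF_ACCEPT;
}
```

For example, a chain made of `repeat_hook` with a counter of 2 followed by `accept_hook` returns 1, after the first hook has been called three times.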

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 72 +---
 net/netfilter/nf_internals.h |  5 ---
 net/netfilter/nf_queue.c | 20 
 3 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index f299fbde150d..5f015b1948f7 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -302,26 +302,6 @@ void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
 }
 EXPORT_SYMBOL(_nf_unregister_hooks);
 
-unsigned int nf_iterate(struct sk_buff *skb,
-   struct nf_hook_state *state,
-   struct nf_hook_entry **entryp)
-{
-   unsigned int verdict;
-
-   do {
-repeat:
-   verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
-   if (verdict != NF_ACCEPT) {
-   if (verdict != NF_REPEAT)
-   return verdict;
-   goto repeat;
-   }
-   *entryp = rcu_dereference((*entryp)->next);
-   } while (*entryp);
-   return NF_ACCEPT;
-}
-
-
 /* Returns 1 if okfn() needs to be executed by the caller,
  * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
@@ -330,31 +310,35 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
unsigned int verdict;
int ret;
 
+   do {
+   verdict = entry->ops.hook(entry->ops.priv, skb, state);
+   switch (verdict & NF_VERDICT_MASK) {
+   case NF_ACCEPT:
 next_hook:
-   verdict = nf_iterate(skb, state, &entry);
-   switch (verdict & NF_VERDICT_MASK) {
-   case NF_ACCEPT:
-   ret = 1;
-   break;
-   case NF_DROP:
-   kfree_skb(skb);
-   ret = NF_DROP_GETERR(verdict);
-   if (ret == 0)
-   ret = -EPERM;
-   break;
-   case NF_QUEUE:
-   ret = nf_queue(skb, state, entry, verdict);
-   if (ret == 1)
-   goto next_hook;
-   break;
-   default:
-   /* Implicit handling for NF_STOLEN, as well as any other non
-* conventional verdicts.
-*/
-   ret = 0;
-   break;
-   }
-   return ret;
+   entry = rcu_dereference(entry->next);
+   break;
+   case NF_DROP:
+   kfree_skb(skb);
+   ret = NF_DROP_GETERR(verdict);
+   if (ret == 0)
+   ret = -EPERM;
+   return ret;
+   case NF_REPEAT:
+   continue;
+   case NF_QUEUE:
+   ret = nf_queue(skb, state, entry, verdict);
+   if (ret == 1)
+   goto next_hook;
+   return ret;
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any other
+* non conventional verdicts.
+*/
+   return 0;
+   }
+   } while (entry);
+
+   return 1;
 }
 EXPORT_SYMBOL(nf_hook_slow);
 
diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h
index a46f2635b71f..78a59a23421f 100644
--- a/net/netfilter/nf_internals.h
+++ b/net/netfilter/nf_internals.h
@@ -11,11 +11,6 @@
 #define NFDEBUG(format, args...)
 #endif
 
-
-/* core.c */
-unsigned int nf_iterate(struct sk_buff *skb, struct nf_hook_state *state,
-   struct nf_hook_entry **entryp);
-
 /* nf_queue.c */
 int nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 struct nf_hook_entry *entry, unsigned int verdict);
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index c5e0d534d352..25ad36f519f7 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -177,6 +177,26 @@ int nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
return 0;
 }
 
+static unsigned int nf_iterate(struct sk_buff *skb,
+  struct nf_hook_state *state,
+  struct nf_hook_entry **entryp)
+{
+   unsigned int verdict;
+
+   do {
+repeat:
+   verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
+   if (verdict != NF_ACCEPT) {
+

[PATCH 02/10] netfilter: remove comments that predate rcu days

2016-10-17 Thread Pablo Neira Ayuso
We cannot block or sleep in nf_iterate() because netfilter runs under
the RCU read lock these days, where blocking is well known to be
illegal. So let's remove these old comments.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 7b723bcd2522..b193bd46ac30 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -308,18 +308,11 @@ unsigned int nf_iterate(struct sk_buff *skb,
 {
unsigned int verdict;
 
-   /*
-* The caller must not block between calls to this
-* function because of risk of continuing from deleted element.
-*/
while (*entryp) {
if (state->thresh > (*entryp)->ops.priority) {
*entryp = rcu_dereference((*entryp)->next);
continue;
}
-
-   /* Optimization: we don't need to hold module
-  reference here, since function can't sleep. --RR */
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-- 
2.1.4



[PATCH 07/10] netfilter: use switch() to handle verdict cases from nf_hook_slow()

2016-10-17 Thread Pablo Neira Ayuso
Use switch() for verdict handling and add explicit handling for
NF_STOLEN and other non-conventional verdicts.
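Because a verdict carries extra data in its upper bits (a queue number for NF_QUEUE, an errno for NF_DROP, plus flags in bits 8-15), the switch has to dispatch on `verdict & NF_VERDICT_MASK` rather than on the raw value. The helpers below are hypothetical and model that encoding in userspace; the constants mirror include/uapi/linux/netfilter.h as of this series, and the -EPERM fallback follows the NF_DROP branch of nf_hook_slow().

```c
#include <assert.h>
#include <errno.h>

/* Constants mirroring include/uapi/linux/netfilter.h. */
#define NF_DROP  0
#define NF_QUEUE 3
#define NF_VERDICT_MASK              0x000000ffu
#define NF_VERDICT_FLAG_QUEUE_BYPASS 0x00008000u
#define NF_VERDICT_QMASK             0xffff0000u
#define NF_VERDICT_QBITS             16

/* Encode "queue to queue number nr", optionally with the bypass flag. */
unsigned int queue_verdict(unsigned int nr, int bypass)
{
	unsigned int v = ((nr << NF_VERDICT_QBITS) & NF_VERDICT_QMASK) | NF_QUEUE;

	return bypass ? (v | NF_VERDICT_FLAG_QUEUE_BYPASS) : v;
}

/* Encode a drop carrying a negative errno, e.g. drop_verdict(-EHOSTUNREACH). */
unsigned int drop_verdict(int err)
{
	assert(err <= 0);
	return ((unsigned int)-err << NF_VERDICT_QBITS) | NF_DROP;
}

/* The value that switch (verdict & NF_VERDICT_MASK) dispatches on. */
unsigned int verdict_code(unsigned int verdict)
{
	return verdict & NF_VERDICT_MASK;
}

/* Queue number carried in the upper 16 bits of an NF_QUEUE verdict. */
unsigned int verdict_queue_num(unsigned int verdict)
{
	return verdict >> NF_VERDICT_QBITS;
}

/* Errno of an NF_DROP verdict, falling back to -EPERM when none was
 * encoded, as the NF_DROP case in nf_hook_slow() does. */
int drop_errno(unsigned int verdict)
{
	int ret = -(int)(verdict >> NF_VERDICT_QBITS);

	return ret == 0 ? -EPERM : ret;
}
```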

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 2a6ed7d29c6c..2b3b2f8e39c4 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -328,29 +328,37 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state)
 {
struct nf_hook_entry *entry;
unsigned int verdict;
-   int ret = 0;
+   int ret;
 
entry = rcu_dereference(state->hook_entries);
 next_hook:
verdict = nf_iterate(skb, state, &entry);
-   if (verdict == NF_ACCEPT) {
+   switch (verdict & NF_VERDICT_MASK) {
+   case NF_ACCEPT:
ret = 1;
-   } else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
+   break;
+   case NF_DROP:
kfree_skb(skb);
ret = NF_DROP_GETERR(verdict);
if (ret == 0)
ret = -EPERM;
-   } else if ((verdict & NF_VERDICT_MASK) == NF_QUEUE) {
-   int err;
-
+   break;
+   case NF_QUEUE:
RCU_INIT_POINTER(state->hook_entries, entry);
-   err = nf_queue(skb, state, verdict >> NF_VERDICT_QBITS);
-   if (err < 0) {
-   if (err == -ESRCH &&
-  (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
+   ret = nf_queue(skb, state, verdict >> NF_VERDICT_QBITS);
+   if (ret < 0) {
+   if (ret == -ESRCH &&
+   (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
goto next_hook;
kfree_skb(skb);
}
+   /* Fall through. */
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any other non
+* conventional verdicts.
+*/
+   ret = 0;
+   break;
}
return ret;
 }
-- 
2.1.4



[PATCH 08/10] netfilter: remove hook_entries field from nf_hook_state

2016-10-17 Thread Pablo Neira Ayuso
This field is only useful for nf_queue, so store it in the
nf_queue_entry structure instead, away from the core path, and pass
hook_head to nf_hook_slow().

Since we always have a valid entry on the first iteration in
nf_iterate(), we can use a 'do { ... } while (entry)' loop instead.

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter.h | 10 --
 include/linux/netfilter_ingress.h |  4 ++--
 include/net/netfilter/nf_queue.h  |  1 +
 net/bridge/br_netfilter_hooks.c   |  4 ++--
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/netfilter/core.c  | 13 ++---
 net/netfilter/nf_internals.h  |  2 +-
 net/netfilter/nf_queue.c  | 16 ++--
 net/netfilter/nfnetlink_queue.c   |  2 +-
 9 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index e0d000f6c9bf..69230140215b 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -54,7 +54,6 @@ struct nf_hook_state {
struct net_device *out;
struct sock *sk;
struct net *net;
-   struct nf_hook_entry __rcu *hook_entries;
int (*okfn)(struct net *, struct sock *, struct sk_buff *);
 };
 
@@ -81,7 +80,6 @@ struct nf_hook_entry {
 };
 
 static inline void nf_hook_state_init(struct nf_hook_state *p,
- struct nf_hook_entry *hook_entry,
  unsigned int hook,
  u_int8_t pf,
  struct net_device *indev,
@@ -96,7 +94,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
p->out = outdev;
p->sk = sk;
p->net = net;
-   RCU_INIT_POINTER(p->hook_entries, hook_entry);
p->okfn = okfn;
 }
 
@@ -150,7 +147,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg);
 extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 #endif
 
-int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
+int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
+struct nf_hook_entry *entry);
 
 /**
  * nf_hook - call a netfilter hook
@@ -179,10 +177,10 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
if (hook_head) {
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, hook_head, hook, pf, indev, outdev,
+   nf_hook_state_init(&state, hook, pf, indev, outdev,
   sk, net, okfn);
 
-   ret = nf_hook_slow(skb, &state);
+   ret = nf_hook_slow(skb, &state, hook_head);
}
rcu_read_unlock();
 
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index fd44e4131710..2dc3b49b804a 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -26,10 +26,10 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
if (unlikely(!e))
return 0;
 
-   nf_hook_state_init(&state, e, NF_NETDEV_INGRESS,
+   nf_hook_state_init(&state, NF_NETDEV_INGRESS,
   NFPROTO_NETDEV, skb->dev, NULL, NULL,
   dev_net(skb->dev), NULL);
-   return nf_hook_slow(skb, &state);
+   return nf_hook_slow(skb, &state, e);
 }
 
 static inline void nf_hook_ingress_init(struct net_device *dev)
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 2280cfe86c56..09948d10e38e 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -12,6 +12,7 @@ struct nf_queue_entry {
unsigned intid;
 
struct nf_hook_statestate;
+   struct nf_hook_entry*hook;
u16 size; /* sizeof(entry) + saved route keys */
 
/* extra space to store route keys */
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 7e3645fa6339..8155bd2a5138 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -1018,10 +1018,10 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net,
 
/* We may already have this, but read-locks nest anyway */
rcu_read_lock();
-   nf_hook_state_init(&state, elem, hook, NFPROTO_BRIDGE, indev, outdev,
+   nf_hook_state_init(&state, hook, NFPROTO_BRIDGE, indev, outdev,
   sk, net, okfn);
 
-   ret = nf_hook_slow(skb, &state);
+   ret = nf_hook_slow(skb, &state, elem);
rcu_read_unlock();
if (ret == 1)
ret = okfn(net, sk, skb);
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 599679e3498d..8fe36dc3aab2 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -53,7 +53,7 @@ static int ebt_broute(struct sk_buff *skb)
struct nf_hook_state state;
int ret;
 
-   

[PATCH 06/10] netfilter: nf_tables: use hook state from xt_action_param structure

2016-10-17 Thread Pablo Neira Ayuso
Don't copy the relevant fields from the hook state structure; instead,
use the one that is already available in struct xt_action_param.

This patch also adds a set of new wrapper functions to fetch the
relevant hook state structure fields.
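The pattern can be illustrated with a small userspace model: the packet info keeps only a pointer to the hook state, and inline accessors read through it instead of using per-packet copies. All structure and function names below are hypothetical simplifications; they stand in for struct nf_hook_state, struct xt_action_param, struct nft_pktinfo and the nft_net()/nft_hook()/nft_pf()/nft_in()/nft_out() wrappers, with opaque `const void *` in place of the real device and namespace types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real kernel structures. */
struct hook_state {
	unsigned int hook;
	unsigned char pf;
	const void *in;
	const void *out;
	const void *net;
};

struct action_param {
	const struct hook_state *state;
};

struct pktinfo {
	void *skb;
	struct action_param xt;	/* holds only a pointer to the hook state */
};

/* Accessors in the style of nft_net() and friends: they read through
 * the pointer instead of from a private copy of each field. */
static inline const void *pkt_net(const struct pktinfo *pkt)
{
	return pkt->xt.state->net;
}

static inline unsigned int pkt_hook(const struct pktinfo *pkt)
{
	return pkt->xt.state->hook;
}

static inline unsigned char pkt_pf(const struct pktinfo *pkt)
{
	return pkt->xt.state->pf;
}

static inline const void *pkt_in(const struct pktinfo *pkt)
{
	return pkt->xt.state->in;
}

static inline const void *pkt_out(const struct pktinfo *pkt)
{
	return pkt->xt.state->out;
}

/* Like nft_set_pktinfo() after the patch: two stores, no field copies. */
void set_pktinfo(struct pktinfo *pkt, void *skb, const struct hook_state *state)
{
	assert(state != NULL);
	pkt->skb = skb;
	pkt->xt.state = state;
}
```

Since the accessors dereference the stored pointer, they always reflect the caller's hook state, and the packet info no longer duplicates the net, in, out, hook and pf fields.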

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h| 35 +++-
 net/bridge/netfilter/nft_meta_bridge.c   |  2 +-
 net/bridge/netfilter/nft_reject_bridge.c | 30 ---
 net/ipv4/netfilter/nft_dup_ipv4.c|  2 +-
 net/ipv4/netfilter/nft_masq_ipv4.c   |  4 ++--
 net/ipv4/netfilter/nft_redir_ipv4.c  |  3 +--
 net/ipv4/netfilter/nft_reject_ipv4.c |  4 ++--
 net/ipv6/netfilter/nft_dup_ipv6.c|  2 +-
 net/ipv6/netfilter/nft_masq_ipv6.c   |  3 ++-
 net/ipv6/netfilter/nft_redir_ipv6.c  |  3 ++-
 net/ipv6/netfilter/nft_reject_ipv6.c |  6 +++---
 net/netfilter/nf_dup_netdev.c|  2 +-
 net/netfilter/nf_tables_core.c   | 10 -
 net/netfilter/nf_tables_trace.c  |  8 
 net/netfilter/nft_log.c  |  5 +++--
 net/netfilter/nft_lookup.c   |  5 ++---
 net/netfilter/nft_meta.c |  6 +++---
 net/netfilter/nft_queue.c|  2 +-
 net/netfilter/nft_reject_inet.c  | 18 
 19 files changed, 86 insertions(+), 64 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 44060344f958..3295fb85bff6 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -14,27 +14,42 @@
 
 struct nft_pktinfo {
struct sk_buff  *skb;
-   struct net  *net;
-   const struct net_device *in;
-   const struct net_device *out;
-   u8  pf;
-   u8  hook;
booltprot_set;
u8  tprot;
/* for x_tables compatibility */
struct xt_action_param  xt;
 };
 
+static inline struct net *nft_net(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->net;
+}
+
+static inline unsigned int nft_hook(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->hook;
+}
+
+static inline u8 nft_pf(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->pf;
+}
+
+static inline const struct net_device *nft_in(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->in;
+}
+
+static inline const struct net_device *nft_out(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->out;
+}
+
 static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
pkt->skb = skb;
-   pkt->net = state->net;
-   pkt->in = state->in;
-   pkt->out = state->out;
-   pkt->hook = state->hook;
-   pkt->pf = state->pf;
pkt->xt.state = state;
 }
 
diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index ad47a921b701..5974dbc1ea24 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -23,7 +23,7 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 const struct nft_pktinfo *pkt)
 {
const struct nft_meta *priv = nft_expr_priv(expr);
-   const struct net_device *in = pkt->in, *out = pkt->out;
+   const struct net_device *in = nft_in(pkt), *out = nft_out(pkt);
u32 *dest = &regs->data[priv->dreg];
const struct net_bridge_port *p;
 
diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c
index 4b3df6b0e3b9..206dc266ecd2 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -315,17 +315,20 @@ static void nft_reject_bridge_eval(const struct nft_expr *expr,
case htons(ETH_P_IP):
switch (priv->type) {
case NFT_REJECT_ICMP_UNREACH:
-   nft_reject_br_send_v4_unreach(pkt->net, pkt->skb,
- pkt->in, pkt->hook,
+   nft_reject_br_send_v4_unreach(nft_net(pkt), pkt->skb,
+ nft_in(pkt),
+ nft_hook(pkt),
  priv->icmp_code);
break;
case NFT_REJECT_TCP_RST:
-   nft_reject_br_send_v4_tcp_reset(pkt->net, pkt->skb,
-   pkt->in, pkt->hook);
+   nft_reject_br_send_v4_tcp_reset(nft_net(pkt), pkt->skb,
+   nft_in(pkt),
+

[PATCH 09/10] netfilter: handle queue bypass flag from nf_queue

2016-10-17 Thread Pablo Neira Ayuso
Move the queue bypass logic from nf_hook_slow() into nf_queue(), which
resides in net/netfilter/nf_queue.c, away from the core path.
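The decision that moves into the nf_queue() wrapper can be sketched in userspace C as follows. The return convention (1 means "continue with the next hook", 0 otherwise) follows the patch; `fake_enqueue()` is a hypothetical backend standing in for __nf_queue() in which only queue 0 has a listener, and the `freed` counter stands in for kfree_skb().

```c
#include <assert.h>
#include <errno.h>

/* Constants mirroring include/uapi/linux/netfilter.h. */
#define NF_QUEUE                     3
#define NF_VERDICT_FLAG_QUEUE_BYPASS 0x00008000u
#define NF_VERDICT_QBITS             16

/* Packets "freed" so far; stands in for kfree_skb(). */
static int freed;

/* Hypothetical backend standing in for __nf_queue(): only queue 0 has a
 * listener; other queue numbers fail with -ESRCH. */
static int fake_enqueue(unsigned int queuenum)
{
	return queuenum == 0 ? 0 : -ESRCH;
}

/* Same decision logic as the new nf_queue() wrapper: returns 1 if the
 * caller should continue with the next hook (no listener and the bypass
 * flag is set), otherwise 0 (queued, or dropped and freed here). */
int queue_packet(unsigned int verdict)
{
	int ret = fake_enqueue(verdict >> NF_VERDICT_QBITS);

	if (ret < 0) {
		if (ret == -ESRCH && (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
			return 1;
		freed++;
	}
	return 0;
}
```

With this split, callers such as nf_hook_slow() and nf_reinject() only need to check for a return value of 1 and jump to the next hook.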

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 13 -
 net/netfilter/nf_internals.h |  4 ++--
 net/netfilter/nf_queue.c | 39 ---
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fa5a3694c4b6..f299fbde150d 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -343,15 +343,10 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
ret = -EPERM;
break;
case NF_QUEUE:
-   ret = nf_queue(skb, state, entry,
-  verdict >> NF_VERDICT_QBITS);
-   if (ret < 0) {
-   if (ret == -ESRCH &&
-   (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
-   goto next_hook;
-   kfree_skb(skb);
-   }
-   /* Fall through. */
+   ret = nf_queue(skb, state, entry, verdict);
+   if (ret == 1)
+   goto next_hook;
+   break;
default:
/* Implicit handling for NF_STOLEN, as well as any other non
 * conventional verdicts.
diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h
index 301cc02257ad..a46f2635b71f 100644
--- a/net/netfilter/nf_internals.h
+++ b/net/netfilter/nf_internals.h
@@ -17,8 +17,8 @@ unsigned int nf_iterate(struct sk_buff *skb, struct nf_hook_state *state,
struct nf_hook_entry **entryp);
 
 /* nf_queue.c */
-int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
-struct nf_hook_entry *entry, unsigned int queuenum);
+int nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
+struct nf_hook_entry *entry, unsigned int verdict);
 void nf_queue_nf_hook_drop(struct net *net, const struct nf_hook_entry *entry);
 int __init netfilter_queue_init(void);
 
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 091130bc890a..c5e0d534d352 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -107,12 +107,8 @@ void nf_queue_nf_hook_drop(struct net *net, const struct nf_hook_entry *entry)
rcu_read_unlock();
 }
 
-/*
- * Any packet that leaves via this function must come back
- * through nf_reinject().
- */
-int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
-struct nf_hook_entry *hook_entry, unsigned int queuenum)
+static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
+ struct nf_hook_entry *hook_entry, unsigned int queuenum)
 {
int status = -ENOENT;
struct nf_queue_entry *entry = NULL;
@@ -161,13 +157,32 @@ int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
return status;
 }
 
+/* Any packet that leaves via this function must come back through
+ * nf_reinject().
+ */
+int nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
+struct nf_hook_entry *entry, unsigned int verdict)
+{
+   int ret;
+
+   ret = __nf_queue(skb, state, entry, verdict >> NF_VERDICT_QBITS);
+   if (ret < 0) {
+   if (ret == -ESRCH &&
+   (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
+   return 1;
+
+   kfree_skb(skb);
+   }
+
+   return 0;
+}
+
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 {
struct nf_hook_entry *hook_entry = entry->hook;
struct nf_hook_ops *elem = &hook_entry->ops;
struct sk_buff *skb = entry->skb;
const struct nf_afinfo *afinfo;
-   int err;
 
nf_queue_entry_release_refs(entry);
 
@@ -196,14 +211,8 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
local_bh_enable();
break;
case NF_QUEUE:
-   err = nf_queue(skb, &entry->state, hook_entry,
-  verdict >> NF_VERDICT_QBITS);
-   if (err < 0) {
-   if (err == -ESRCH &&
-  (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
-   goto next_hook;
-   kfree_skb(skb);
-   }
+   if (nf_queue(skb, &entry->state, hook_entry, verdict) == 1)
+   goto next_hook;
break;
case NF_STOLEN:
break;
-- 
2.1.4



[PATCH 03/10] netfilter: kill NF_HOOK_THRESH() and state->tresh

2016-10-17 Thread Pablo Neira Ayuso
Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
introduced br_nf_hook_thresh().

Replace NF_HOOK_THRESH() with br_nf_hook_thresh() in
br_nf_forward_finish(), so that this macro has no more callers.

As a result, state->thresh and the explicit thresh parameter are no
longer required in the hook state structure. We can also get rid of the
skip-hook-under-thresh loop in nf_iterate() in the core path, which is
only used by br_netfilter to search for the filter hook.
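The skip that leaves the core path can be sketched as a one-off list walk done before any hook runs, which is roughly the job br_nf_hook_thresh() takes over for the bridge. This is a hypothetical userspace model: `struct hook_entry`, `first_at_or_above()` and `NO_THRESH` are illustrative names, and the `>=` comparison follows the removed `state->thresh > (*entryp)->ops.priority` check rather than the exact code in br_nf_hook_thresh().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* INT_MIN meant "run every hook" in the removed nf_hook() wrapper. */
#define NO_THRESH INT_MIN

/* Hypothetical, simplified hook entry; lists are kept sorted by
 * ascending priority. */
struct hook_entry {
	int priority;
	struct hook_entry *next;
};

/* Returns the first entry whose priority is >= thresh, or NULL.
 * The removed per-call check in nf_iterate() skipped entries with
 * thresh > priority; doing the skip once up front keeps it out of
 * the core loop. */
struct hook_entry *first_at_or_above(struct hook_entry *head, int thresh)
{
	while (head && head->priority < thresh)
		head = head->next;
	return head;
}
```

The entry returned here would then be handed to nf_hook_slow() as the starting point, so the core loop itself never compares priorities.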

Suggested-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter.h | 50 +--
 include/linux/netfilter_ingress.h |  2 +-
 net/bridge/br_netfilter_hooks.c   |  8 +++---
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/netfilter/core.c  |  4 ---
 net/netfilter/nf_queue.c  |  1 -
 6 files changed, 19 insertions(+), 48 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index abc7fdcb9eb1..e0d000f6c9bf 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -49,7 +49,6 @@ struct sock;
 
 struct nf_hook_state {
unsigned int hook;
-   int thresh;
u_int8_t pf;
struct net_device *in;
struct net_device *out;
@@ -84,7 +83,7 @@ struct nf_hook_entry {
 static inline void nf_hook_state_init(struct nf_hook_state *p,
  struct nf_hook_entry *hook_entry,
  unsigned int hook,
- int thresh, u_int8_t pf,
+ u_int8_t pf,
  struct net_device *indev,
  struct net_device *outdev,
  struct sock *sk,
@@ -92,7 +91,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
  int (*okfn)(struct net *, struct sock *, struct sk_buff *))
 {
p->hook = hook;
-   p->thresh = thresh;
p->pf = pf;
p->in = indev;
p->out = outdev;
@@ -155,20 +153,16 @@ extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
 
 /**
- * nf_hook_thresh - call a netfilter hook
+ * nf_hook - call a netfilter hook
  *
  * Returns 1 if the hook has allowed the packet to pass.  The function
  * okfn must be invoked by the caller in this case.  Any other return
  * value indicates the packet has been consumed by the hook.
  */
-static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
-struct net *net,
-struct sock *sk,
-struct sk_buff *skb,
-struct net_device *indev,
-struct net_device *outdev,
-int (*okfn)(struct net *, struct sock *, struct sk_buff *),
-int thresh)
+static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
+ struct sock *sk, struct sk_buff *skb,
+ struct net_device *indev, struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *, struct sk_buff *))
 {
struct nf_hook_entry *hook_head;
int ret = 1;
@@ -185,8 +179,8 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
if (hook_head) {
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, hook_head, hook, thresh,
-  pf, indev, outdev, sk, net, okfn);
+   nf_hook_state_init(&state, hook_head, hook, pf, indev, outdev,
+  sk, net, okfn);
 
ret = nf_hook_slow(skb, &state);
}
@@ -195,14 +189,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
return ret;
 }
 
-static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
- struct sock *sk, struct sk_buff *skb,
- struct net_device *indev, struct net_device *outdev,
- int (*okfn)(struct net *, struct sock *, struct sk_buff *))
-{
-   return nf_hook_thresh(pf, hook, net, sk, skb, indev, outdev, okfn, INT_MIN);
-}
-   
 /* Activate hook; either okfn or kfree_skb called, unless a hook
returns NF_STOLEN (in which case, it's up to the hook to deal with
the consequences).
@@ -221,19 +207,6 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
 */
 
 static inline int
-NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
-  struct sk_buff *skb, struct net_device *in,
-  struct net_device *out,
-  int (*okfn)(struct net *, struct 

Re: [PATCH nf-next 2/5] netfilter: nft: basic routing expression

2016-10-17 Thread Arturo Borrero Gonzalez
On 16 October 2016 at 15:42, Anders K. Pedersen | Cohaesio
 wrote:
> From: Anders K. Pedersen 
>
> Introduce basic infrastructure for nftables rt expression for routing
> related data. Initially "rt classid" is implemented identical to "meta
> rtclassid", since it is more logical to have this match in the routing
> expression going forward.
>
> Signed-off-by: Anders K. Pedersen 
> ---
>  include/net/netfilter/nft_rt.h   |  23 +
>  net/netfilter/Kconfig|   6 ++
>  net/netfilter/Makefile   |   1 +
>  net/netfilter/nft_rt.c   | 145 ++
>  4 files changed, 175 insertions(+)
>
> diff --git a/include/net/netfilter/nft_rt.h b/include/net/netfilter/nft_rt.h
> --- /dev/null
> +++ b/include/net/netfilter/nft_rt.h
> @@ -0,0 +1,23 @@
> +#ifndef _NFT_RT_H_
> +#define _NFT_RT_H_
> +
> +struct nft_rt {
> +   enum nft_rt_keyskey:8;
> +   enum nft_registers  dreg:8;
> +   u8  family;
> +};
> +
> +extern const struct nla_policy nft_rt_policy[];
> +
> +int nft_rt_get_init(const struct nft_ctx *ctx,
> +   const struct nft_expr *expr,
> +   const struct nlattr * const tb[]);
> +
> +int nft_rt_get_dump(struct sk_buff *skb,
> +   const struct nft_expr *expr);
> +
> +void nft_rt_get_eval(const struct nft_expr *expr,
> +struct nft_regs *regs,
> +const struct nft_pktinfo *pkt);
> +
> +#endif
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -474,6 +474,12 @@ config NFT_META
>   This option adds the "meta" expression that you can use to match and
>   to set packet metainformation such as the packet mark.
>
> +config NFT_RT
> +   tristate "Netfilter nf_tables routing module"
> +   help
> + This option adds the "rt" expression that you can use to match
> + packet routing information such as the packet nexthop.
> +
>  config NFT_NUMGEN
> tristate "Netfilter nf_tables number generator module"
> help
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -81,6 +81,7 @@ obj-$(CONFIG_NF_TABLES_NETDEV)+= nf_tables_netdev.o
>  obj-$(CONFIG_NFT_COMPAT)   += nft_compat.o
>  obj-$(CONFIG_NFT_EXTHDR)   += nft_exthdr.o
>  obj-$(CONFIG_NFT_META) += nft_meta.o
> +obj-$(CONFIG_NFT_RT)   += nft_rt.o
>  obj-$(CONFIG_NFT_NUMGEN)   += nft_numgen.o
>  obj-$(CONFIG_NFT_CT)   += nft_ct.o
>  obj-$(CONFIG_NFT_LIMIT)+= nft_limit.o
> diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c
> --- /dev/null
> +++ b/net/netfilter/nft_rt.c
> @@ -0,0 +1,145 @@
> +/*
> + * Copyright (c) 2016 Anders K. Pedersen 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +void nft_rt_get_eval(const struct nft_expr *expr,
> +struct nft_regs *regs,
> +const struct nft_pktinfo *pkt)
> +{
> +   const struct nft_rt *priv = nft_expr_priv(expr);
> +   const struct sk_buff *skb = pkt->skb;
> +   u32 *dest = &regs->data[priv->dreg];
> +
> +   switch (priv->key) {
> +#ifdef CONFIG_IP_ROUTE_CLASSID
> +   case NFT_RT_CLASSID: {
> +   const struct dst_entry *dst = skb_dst(skb);
> +
> +   if (dst == NULL)
> +   goto err;
> +   *dest = dst->tclassid;
> +   break;
> +   }
> +#endif
> +   default:
> +   WARN_ON(1);
> +   goto err;
> +   }
> +   return;
> +
> +err:
> +   regs->verdict.code = NFT_BREAK;
> +}
> +EXPORT_SYMBOL_GPL(nft_rt_get_eval);
> +
> +const struct nla_policy nft_rt_policy[NFTA_RT_MAX + 1] = {
> +   [NFTA_RT_DREG]  = { .type = NLA_U32 },
> +   [NFTA_RT_KEY]   = { .type = NLA_U32 },
> +   [NFTA_RT_FAMILY]= { .type = NLA_U32 },
> +};
> +EXPORT_SYMBOL_GPL(nft_rt_policy);
> +
> +int nft_rt_get_init(const struct nft_ctx *ctx,
> +   const struct nft_expr *expr,
> +   const struct nlattr * const tb[])
> +{
> +   struct nft_rt *priv = nft_expr_priv(expr);
> +   unsigned int len;
> +
> +   priv->key = ntohl(nla_get_be32(tb[NFTA_RT_KEY]));
> +   switch (priv->key) {
> +#ifdef CONFIG_IP_ROUTE_CLASSID
> +   case NFT_RT_CLASSID:
> +   len = sizeof(u32);
> +   break;
> +#endif
> +   default:
> +   return -EOPNOTSUPP;
> +   }
> +
> +   

Re: [PATCH nf-next 1/5] netfilter: nft: UAPI headers for routing expression

2016-10-17 Thread Arturo Borrero Gonzalez
On 16 October 2016 at 15:41, Anders K. Pedersen | Cohaesio
 wrote:
> diff --git a/include/uapi/linux/netfilter/nf_tables.h 
> b/include/uapi/linux/netfilter/nf_tables.h
> --- a/include/uapi/linux/netfilter/nf_tables.h
> +++ b/include/uapi/linux/netfilter/nf_tables.h
> @@ -759,6 +759,16 @@ enum nft_meta_keys {
>  };
>
>  /**
> + * enum nft_rt_keys - nf_tables routing expression keys
> + *
> + * @NFT_META_NEXTHOP: routing nexthop
> + */
> +enum nft_rt_keys {
> +   NFT_RT_CLASSID,
> +   NFT_RT_NEXTHOP,
> +};
> +

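The comment section looks like it needs a fix: it documents @NFT_META_NEXTHOP instead of @NFT_RT_NEXTHOP, and NFT_RT_CLASSID is not documented at all.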