Re: [PATCH nf] netfilter: core: remove erroneous warn_on

2017-09-06 Thread Aaron Conole
Florian Westphal  writes:

> kernel test robot reported:
>
> WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 
> __nf_hook_entries_try_shrink+0x49/0xcd
> [..]
>
> After allowing batching in nf_unregister_net_hooks its possible that an 
> earlier
> call to __nf_hook_entries_try_shrink already compacted the list.
> If this happens we don't need to do anything.
>
> Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks 
> synchronize_net calls")
> Reported-by: kernel test robot 
> Signed-off-by: Florian Westphal 
> ---


Acked-by: Aaron Conole 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: xt_hashlimit: avoid 64-bit division

2017-09-06 Thread Arnd Bergmann
On Wed, Sep 6, 2017 at 10:22 PM, Vishwanath Pai  wrote:
> On 09/06/2017 03:57 PM, Arnd Bergmann wrote:
>> 64-bit division is expensive on 32-bit architectures, and
>> requires a special function call to avoid a link error like:
>>
>> net/netfilter/xt_hashlimit.o: In function `hashlimit_mt_common':
>> xt_hashlimit.c:(.text+0x1328): undefined reference to `__aeabi_uldivmod'
>>
>> In the case of hashlimit_mt_common, we don't actually need a
>> 64-bit operation, we can simply rewrite the function slightly
>> to make that clear to the compiler.
>>
>> Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode")
>> Signed-off-by: Arnd Bergmann 
>> ---
>>  net/netfilter/xt_hashlimit.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
>> index 10d48234f5f4..50b53d86eef5 100644
>> --- a/net/netfilter/xt_hashlimit.c
>> +++ b/net/netfilter/xt_hashlimit.c
>> @@ -531,7 +531,10 @@ static u64 user2rate_bytes(u64 user)
>>  {
>>   u64 r;
>>
>> - r = user ? 0xULL / user : 0xULL;
>> + if (user > 0xULL)
>> + return 0;
>> +
>> + r = user ? 0xULL / (u32)user : 0xULL;
>>   r = (r - 1) << 4;
>>   return r;
>>  }
>>
>
> I have submitted another patch to fix this:
> https://patchwork.ozlabs.org/patch/809881/
>
> We have seen this problem before, I was careful not to introduce this
> again in the new patch but clearly I overlooked this particular line :(
>
> In the other cases we fixed it by replacing division with div64_u64().

div64_u64() seems needlessly expensive here since the dividend
is known to be a 32-bit number. I guess the function is not called
frequently though, so it doesn't matter much.

  Arnd
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/14] netfilter/ipvs: Use %pS printk format for direct addresses

2017-09-06 Thread Helge Deller
The debug and error printk functions in ipvs uses wrongly the %pF instead of
the %pS printk format specifier for printing symbols for the address returned
by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64
architectures.

Signed-off-by: Helge Deller 
Cc: Wensong Zhang 
Cc: net...@vger.kernel.org
Cc: lvs-de...@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
---
 net/netfilter/ipvs/ip_vs_conn.c | 2 +-
 net/netfilter/ipvs/ip_vs_ctl.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 3d2ac71a..f73561c 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -185,7 +185,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
hlist_add_head_rcu(>c_list, _vs_conn_tab[hash]);
ret = 1;
} else {
-   pr_err("%s(): request for already hashed, called from %pF\n",
+   pr_err("%s(): request for already hashed, called from %pS\n",
   __func__, __builtin_return_address(0));
ret = 0;
}
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 1fa3c23..88fc58a 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -300,7 +300,7 @@ static int ip_vs_svc_hash(struct ip_vs_service *svc)
unsigned int hash;
 
if (svc->flags & IP_VS_SVC_F_HASHED) {
-   pr_err("%s(): request for already hashed, called from %pF\n",
+   pr_err("%s(): request for already hashed, called from %pS\n",
   __func__, __builtin_return_address(0));
return 0;
}
@@ -334,7 +334,7 @@ static int ip_vs_svc_hash(struct ip_vs_service *svc)
 static int ip_vs_svc_unhash(struct ip_vs_service *svc)
 {
if (!(svc->flags & IP_VS_SVC_F_HASHED)) {
-   pr_err("%s(): request for unhash flagged, called from %pF\n",
+   pr_err("%s(): request for unhash flagged, called from %pS\n",
   __func__, __builtin_return_address(0));
return 0;
}
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: xt_hashlimit: avoid 64-bit division

2017-09-06 Thread Vishwanath Pai
On 09/06/2017 03:57 PM, Arnd Bergmann wrote:
> 64-bit division is expensive on 32-bit architectures, and
> requires a special function call to avoid a link error like:
> 
> net/netfilter/xt_hashlimit.o: In function `hashlimit_mt_common':
> xt_hashlimit.c:(.text+0x1328): undefined reference to `__aeabi_uldivmod'
> 
> In the case of hashlimit_mt_common, we don't actually need a
> 64-bit operation, we can simply rewrite the function slightly
> to make that clear to the compiler.
> 
> Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode")
> Signed-off-by: Arnd Bergmann 
> ---
>  net/netfilter/xt_hashlimit.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
> index 10d48234f5f4..50b53d86eef5 100644
> --- a/net/netfilter/xt_hashlimit.c
> +++ b/net/netfilter/xt_hashlimit.c
> @@ -531,7 +531,10 @@ static u64 user2rate_bytes(u64 user)
>  {
>   u64 r;
>  
> - r = user ? 0xULL / user : 0xULL;
> + if (user > 0xULL)
> + return 0;
> +
> + r = user ? 0xULL / (u32)user : 0xULL;
>   r = (r - 1) << 4;
>   return r;
>  }
> 

I have submitted another patch to fix this:
https://patchwork.ozlabs.org/patch/809881/

We have seen this problem before, I was careful not to introduce this
again in the new patch but clearly I overlooked this particular line :(

In the other cases we fixed it by replacing division with div64_u64().

-Vishwanath
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netfilter: xt_hashlimit: avoid 64-bit division

2017-09-06 Thread Arnd Bergmann
64-bit division is expensive on 32-bit architectures, and
requires a special function call to avoid a link error like:

net/netfilter/xt_hashlimit.o: In function `hashlimit_mt_common':
xt_hashlimit.c:(.text+0x1328): undefined reference to `__aeabi_uldivmod'

In the case of hashlimit_mt_common, we don't actually need a
64-bit operation, we can simply rewrite the function slightly
to make that clear to the compiler.

Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode")
Signed-off-by: Arnd Bergmann 
---
 net/netfilter/xt_hashlimit.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 10d48234f5f4..50b53d86eef5 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -531,7 +531,10 @@ static u64 user2rate_bytes(u64 user)
 {
u64 r;
 
-   r = user ? 0xULL / user : 0xULL;
+   if (user > 0xULL)
+   return 0;
+
+   r = user ? 0xULL / (u32)user : 0xULL;
r = (r - 1) << 4;
return r;
 }
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Jan Engelhardt

On Wednesday 2017-09-06 16:02, Phil Sutter wrote:
>> Knowing that, people just avoid them most of the time for groff - and if I 
>> may
>> say so, it has not reduced the document quality.
>
>Right now, nft.8 makes extensive use of tables which is why I considered
>proper table support an important feature. OTOH I didn't experience
>rendering issues with them in nft.8, did you?

I do. For some reason, the right margin of the table is 60% of the
terminal width. (width 120 => table up to col #63, terminal 80 cols
=> table up to col #50).

   ┌┬─┐
   │Hook│ Description │
   ├┼─┤
   │prerouting  │ All  packets  entering  the  system are │
   ││ processed by the prerouting hook. It is │
   ││ invoked  before the routing process and │
   ││ is used for early filtering or changing │
   ││ packet attributes that affect routing.  │
   ├┼─┤
   │input   │ Packets  delivered  to the local system │
   ││ are processed by the input hook.│
   ├┼─┤
   │forward │ Packets forwarded to a  different  host │
   ││ are processed by the forward hook.  │
   ├┼─┤
   │output  │ Packets  sent  by  local  processes are │
   ││ processed by the output hook.   │
   ├┼─┤
   │postrouting │ All packets leaving the system are pro- │
   ││ cessed by the postrouting hook. │
   └┴─┘

However, what I tried to express is the problem with tables is that
one column has almost no text and another has a lot of it, like this
one. This causes large amounts of whitespace in column 1 in this
case. Personally, I would have placed the hooks and their
explanations as (the equivalent of)

.TP
\fBprerouting\fP
All packets enter ...
.TP
\fBinput\fP
Packets delivered ...

Or if the source material was HTML, perhaps just


prerouting: All packets enter...
input: Packets delivered...



The rectangular grid is overkill, because you are not trying to do
a figurative police lineup (leave tables for e.g.
https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions#Instruction_set_architecture_support
)


In another way, I could also ask why hooks are in tables, and commands 
("add", "flush", etc.) are in TP-style. What makes hooks so special that 
they deserve a table, despite their keyword being the same short length 
as the commands.



>> >> There are many markup languages, it reminds me to xkcd #927 [0].
>> >
>> >Well, the difference here is that I'm not inventing anything new but
>> >search for better options amongst the existing solutions. :P
>> 
>> That would be to stay with docbook then, because RST/MD/A2 do not seem to 
>> have
>> left themselves a lot of room for later extension.
>
>What extensions do you have in mind?

I kind of left that open; it seemed you implied there is something
(the "better option") that MD couldn't do that asciidoc could. That
thing, docbook should already have.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Phil Sutter
Hi,

On Wed, Sep 06, 2017 at 03:25:51PM +0200, Jan Engelhardt wrote:
> On Wednesday 2017-09-06 13:58, Phil Sutter wrote:
> >
> >Regarding reStructuredText, did you look at how tables are written
> >there? If not, see here[2]. I really think that speaks for itself.
> 
> Markup is the least problem. Tables, when rendered, have a tendency to quickly
> grow too large for the display container because there is just so much data to
> show.

It really matters if the markup specs suggest to draw ASCII art
pictures. The real problem though is that reStructuredText is meant to
be read as is and that is not required nor wanted in our case since we
generate a man page anyway.

> Knowing that, people just avoid them most of the time for groff - and if I may
> say so, it has not reduced the document quality.

Right now, nft.8 makes extensive use of tables which is why I considered
proper table support an important feature. OTOH I didn't experience
rendering issues with them in nft.8, did you?

> >> There are many markup languages, it reminds me to xkcd #927 [0].
> >
> >Well, the difference here is that I'm not inventing anything new but
> >search for better options amongst the existing solutions. :P
> 
> That would be to stay with docbook then, because RST/MD/A2 do not seem to have
> left themselves a lot of room for later extension.

What extensions do you have in mind?

> >> I would prefer if we stick to groff, which seems to be the standard in 
> >> Linux.
> >
> >Yes, this is the very basic alternative but as said I think providing
> >users with an easier to use markup makes sense.
> 
> Users read the rendered text, not the markup.

Yes, you are right of course. I mixed up users with contributors again.

> All of the glorious old markups are "unreadable" enough to be fed through a
> program to give a visually-cooked, markup-free rendering (browser,
> /usr/bin/man, evince, etc.) MediaWiki, RST, MD and asciidoc on the other hand
> however seem to propagate the use of just cat/less with very short markup,
> which, to a user, is not too bad to look at, but also not too visually 
> pleasing
> either, giving the worst of both worlds :-p

Now you seem to mix up users and contributors: I never suggested to
provide documentation to users in a different format than roff (rendered
by 'man'), but to provide people editing the documentation with
something more intuitive than docbook or roff.

Cheers, Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Jan Engelhardt

On Wednesday 2017-09-06 13:58, Phil Sutter wrote:
>
>Regarding reStructuredText, did you look at how tables are written
>there? If not, see here[2]. I really think that speaks for itself.

Markup is the least problem. Tables, when rendered, have a tendency to quickly
grow too large for the display container because there is just so much data to
show.

Knowing that, people just avoid them most of the time for groff - and if I may
say so, it has not reduced the document quality.


>> There are many markup languages, it reminds me to xkcd #927 [0].
>
>Well, the difference here is that I'm not inventing anything new but
>search for better options amongst the existing solutions. :P

That would be to stay with docbook then, because RST/MD/A2 do not seem to have
left themselves a lot of room for later extension.


>> I would prefer if we stick to groff, which seems to be the standard in Linux.
>
>Yes, this is the very basic alternative but as said I think providing
>users with an easier to use markup makes sense.

Users read the rendered text, not the markup.

All of the glorious old markups are "unreadable" enough to be fed through a
program to give a visually-cooked, markup-free rendering (browser,
/usr/bin/man, evince, etc.) MediaWiki, RST, MD and asciidoc on the other hand
however seem to propagate the use of just cat/less with very short markup,
which, to a user, is not too bad to look at, but also not too visually pleasing
either, giving the worst of both worlds :-p
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 1/2] netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"

2017-09-06 Thread Florian Westphal
Dao Quang Minh  wrote:
> Thanks for the patch !
> 
> > rhlist removal however requires a rhlist walk.
> We can have thousands of entries in such a list if source port/addresses
> are reused for multiple flows, if this happens removal requests are so
> expensive that deletions of a few thousand flows can take several
> seconds(!).
> 
> Do you think rhlist can use double linked list instead of single linked so
> removal doesn't need to walk the whole chain ?

Yes, but:

> > I tried to add hlist_node to rhlist to speed up rhltable_remove but this
> > isn't doable without changing semantics.  rhltable_remove_fast will
> > check that the to-be-deleted object is part of the table and that
> > requires a list walk that we want to avoid.
> >
> > Furthermore, using hlist_node increases size of struct rhlist_head, which
> > in turn increases nf_conn size.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf] netfilter: core: remove erroneous warn_on

2017-09-06 Thread Florian Westphal
kernel test robot reported:

WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 
__nf_hook_entries_try_shrink+0x49/0xcd
[..]

After allowing batching in nf_unregister_net_hooks its possible that an earlier
call to __nf_hook_entries_try_shrink already compacted the list.
If this happens we don't need to do anything.

Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks 
synchronize_net calls")
Reported-by: kernel test robot 
Signed-off-by: Florian Westphal 
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 04fe25abc5f6..52cd2901a097 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -215,7 +215,7 @@ static void *__nf_hook_entries_try_shrink(struct 
nf_hook_entries __rcu **pp)
if (skip == hook_entries)
goto out_assign;
 
-   if (WARN_ON(skip == 0))
+   if (skip == 0)
return NULL;
 
hook_entries -= skip;
-- 
2.13.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: nf_nat_cleanup_conntrack performance with rhlist

2017-09-06 Thread Florian Westphal
Ivan Babrou  wrote:
> Hello,
> 
> We hit a bug where nf_nat_cleanup_conntrack is taking exponential amount of 
> time
> if you ping a lot of distinct IPs. It seems to be related to hlist ->
> rhlist migration.

Yes, and its best to revert, I sent patches to do just that.

Thanks for the heads-up.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf 0/2] netfilter: nat: do not use rhltable

2017-09-06 Thread Florian Westphal
This reverts the conversion of the nat bysource table to rhlist that
I did last year.

It was a bad idea to begin with.

Fast lookup is not essential; in most cases there is no lookup
at all because original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.

With rhlist, deletion is slow as it requires a list traversal.
So, switch back to a statically-sized table for bysource hash.

 include/net/netfilter/nf_conntrack.h |3 
 include/net/netfilter/nf_nat.h   |1 
 net/netfilter/nf_nat_core.c  |  142 ---
 3 files changed, 66 insertions(+), 80 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf 2/2] netfilter: nat: use keyed locks

2017-09-06 Thread Florian Westphal
no need to serialize on a single lock, we can partition the table and
add/delete in parallel to different slots.
This restores one of the advantages that got lost with the rhlist
revert.

Cc: Ivan Babrou 
Signed-off-by: Florian Westphal 
---
 net/netfilter/nf_nat_core.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 2fb80a4bfb34..ad29637d1b62 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -30,7 +30,7 @@
 #include 
 #include 
 
-static DEFINE_SPINLOCK(nf_nat_lock);
+static spinlock_t nf_nat_locks[CONNTRACK_LOCKS];
 
 static DEFINE_MUTEX(nf_nat_proto_mutex);
 static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO]
@@ -423,13 +423,15 @@ nf_nat_setup_info(struct nf_conn *ct,
 
if (maniptype == NF_NAT_MANIP_SRC) {
unsigned int srchash;
+   spinlock_t *lock;
 
srchash = hash_by_src(net,
  >tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-   spin_lock_bh(_nat_lock);
+   lock = _nat_locks[srchash % ARRAY_SIZE(nf_nat_locks)];
+   spin_lock_bh(lock);
hlist_add_head_rcu(>nat_bysource,
   _nat_bysource[srchash]);
-   spin_unlock_bh(_nat_lock);
+   spin_unlock_bh(lock);
}
 
/* It's done. */
@@ -523,6 +525,16 @@ static int nf_nat_proto_remove(struct nf_conn *i, void 
*data)
return i->status & IPS_NAT_MASK ? 1 : 0;
 }
 
+static void __nf_nat_cleanup_conntrack(struct nf_conn *ct)
+{
+   unsigned int h;
+
+   h = hash_by_src(nf_ct_net(ct), 
>tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+   spin_lock_bh(_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]);
+   hlist_del_rcu(>nat_bysource);
+   spin_unlock_bh(_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]);
+}
+
 static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
 {
if (nf_nat_proto_remove(ct, data))
@@ -538,9 +550,7 @@ static int nf_nat_proto_clean(struct nf_conn *ct, void 
*data)
 * will delete entry from already-freed table.
 */
clear_bit(IPS_SRC_NAT_DONE_BIT, >status);
-   spin_lock_bh(_nat_lock);
-   hlist_del_rcu(>nat_bysource);
-   spin_unlock_bh(_nat_lock);
+   __nf_nat_cleanup_conntrack(ct);
 
/* don't delete conntrack.  Although that would make things a lot
 * simpler, we'd end up flushing all conntracks on nat rmmod.
@@ -668,11 +678,8 @@ EXPORT_SYMBOL_GPL(nf_nat_l3proto_unregister);
 /* No one using conntrack by the time this called. */
 static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
 {
-   if (ct->status & IPS_SRC_NAT_DONE) {
-   spin_lock_bh(_nat_lock);
-   hlist_del_rcu(>nat_bysource);
-   spin_unlock_bh(_nat_lock);
-   }
+   if (ct->status & IPS_SRC_NAT_DONE)
+   __nf_nat_cleanup_conntrack(ct);
 }
 
 static struct nf_ct_ext_type nat_extend __read_mostly = {
@@ -794,10 +801,12 @@ static struct nf_ct_helper_expectfn follow_master_nat = {
 
 static int __init nf_nat_init(void)
 {
-   int ret;
+   int ret, i;
 
/* Leave them the same for the moment. */
nf_nat_htable_size = nf_conntrack_htable_size;
+   if (nf_nat_htable_size < ARRAY_SIZE(nf_nat_locks))
+   nf_nat_htable_size = ARRAY_SIZE(nf_nat_locks);
 
nf_nat_bysource = nf_ct_alloc_hashtable(_nat_htable_size, 0);
if (!nf_nat_bysource)
@@ -810,6 +819,9 @@ static int __init nf_nat_init(void)
return ret;
}
 
+   for (i = 0; i < ARRAY_SIZE(nf_nat_locks); i++)
+   spin_lock_init(_nat_locks[i]);
+
nf_ct_helper_expectfn_register(_master_nat);
 
BUG_ON(nfnetlink_parse_nat_setup_hook != NULL);
-- 
2.13.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf 1/2] netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"

2017-09-06 Thread Florian Westphal
This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1.

It was not a good idea. The custom hash table was a much better
fit for this purpose.

A fast lookup is not essential, in fact for most cases there is no lookup
at all because original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.

rhlist removal however requires a rhlist walk.
We can have thousands of entries in such a list if source port/addresses
are reused for multiple flows, if this happens removal requests are so
expensive that deletions of a few thousand flows can take several
seconds(!).

The advantages that we got from rhashtable are:
1) table auto-sizing
2) multiple locks

1) would be nice to have, but it is not essential as we have at
most one lookup per new flow, so even a million flows in the bysource
table are not a problem compared to current deletion cost.
2) is easy to add to custom hash table.

I tried to add hlist_node to rhlist to speed up rhltable_remove but this
isn't doable without changing semantics.  rhltable_remove_fast will
check that the to-be-deleted object is part of the table and that
requires a list walk that we want to avoid.

Furthermore, using hlist_node increases size of struct rhlist_head, which
in turn increases nf_conn size.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
Reported-by: Ivan Babrou 
Signed-off-by: Florian Westphal 
---
 include/net/netfilter/nf_conntrack.h |   3 +-
 include/net/netfilter/nf_nat.h   |   1 -
 net/netfilter/nf_nat_core.c  | 130 ++-
 3 files changed, 54 insertions(+), 80 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 48407569585d..a7c2cee39570 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -83,7 +82,7 @@ struct nf_conn {
possible_net_t ct_net;
 
 #if IS_ENABLED(CONFIG_NF_NAT)
-   struct rhlist_head nat_bysource;
+   struct hlist_node   nat_bysource;
 #endif
/* all members below initialized via memset */
u8 __nfct_init_offset[0];
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index 05c82a1a4267..b71701302e61 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -1,6 +1,5 @@
 #ifndef _NF_NAT_H
 #define _NF_NAT_H
-#include 
 #include 
 #include 
 #include 
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index b1d3740ae36a..2fb80a4bfb34 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -30,19 +30,17 @@
 #include 
 #include 
 
+static DEFINE_SPINLOCK(nf_nat_lock);
+
 static DEFINE_MUTEX(nf_nat_proto_mutex);
 static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO]
__read_mostly;
 static const struct nf_nat_l4proto __rcu **nf_nat_l4protos[NFPROTO_NUMPROTO]
__read_mostly;
 
-struct nf_nat_conn_key {
-   const struct net *net;
-   const struct nf_conntrack_tuple *tuple;
-   const struct nf_conntrack_zone *zone;
-};
-
-static struct rhltable nf_nat_bysource_table;
+static struct hlist_head *nf_nat_bysource __read_mostly;
+static unsigned int nf_nat_htable_size __read_mostly;
+static unsigned int nf_nat_hash_rnd __read_mostly;
 
 inline const struct nf_nat_l3proto *
 __nf_nat_l3proto_find(u8 family)
@@ -118,17 +116,19 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff 
*skb, unsigned int family)
 EXPORT_SYMBOL(nf_xfrm_me_harder);
 #endif /* CONFIG_XFRM */
 
-static u32 nf_nat_bysource_hash(const void *data, u32 len, u32 seed)
+/* We keep an extra hash for each conntrack, for fast searching. */
+static unsigned int
+hash_by_src(const struct net *n, const struct nf_conntrack_tuple *tuple)
 {
-   const struct nf_conntrack_tuple *t;
-   const struct nf_conn *ct = data;
+   unsigned int hash;
+
+   get_random_once(_nat_hash_rnd, sizeof(nf_nat_hash_rnd));
 
-   t = >tuplehash[IP_CT_DIR_ORIGINAL].tuple;
/* Original src, to ensure we map it consistently if poss. */
+   hash = jhash2((u32 *)>src, sizeof(tuple->src) / sizeof(u32),
+ tuple->dst.protonum ^ nf_nat_hash_rnd ^ net_hash_mix(n));
 
-   seed ^= net_hash_mix(nf_ct_net(ct));
-   return jhash2((const u32 *)>src, sizeof(t->src) / sizeof(u32),
- t->dst.protonum ^ seed);
+   return reciprocal_scale(hash, nf_nat_htable_size);
 }
 
 /* Is this tuple already taken? (not by us) */
@@ -184,28 +184,6 @@ same_src(const struct nf_conn *ct,
t->src.u.all == tuple->src.u.all);
 }
 
-static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
-  const void *obj)
-{
-   const struct nf_nat_conn_key *key = arg->key;
- 

Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Phil Sutter
Hi Arturo,

On Wed, Sep 06, 2017 at 11:56:37AM +0200, Arturo Borrero Gonzalez wrote:
> On 6 September 2017 at 10:41, Phil Sutter  wrote:
> > Beware: The conversion is incomplete and merely serves as base for
> > discussion.
> >
> > This patch converts nft.xml into asciidoc markup, top down until (and
> > including) stateful objects description. I stopped there because it's
> > the first chance of demonstrating my idea of splitting the documentation
> > into smaller pieces for convenience and maintainability.
> >
> > Regarding package dependencies, this "just" exchanges docbook with
> > asciidoc - dblatex is still required for PDF creation.
> >
> 
> Hi Phil,
> 
> thanks for your initiative and hard work, it's really appreciated :-)

Thanks!

> Regarding the change, why asciidoc? why not markdown, or org-mode or
> reStructuredText?

Well, asciidoc is basicaly markdown on steroids, I fear we'll soon miss
features if we stuck to plain markdown. Also see this URL for a
comparison: [1].

I tend to ignore org-mode because I'm not an emacs user, and I guess
about 50% of (potential) nftables contributors feel the same about that.
:)

Regarding reStructuredText, did you look at how tables are written
there? If not, see here[2]. I really think that speaks for itself.

> There are many markup languages, it reminds me to xkcd #927 [0].

Well, the difference here is that I'm not inventing anything new but
search for better options amongst the existing solutions. :P

> I would prefer if we stick to groff, which seems to be the standard in Linux.

Yes, this is the very basic alternative but as said I think providing
users with an easier to use markup makes sense.

> Regarding the separation of text in different includes, why not
> creating different manpages?
> Netfilter did this in the past with iptables(8) and iptables-extensions(8).
> 
> Brainstorming:
> * nft(8) <-- main document, general info
> * nft-ct(8) <-- concrete info for ct objects
> * nft-counter(8) <-- concrete info for counter objets
> * nft-flowtables(8) <-- about flow tables
> * nft-quota(8) <--- concrete info for quotas
> * nft-performance(8) <--- concrete info about nftables sets, maps,
> dicts, concatenations, etc.
> * nft-ha(8) <--- for HA environments, loadbalancing etc

Well, I don't think nft.8's current size requires to split it into
smaller files at this point.

Cheers, Phil

[1] 
https://github.com/asciidoctor/asciidoctor.org/blob/master/docs/_includes/asciidoc-vs-markdown.adoc
[2] http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#tables
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


nf_nat_cleanup_conntrack performance with rhlist

2017-09-06 Thread Ivan Babrou
Hello,

We hit a bug where nf_nat_cleanup_conntrack is taking exponential amount of time
if you ping a lot of distinct IPs. It seems to be related to hlist ->
rhlist migration.

Here's the bugzilla ticket:

* https://bugzilla.kernel.org/show_bug.cgi?id=196821

It has a reproduce script and the list of reverts that bring back
reasonable performance.

Dev mailing list seems like a good place to notify the right people.

The first email got rejected because of html subpart, apologies for
the duplicate.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Jan Engelhardt

On Wednesday 2017-09-06 11:56, Arturo Borrero Gonzalez wrote:
>
>Regarding the separation of text in different includes, why not creating
>different manpages? Netfilter did this in the past with iptables(8) and
>iptables-extensions(8).

The split into iptables-extensions.8 happened not because of size or topic, but
because iptables.8 and ip6tables.8 had redundant copies of numerous sections of
each other, and they sometimes were quite small sections too (for this reason,
it only has 9 manuals in total, not 96).

>Brainstorming:
>* nft-performance(8) <--- concrete info about nftables sets, maps,
>dicts, concatenations, etc.

those seem more like "datatypes" than "performance".
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Pablo Neira Ayuso
Hi Arturo,

On Wed, Sep 06, 2017 at 11:56:37AM +0200, Arturo Borrero Gonzalez wrote:
> On 6 September 2017 at 10:41, Phil Sutter  wrote:
> > Beware: The conversion is incomplete and merely serves as base for
> > discussion.
> >
> > This patch converts nft.xml into asciidoc markup, top down until (and
> > including) stateful objects description. I stopped there because it's
> > the first chance of demonstrating my idea of splitting the documentation
> > into smaller pieces for convenience and maintainability.
> >
> > Regarding package dependencies, this "just" exchanges docbook with
> > asciidoc - dblatex is still required for PDF creation.
> >
> 
> Hi Phil,
> 
> thanks for your initiative and hard work, it's really appreciated :-)
> 
> Regarding the change, why asciidoc? why not markdown, or org-mode or
> reStructuredText?
> There are many markup languages, it reminds me to xkcd #927 [0].
> I would prefer if we stick to groff, which seems to be the standard in Linux.

Not willing to create a large thread debating what markup language
choice is better here, but I think asciidoc is a reasonable option.

> Regarding the separation of text in different includes, why not
> creating different manpages?
> Netfilter did this in the past with iptables(8) and iptables-extensions(8).
> 
> Brainstorming:
> * nft(8) <-- main document, general info
> * nft-ct(8) <-- concrete info for ct objects
> * nft-counter(8) <-- concrete info for counter objets
> * nft-flowtables(8) <-- about flow tables
> * nft-quota(8) <--- concrete info for quotas
> * nft-performance(8) <--- concrete info about nftables sets, maps,
> dicts, concatenations, etc.
> * nft-ha(8) <--- for HA environments, loadbalancing etc

Splitting manpage is a sensible option if we consider it's getting too
large, we can do this in incremental steps after the conversion.

Cheers.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH RFC] Convert man page source to asciidoc

2017-09-06 Thread Arturo Borrero Gonzalez
On 6 September 2017 at 10:41, Phil Sutter  wrote:
> Beware: The conversion is incomplete and merely serves as base for
> discussion.
>
> This patch converts nft.xml into asciidoc markup, top down until (and
> including) stateful objects description. I stopped there because it's
> the first chance of demonstrating my idea of splitting the documentation
> into smaller pieces for convenience and maintainability.
>
> Regarding package dependencies, this "just" exchanges docbook with
> asciidoc - dblatex is still required for PDF creation.
>

Hi Phil,

thanks for your initiative and hard work, it's really appreciated :-)

Regarding the change, why asciidoc? why not markdown, or org-mode or
reStructuredText?
There are many markup languages, it reminds me to xkcd #927 [0].
I would prefer if we stick to groff, which seems to be the standard in Linux.

Regarding the separation of text in different includes, why not
creating different manpages?
Netfilter did this in the past with iptables(8) and iptables-extensions(8).

Brainstorming:
* nft(8) <-- main document, general info
* nft-ct(8) <-- concrete info for ct objects
* nft-counter(8) <-- concrete info for counter objets
* nft-flowtables(8) <-- about flow tables
* nft-quota(8) <--- concrete info for quotas
* nft-performance(8) <--- concrete info about nftables sets, maps,
dicts, concatenations, etc.
* nft-ha(8) <--- for HA environments, loadbalancing etc

[0] https://xkcd.com/927/
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html