Re: [PATCH nf] netfilter: core: remove erroneous warn_on
Florian Westphalwrites: > kernel test robot reported: > > WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 > __nf_hook_entries_try_shrink+0x49/0xcd > [..] > > After allowing batching in nf_unregister_net_hooks its possible that an > earlier > call to __nf_hook_entries_try_shrink already compacted the list. > If this happens we don't need to do anything. > > Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks > synchronize_net calls") > Reported-by: kernel test robot > Signed-off-by: Florian Westphal > --- Acked-by: Aaron Conole -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: xt_hashlimit: avoid 64-bit division
On Wed, Sep 6, 2017 at 10:22 PM, Vishwanath Paiwrote: > On 09/06/2017 03:57 PM, Arnd Bergmann wrote: >> 64-bit division is expensive on 32-bit architectures, and >> requires a special function call to avoid a link error like: >> >> net/netfilter/xt_hashlimit.o: In function `hashlimit_mt_common': >> xt_hashlimit.c:(.text+0x1328): undefined reference to `__aeabi_uldivmod' >> >> In the case of hashlimit_mt_common, we don't actually need a >> 64-bit operation, we can simply rewrite the function slightly >> to make that clear to the compiler. >> >> Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode") >> Signed-off-by: Arnd Bergmann >> --- >> net/netfilter/xt_hashlimit.c | 5 - >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c >> index 10d48234f5f4..50b53d86eef5 100644 >> --- a/net/netfilter/xt_hashlimit.c >> +++ b/net/netfilter/xt_hashlimit.c >> @@ -531,7 +531,10 @@ static u64 user2rate_bytes(u64 user) >> { >> u64 r; >> >> - r = user ? 0xULL / user : 0xULL; >> + if (user > 0xULL) >> + return 0; >> + >> + r = user ? 0xULL / (u32)user : 0xULL; >> r = (r - 1) << 4; >> return r; >> } >> > > I have submitted another patch to fix this: > https://patchwork.ozlabs.org/patch/809881/ > > We have seen this problem before, I was careful not to introduce this > again in the new patch but clearly I overlooked this particular line :( > > In the other cases we fixed it by replacing division with div64_u64(). div64_u64() seems needlessly expensive here since the dividend is known to be a 32-bit number. I guess the function is not called frequently though, so it doesn't matter much. Arnd -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 13/14] netfilter/ipvs: Use %pS printk format for direct addresses
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge DellerCc: Wensong Zhang Cc: net...@vger.kernel.org Cc: lvs-de...@vger.kernel.org Cc: netfilter-devel@vger.kernel.org --- net/netfilter/ipvs/ip_vs_conn.c | 2 +- net/netfilter/ipvs/ip_vs_ctl.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c index 3d2ac71a..f73561c 100644 --- a/net/netfilter/ipvs/ip_vs_conn.c +++ b/net/netfilter/ipvs/ip_vs_conn.c @@ -185,7 +185,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp) hlist_add_head_rcu(>c_list, _vs_conn_tab[hash]); ret = 1; } else { - pr_err("%s(): request for already hashed, called from %pF\n", + pr_err("%s(): request for already hashed, called from %pS\n", __func__, __builtin_return_address(0)); ret = 0; } diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 1fa3c23..88fc58a 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -300,7 +300,7 @@ static int ip_vs_svc_hash(struct ip_vs_service *svc) unsigned int hash; if (svc->flags & IP_VS_SVC_F_HASHED) { - pr_err("%s(): request for already hashed, called from %pF\n", + pr_err("%s(): request for already hashed, called from %pS\n", __func__, __builtin_return_address(0)); return 0; } @@ -334,7 +334,7 @@ static int ip_vs_svc_hash(struct ip_vs_service *svc) static int ip_vs_svc_unhash(struct ip_vs_service *svc) { if (!(svc->flags & IP_VS_SVC_F_HASHED)) { - pr_err("%s(): request for unhash flagged, called from %pF\n", + pr_err("%s(): request for unhash flagged, called from %pS\n", __func__, __builtin_return_address(0)); return 0; } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netfilter: xt_hashlimit: avoid 64-bit division
On 09/06/2017 03:57 PM, Arnd Bergmann wrote: > 64-bit division is expensive on 32-bit architectures, and > requires a special function call to avoid a link error like: > > net/netfilter/xt_hashlimit.o: In function `hashlimit_mt_common': > xt_hashlimit.c:(.text+0x1328): undefined reference to `__aeabi_uldivmod' > > In the case of hashlimit_mt_common, we don't actually need a > 64-bit operation, we can simply rewrite the function slightly > to make that clear to the compiler. > > Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode") > Signed-off-by: Arnd Bergmann> --- > net/netfilter/xt_hashlimit.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c > index 10d48234f5f4..50b53d86eef5 100644 > --- a/net/netfilter/xt_hashlimit.c > +++ b/net/netfilter/xt_hashlimit.c > @@ -531,7 +531,10 @@ static u64 user2rate_bytes(u64 user) > { > u64 r; > > - r = user ? 0xULL / user : 0xULL; > + if (user > 0xULL) > + return 0; > + > + r = user ? 0xULL / (u32)user : 0xULL; > r = (r - 1) << 4; > return r; > } > I have submitted another patch to fix this: https://patchwork.ozlabs.org/patch/809881/ We have seen this problem before, I was careful not to introduce this again in the new patch but clearly I overlooked this particular line :( In the other cases we fixed it by replacing division with div64_u64(). -Vishwanath -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] netfilter: xt_hashlimit: avoid 64-bit division
64-bit division is expensive on 32-bit architectures, and requires a special function call to avoid a link error like: net/netfilter/xt_hashlimit.o: In function `hashlimit_mt_common': xt_hashlimit.c:(.text+0x1328): undefined reference to `__aeabi_uldivmod' In the case of hashlimit_mt_common, we don't actually need a 64-bit operation, we can simply rewrite the function slightly to make that clear to the compiler. Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode") Signed-off-by: Arnd Bergmann--- net/netfilter/xt_hashlimit.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c index 10d48234f5f4..50b53d86eef5 100644 --- a/net/netfilter/xt_hashlimit.c +++ b/net/netfilter/xt_hashlimit.c @@ -531,7 +531,10 @@ static u64 user2rate_bytes(u64 user) { u64 r; - r = user ? 0xULL / user : 0xULL; + if (user > 0xULL) + return 0; + + r = user ? 0xULL / (u32)user : 0xULL; r = (r - 1) << 4; return r; } -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nft PATCH RFC] Convert man page source to asciidoc
On Wednesday 2017-09-06 16:02, Phil Sutter wrote: >> Knowing that, people just avoid them most of the time for groff - and if I >> may >> say so, it has not reduced the document quality. > >Right now, nft.8 makes extensive use of tables which is why I considered >proper table support an important feature. OTOH I didn't experience >rendering issues with them in nft.8, did you? I do. For some reason, the right margin of the table is 60% of the terminal width. (width 120 => table up to col #63, terminal 80 cols => table up to col #50). ┌┬─┐ │Hook│ Description │ ├┼─┤ │prerouting │ All packets entering the system are │ ││ processed by the prerouting hook. It is │ ││ invoked before the routing process and │ ││ is used for early filtering or changing │ ││ packet attributes that affect routing. │ ├┼─┤ │input │ Packets delivered to the local system │ ││ are processed by the input hook.│ ├┼─┤ │forward │ Packets forwarded to a different host │ ││ are processed by the forward hook. │ ├┼─┤ │output │ Packets sent by local processes are │ ││ processed by the output hook. │ ├┼─┤ │postrouting │ All packets leaving the system are pro- │ ││ cessed by the postrouting hook. │ └┴─┘ However, what I tried to express is the problem with tables is that one column has almost no text and another has a lot of it, like this one. This causes large amounts of whitespace in column 1 in this case. Personally, I would have placed the hooks and their explanations as (the equivalent of) .TP \fBprerouting\fP All packets enter ... .TP \fBinput\fP Packets delivered ... Or if the source material was HTML, perhaps just prerouting: All packets enter... input: Packets delivered... The rectangular grid is overkill, because you are not trying to do a figurative police lineup (leave tables for e.g. https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions#Instruction_set_architecture_support ) In another way, I could also ask why hooks are in tables, and commands ("add", "flush", etc.) are in TP-style. What makes hooks so special that they deserve a table, despite their keyword being the same short length as the commands. >> >> There are many markup languages, it reminds me to xkcd #927 [0]. >> > >> >Well, the difference here is that I'm not inventing anything new but >> >search for better options amongst the existing solutions. :P >> >> That would be to stay with docbook then, because RST/MD/A2 do not seem to >> have >> left themselves a lot of room for later extension. > >What extensions do you have in mind? I kind of left that open; it seemed you implied there is something (the "better option") that MD couldn't do that asciidoc could. That thing, docbook should already have. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nft PATCH RFC] Convert man page source to asciidoc
Hi, On Wed, Sep 06, 2017 at 03:25:51PM +0200, Jan Engelhardt wrote: > On Wednesday 2017-09-06 13:58, Phil Sutter wrote: > > > >Regarding reStructuredText, did you look at how tables are written > >there? If not, see here[2]. I really think that speaks for itself. > > Markup is the least problem. Tables, when rendered, have a tendency to quickly > grow too large for the display container because there is just so much data to > show. It really matters if the markup specs suggest to draw ASCII art pictures. The real problem though is that reStructuredText is meant to be read as is and that is not required nor wanted in our case since we generate a man page anyway. > Knowing that, people just avoid them most of the time for groff - and if I may > say so, it has not reduced the document quality. Right now, nft.8 makes extensive use of tables which is why I considered proper table support an important feature. OTOH I didn't experience rendering issues with them in nft.8, did you? > >> There are many markup languages, it reminds me to xkcd #927 [0]. > > > >Well, the difference here is that I'm not inventing anything new but > >search for better options amongst the existing solutions. :P > > That would be to stay with docbook then, because RST/MD/A2 do not seem to have > left themselves a lot of room for later extension. What extensions do you have in mind? > >> I would prefer if we stick to groff, which seems to be the standard in > >> Linux. > > > >Yes, this is the very basic alternative but as said I think providing > >users with an easier to use markup makes sense. > > Users read the rendered text, not the markup. Yes, you are right of course. I mixed up users with contributors again. > All of the glorious old markups are "unreadable" enough to be fed through a > program to give a visually-cooked, markup-free rendering (browser, > /usr/bin/man, evince, etc.) MediaWiki, RST, MD and asciidoc on the other hand > however seem to propagate the use of just cat/less with very short markup, > which, to a user, is not too bad to look at, but also not too visually > pleasing > either, giving the worst of both worlds :-p Now you seem to mix up users and contributors: I never suggested to provide documentation to users in a different format than roff (rendered by 'man'), but to provide people editing the documentation with something more intuitive than docbook or roff. Cheers, Phil -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nft PATCH RFC] Convert man page source to asciidoc
On Wednesday 2017-09-06 13:58, Phil Sutter wrote: > >Regarding reStructuredText, did you look at how tables are written >there? If not, see here[2]. I really think that speaks for itself. Markup is the least problem. Tables, when rendered, have a tendency to quickly grow too large for the display container because there is just so much data to show. Knowing that, people just avoid them most of the time for groff - and if I may say so, it has not reduced the document quality. >> There are many markup languages, it reminds me to xkcd #927 [0]. > >Well, the difference here is that I'm not inventing anything new but >search for better options amongst the existing solutions. :P That would be to stay with docbook then, because RST/MD/A2 do not seem to have left themselves a lot of room for later extension. >> I would prefer if we stick to groff, which seems to be the standard in Linux. > >Yes, this is the very basic alternative but as said I think providing >users with an easier to use markup makes sense. Users read the rendered text, not the markup. All of the glorious old markups are "unreadable" enough to be fed through a program to give a visually-cooked, markup-free rendering (browser, /usr/bin/man, evince, etc.) MediaWiki, RST, MD and asciidoc on the other hand however seem to propagate the use of just cat/less with very short markup, which, to a user, is not too bad to look at, but also not too visually pleasing either, giving the worst of both worlds :-p -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH nf 1/2] netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"
Dao Quang Minhwrote: > Thanks for the patch ! > > > rhlist removal however requires a rhlist walk. > We can have thousands of entries in such a list if source port/addresses > are reused for multiple flows, if this happens removal requests are so > expensive that deletions of a few thousand flows can take several > seconds(!). > > Do you think rhlist can use double linked list instead of single linked so > removal doesn't need to walk the whole chain ? Yes, but: > > I tried to add hlist_node to rhlist to speed up rhltable_remove but this > > isn't doable without changing semantics. rhltable_remove_fast will > > check that the to-be-deleted object is part of the table and that > > requires a list walk that we want to avoid. > > > > Furthermore, using hlist_node increases size of struct rhlist_head, which > > in turn increases nf_conn size. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf] netfilter: core: remove erroneous warn_on
kernel test robot reported: WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 __nf_hook_entries_try_shrink+0x49/0xcd [..] After allowing batching in nf_unregister_net_hooks its possible that an earlier call to __nf_hook_entries_try_shrink already compacted the list. If this happens we don't need to do anything. Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks synchronize_net calls") Reported-by: kernel test robotSigned-off-by: Florian Westphal --- net/netfilter/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 04fe25abc5f6..52cd2901a097 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -215,7 +215,7 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries __rcu **pp) if (skip == hook_entries) goto out_assign; - if (WARN_ON(skip == 0)) + if (skip == 0) return NULL; hook_entries -= skip; -- 2.13.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nf_nat_cleanup_conntrack performance with rhlist
Ivan Babrouwrote: > Hello, > > We hit a bug where nf_nat_cleanup_conntrack is taking exponential amount of > time > if you ping a lot of distinct IPs. It seems to be related to hlist -> > rhlist migration. Yes, and its best to revert, I sent patches to do just that. Thanks for the heads-up. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf 0/2] netfilter: nat: do not use rhltable
This reverts the conversion of the nat bysource table to rhlist that I did last year. It was a bad idea to begin with. Fast lookup is not essential; in most cases there is no lookup at all because original tuple is not taken and can be used as-is. What needs to be fast is insertion and deletion. With rhlist, deletion is slow as it requires a list traversal. So, switch back to a statically-sized table for bysource hash. include/net/netfilter/nf_conntrack.h |3 include/net/netfilter/nf_nat.h |1 net/netfilter/nf_nat_core.c | 142 --- 3 files changed, 66 insertions(+), 80 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf 2/2] netfilter: nat: use keyed locks
no need to serialize on a single lock, we can partition the table and add/delete in parallel to different slots. This restores one of the advantages that got lost with the rhlist revert. Cc: Ivan BabrouSigned-off-by: Florian Westphal --- net/netfilter/nf_nat_core.c | 36 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 2fb80a4bfb34..ad29637d1b62 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -30,7 +30,7 @@ #include #include -static DEFINE_SPINLOCK(nf_nat_lock); +static spinlock_t nf_nat_locks[CONNTRACK_LOCKS]; static DEFINE_MUTEX(nf_nat_proto_mutex); static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO] @@ -423,13 +423,15 @@ nf_nat_setup_info(struct nf_conn *ct, if (maniptype == NF_NAT_MANIP_SRC) { unsigned int srchash; + spinlock_t *lock; srchash = hash_by_src(net, >tuplehash[IP_CT_DIR_ORIGINAL].tuple); - spin_lock_bh(_nat_lock); + lock = _nat_locks[srchash % ARRAY_SIZE(nf_nat_locks)]; + spin_lock_bh(lock); hlist_add_head_rcu(>nat_bysource, _nat_bysource[srchash]); - spin_unlock_bh(_nat_lock); + spin_unlock_bh(lock); } /* It's done. */ @@ -523,6 +525,16 @@ static int nf_nat_proto_remove(struct nf_conn *i, void *data) return i->status & IPS_NAT_MASK ? 1 : 0; } +static void __nf_nat_cleanup_conntrack(struct nf_conn *ct) +{ + unsigned int h; + + h = hash_by_src(nf_ct_net(ct), >tuplehash[IP_CT_DIR_ORIGINAL].tuple); + spin_lock_bh(_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]); + hlist_del_rcu(>nat_bysource); + spin_unlock_bh(_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]); +} + static int nf_nat_proto_clean(struct nf_conn *ct, void *data) { if (nf_nat_proto_remove(ct, data)) @@ -538,9 +550,7 @@ static int nf_nat_proto_clean(struct nf_conn *ct, void *data) * will delete entry from already-freed table. */ clear_bit(IPS_SRC_NAT_DONE_BIT, >status); - spin_lock_bh(_nat_lock); - hlist_del_rcu(>nat_bysource); - spin_unlock_bh(_nat_lock); + __nf_nat_cleanup_conntrack(ct); /* don't delete conntrack. Although that would make things a lot * simpler, we'd end up flushing all conntracks on nat rmmod. @@ -668,11 +678,8 @@ EXPORT_SYMBOL_GPL(nf_nat_l3proto_unregister); /* No one using conntrack by the time this called. */ static void nf_nat_cleanup_conntrack(struct nf_conn *ct) { - if (ct->status & IPS_SRC_NAT_DONE) { - spin_lock_bh(_nat_lock); - hlist_del_rcu(>nat_bysource); - spin_unlock_bh(_nat_lock); - } + if (ct->status & IPS_SRC_NAT_DONE) + __nf_nat_cleanup_conntrack(ct); } static struct nf_ct_ext_type nat_extend __read_mostly = { @@ -794,10 +801,12 @@ static struct nf_ct_helper_expectfn follow_master_nat = { static int __init nf_nat_init(void) { - int ret; + int ret, i; /* Leave them the same for the moment. */ nf_nat_htable_size = nf_conntrack_htable_size; + if (nf_nat_htable_size < ARRAY_SIZE(nf_nat_locks)) + nf_nat_htable_size = ARRAY_SIZE(nf_nat_locks); nf_nat_bysource = nf_ct_alloc_hashtable(_nat_htable_size, 0); if (!nf_nat_bysource) @@ -810,6 +819,9 @@ static int __init nf_nat_init(void) return ret; } + for (i = 0; i < ARRAY_SIZE(nf_nat_locks); i++) + spin_lock_init(_nat_locks[i]); + nf_ct_helper_expectfn_register(_master_nat); BUG_ON(nfnetlink_parse_nat_setup_hook != NULL); -- 2.13.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf 1/2] netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"
This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1. It was not a good idea. The custom hash table was a much better fit for this purpose. A fast lookup is not essential, in fact for most cases there is no lookup at all because original tuple is not taken and can be used as-is. What needs to be fast is insertion and deletion. rhlist removal however requires a rhlist walk. We can have thousands of entries in such a list if source port/addresses are reused for multiple flows, if this happens removal requests are so expensive that deletions of a few thousand flows can take several seconds(!). The advantages that we got from rhashtable are: 1) table auto-sizing 2) multiple locks 1) would be nice to have, but it is not essential as we have at most one lookup per new flow, so even a million flows in the bysource table are not a problem compared to current deletion cost. 2) is easy to add to custom hash table. I tried to add hlist_node to rhlist to speed up rhltable_remove but this isn't doable without changing semantics. rhltable_remove_fast will check that the to-be-deleted object is part of the table and that requires a list walk that we want to avoid. Furthermore, using hlist_node increases size of struct rhlist_head, which in turn increases nf_conn size. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821 Reported-by: Ivan BabrouSigned-off-by: Florian Westphal --- include/net/netfilter/nf_conntrack.h | 3 +- include/net/netfilter/nf_nat.h | 1 - net/netfilter/nf_nat_core.c | 130 ++- 3 files changed, 54 insertions(+), 80 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 48407569585d..a7c2cee39570 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -17,7 +17,6 @@ #include #include #include -#include #include #include @@ -83,7 +82,7 @@ struct nf_conn { possible_net_t ct_net; #if IS_ENABLED(CONFIG_NF_NAT) - struct rhlist_head nat_bysource; + struct hlist_node nat_bysource; #endif /* all members below initialized via memset */ u8 __nfct_init_offset[0]; diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h index 05c82a1a4267..b71701302e61 100644 --- a/include/net/netfilter/nf_nat.h +++ b/include/net/netfilter/nf_nat.h @@ -1,6 +1,5 @@ #ifndef _NF_NAT_H #define _NF_NAT_H -#include #include #include #include diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index b1d3740ae36a..2fb80a4bfb34 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -30,19 +30,17 @@ #include #include +static DEFINE_SPINLOCK(nf_nat_lock); + static DEFINE_MUTEX(nf_nat_proto_mutex); static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO] __read_mostly; static const struct nf_nat_l4proto __rcu **nf_nat_l4protos[NFPROTO_NUMPROTO] __read_mostly; -struct nf_nat_conn_key { - const struct net *net; - const struct nf_conntrack_tuple *tuple; - const struct nf_conntrack_zone *zone; -}; - -static struct rhltable nf_nat_bysource_table; +static struct hlist_head *nf_nat_bysource __read_mostly; +static unsigned int nf_nat_htable_size __read_mostly; +static unsigned int nf_nat_hash_rnd __read_mostly; inline const struct nf_nat_l3proto * __nf_nat_l3proto_find(u8 family) @@ -118,17 +116,19 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family) EXPORT_SYMBOL(nf_xfrm_me_harder); #endif /* CONFIG_XFRM */ -static u32 nf_nat_bysource_hash(const void *data, u32 len, u32 seed) +/* We keep an extra hash for each conntrack, for fast searching. */ +static unsigned int +hash_by_src(const struct net *n, const struct nf_conntrack_tuple *tuple) { - const struct nf_conntrack_tuple *t; - const struct nf_conn *ct = data; + unsigned int hash; + + get_random_once(_nat_hash_rnd, sizeof(nf_nat_hash_rnd)); - t = >tuplehash[IP_CT_DIR_ORIGINAL].tuple; /* Original src, to ensure we map it consistently if poss. */ + hash = jhash2((u32 *)>src, sizeof(tuple->src) / sizeof(u32), + tuple->dst.protonum ^ nf_nat_hash_rnd ^ net_hash_mix(n)); - seed ^= net_hash_mix(nf_ct_net(ct)); - return jhash2((const u32 *)>src, sizeof(t->src) / sizeof(u32), - t->dst.protonum ^ seed); + return reciprocal_scale(hash, nf_nat_htable_size); } /* Is this tuple already taken? (not by us) */ @@ -184,28 +184,6 @@ same_src(const struct nf_conn *ct, t->src.u.all == tuple->src.u.all); } -static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg, - const void *obj) -{ - const struct nf_nat_conn_key *key = arg->key; -
Re: [nft PATCH RFC] Convert man page source to asciidoc
Hi Arturo, On Wed, Sep 06, 2017 at 11:56:37AM +0200, Arturo Borrero Gonzalez wrote: > On 6 September 2017 at 10:41, Phil Sutterwrote: > > Beware: The conversion is incomplete and merely serves as base for > > discussion. > > > > This patch converts nft.xml into asciidoc markup, top down until (and > > including) stateful objects description. I stopped there because it's > > the first chance of demonstrating my idea of splitting the documentation > > into smaller pieces for convenience and maintainability. > > > > Regarding package dependencies, this "just" exchanges docbook with > > asciidoc - dblatex is still required for PDF creation. > > > > Hi Phil, > > thanks for your initiative and hard work, it's really appreciated :-) Thanks! > Regarding the change, why asciidoc? why not markdown, or org-mode or > reStructuredText? Well, asciidoc is basicaly markdown on steroids, I fear we'll soon miss features if we stuck to plain markdown. Also see this URL for a comparison: [1]. I tend to ignore org-mode because I'm not an emacs user, and I guess about 50% of (potential) nftables contributors feel the same about that. :) Regarding reStructuredText, did you look at how tables are written there? If not, see here[2]. I really think that speaks for itself. > There are many markup languages, it reminds me to xkcd #927 [0]. Well, the difference here is that I'm not inventing anything new but search for better options amongst the existing solutions. :P > I would prefer if we stick to groff, which seems to be the standard in Linux. Yes, this is the very basic alternative but as said I think providing users with an easier to use markup makes sense. > Regarding the separation of text in different includes, why not > creating different manpages? > Netfilter did this in the past with iptables(8) and iptables-extensions(8). > > Brainstorming: > * nft(8) <-- main document, general info > * nft-ct(8) <-- concrete info for ct objects > * nft-counter(8) <-- concrete info for counter objets > * nft-flowtables(8) <-- about flow tables > * nft-quota(8) <--- concrete info for quotas > * nft-performance(8) <--- concrete info about nftables sets, maps, > dicts, concatenations, etc. > * nft-ha(8) <--- for HA environments, loadbalancing etc Well, I don't think nft.8's current size requires to split it into smaller files at this point. Cheers, Phil [1] https://github.com/asciidoctor/asciidoctor.org/blob/master/docs/_includes/asciidoc-vs-markdown.adoc [2] http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#tables -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
nf_nat_cleanup_conntrack performance with rhlist
Hello, We hit a bug where nf_nat_cleanup_conntrack is taking exponential amount of time if you ping a lot of distinct IPs. It seems to be related to hlist -> rhlist migration. Here's the bugzilla ticket: * https://bugzilla.kernel.org/show_bug.cgi?id=196821 It has a reproduce script and the list of reverts that bring back reasonable performance. Dev mailing list seems like a good place to notify the right people. The first email got rejected because of html subpart, apologies for the duplicate. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nft PATCH RFC] Convert man page source to asciidoc
On Wednesday 2017-09-06 11:56, Arturo Borrero Gonzalez wrote: > >Regarding the separation of text in different includes, why not creating >different manpages? Netfilter did this in the past with iptables(8) and >iptables-extensions(8). The split into iptables-extensions.8 happened not because of size or topic, but because iptables.8 and ip6tables.8 had redundant copies of numerous sections of each other, and they sometimes were quite small sections too (for this reason, it only has 9 manuals in total, not 96). >Brainstorming: >* nft-performance(8) <--- concrete info about nftables sets, maps, >dicts, concatenations, etc. those seem more like "datatypes" than "performance". -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nft PATCH RFC] Convert man page source to asciidoc
Hi Arturo, On Wed, Sep 06, 2017 at 11:56:37AM +0200, Arturo Borrero Gonzalez wrote: > On 6 September 2017 at 10:41, Phil Sutterwrote: > > Beware: The conversion is incomplete and merely serves as base for > > discussion. > > > > This patch converts nft.xml into asciidoc markup, top down until (and > > including) stateful objects description. I stopped there because it's > > the first chance of demonstrating my idea of splitting the documentation > > into smaller pieces for convenience and maintainability. > > > > Regarding package dependencies, this "just" exchanges docbook with > > asciidoc - dblatex is still required for PDF creation. > > > > Hi Phil, > > thanks for your initiative and hard work, it's really appreciated :-) > > Regarding the change, why asciidoc? why not markdown, or org-mode or > reStructuredText? > There are many markup languages, it reminds me to xkcd #927 [0]. > I would prefer if we stick to groff, which seems to be the standard in Linux. Not willing to create a large thread debating what markup language choice is better here, but I think asciidoc is a reasonable option. > Regarding the separation of text in different includes, why not > creating different manpages? > Netfilter did this in the past with iptables(8) and iptables-extensions(8). > > Brainstorming: > * nft(8) <-- main document, general info > * nft-ct(8) <-- concrete info for ct objects > * nft-counter(8) <-- concrete info for counter objets > * nft-flowtables(8) <-- about flow tables > * nft-quota(8) <--- concrete info for quotas > * nft-performance(8) <--- concrete info about nftables sets, maps, > dicts, concatenations, etc. > * nft-ha(8) <--- for HA environments, loadbalancing etc Splitting manpage is a sensible option if we consider it's getting too large, we can do this in incremental steps after the conversion. Cheers. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nft PATCH RFC] Convert man page source to asciidoc
On 6 September 2017 at 10:41, Phil Sutterwrote: > Beware: The conversion is incomplete and merely serves as base for > discussion. > > This patch converts nft.xml into asciidoc markup, top down until (and > including) stateful objects description. I stopped there because it's > the first chance of demonstrating my idea of splitting the documentation > into smaller pieces for convenience and maintainability. > > Regarding package dependencies, this "just" exchanges docbook with > asciidoc - dblatex is still required for PDF creation. > Hi Phil, thanks for your initiative and hard work, it's really appreciated :-) Regarding the change, why asciidoc? why not markdown, or org-mode or reStructuredText? There are many markup languages, it reminds me to xkcd #927 [0]. I would prefer if we stick to groff, which seems to be the standard in Linux. Regarding the separation of text in different includes, why not creating different manpages? Netfilter did this in the past with iptables(8) and iptables-extensions(8). Brainstorming: * nft(8) <-- main document, general info * nft-ct(8) <-- concrete info for ct objects * nft-counter(8) <-- concrete info for counter objets * nft-flowtables(8) <-- about flow tables * nft-quota(8) <--- concrete info for quotas * nft-performance(8) <--- concrete info about nftables sets, maps, dicts, concatenations, etc. * nft-ha(8) <--- for HA environments, loadbalancing etc [0] https://xkcd.com/927/ -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html