Re: [PATCH RFC] net/mlx5_en: switch to Toeplitz RSS hash by default

2018-09-06 Thread Konstantin Khlebnikov
On 06.09.2018 08:24, Saeed Mahameed wrote: On Sun, Sep 2, 2018 at 2:55 AM, Konstantin Khlebnikov wrote: On 02.09.2018 12:29, Tariq Toukan wrote: On 31/08/2018 2:29 PM, Konstantin Khlebnikov wrote: XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only 8 bits. It seems not enough for RFS. All

Re: [PATCH RFC] net/mlx5_en: switch to Toeplitz RSS hash by default

2018-09-02 Thread Konstantin Khlebnikov
On 02.09.2018 12:29, Tariq Toukan wrote: On 31/08/2018 2:29 PM, Konstantin Khlebnikov wrote: XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only 8 bits. It seems not enough for RFS. All other drivers use toeplitz. Driver mlx4_en uses Toeplitz by default and warns if hash XOR is used together

[PATCH RFC] net/mlx5_en: switch to Toeplitz RSS hash by default

2018-08-31 Thread Konstantin Khlebnikov
g can limit RPS functionality". XOR is default in mlx5_en since commit 2be6967cdbc9 ("net/mlx5e: Support ETH_RSS_HASH_XOR"). Hash function could be set via ethtool. But it would be nice to have single standard for drivers or proper description why this one is special. Signed-off-by: K
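For illustration, the gap between an 8-bit XOR fold and a 32-bit Toeplitz hash can be sketched in plain C. This is not the mlx5 driver or firmware code: xor8_hash() is a simplified stand-in (it does not reproduce the "inverted" XOR8 variant named in the thread), toeplitz_hash() is a generic bit-by-bit Toeplitz implementation, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic Toeplitz hash (illustrative, hypothetical name).
 * `key` must hold at least len + 4 bytes.
 */
uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
    /* The window starts as the first 32 bits of the key. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            /* Each set input bit XORs in the current key window. */
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit left, pulling in the next key bit. */
            window = (window << 1) | ((uint32_t)(key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}

/* Simplified byte-wise XOR fold: the result can take only 256 values. */
uint8_t xor8_hash(const uint8_t *data, size_t len)
{
    uint8_t h = 0;

    for (size_t i = 0; i < len; i++)
        h ^= data[i];
    return h;
}
```

With only 256 possible hash values, distinct flows collide far more often in any flow table indexed by the hash, which is the concern raised for RFS.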

Re: [BUG] mlx5 have problems with ipv4-ipv6 tunnels in linux 4.4

2018-07-10 Thread Konstantin Khlebnikov
On 10.07.2018 01:31, Saeed Mahameed wrote: On Tue, Jul 3, 2018 at 10:45 PM, Konstantin Khlebnikov wrote: I'm seeing problems with tunnelled traffic with Mellanox Technologies MT27710 Family [ConnectX-4 Lx] using vanilla driver from linux 4.4.y Packets with payload bigger than 116 bytes

[BUG] mlx5 have problems with ipv4-ipv6 tunnels in linux 4.4

2018-07-03 Thread Konstantin Khlebnikov
I'm seeing problems with tunnelled traffic with Mellanox Technologies MT27710 Family [ConnectX-4 Lx] using vanilla driver from linux 4.4.y Packets with payload bigger than 116 bytes are not transmitted. Smaller packets and normal ipv6 work fine. In linux 4.9, 4.14 and out-of-tree driver

Re: [PATCH] net_sched: blackhole: tell upper qdisc about dropped packets

2018-06-15 Thread Konstantin Khlebnikov
On 15.06.2018 16:13, Eric Dumazet wrote: On 06/15/2018 03:27 AM, Konstantin Khlebnikov wrote: When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets disappear without notice. In HFSC non-zero qlen while all classes are inactive

[PATCH] net_sched: blackhole: tell upper qdisc about dropped packets

2018-06-15 Thread Konstantin Khlebnikov
] and schedules watchdog work endlessly. This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued. Signed-off-by: Konstantin Khlebnikov --- net/sched/sch_blackhole.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff

Re: Repeating "unregister_netdevice: waiting for lo to become free" caused by upstream 76da0704507bb ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")

2018-04-25 Thread Konstantin Khlebnikov
On 25.04.2018 17:16, Rafał Miłecki wrote: On 23.04.2018 15:08, Rafał Miłecki wrote: I've just updated my kernel 4.4.x and noticed a regression. Bisecting pointed me to the commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0] which is backport of

Re: 4.4.103 linux kernel regression

2017-12-24 Thread Konstantin Khlebnikov
on the latest - but I can give it a try. Regards Mathias On Sat, 23 Dec 2017, 17:36 Konstantin Khlebnikov, <khlebni...@yandex-team.ru> wrote: On 23.12.2017 16:52, Greg KH wrote: > adding stable@ and netdev@ > > On Sat, Dec 23, 201

Re: 4.4.103 linux kernel regression

2017-12-23 Thread Konstantin Khlebnikov
, please try debug patch from attachment. It logs all refcount changes for loopback in non-host net namespace. Hopefully the log will be tiny and show what is missing. Looks like vsftpd creates and destroys empty net-ns, like "unshare -n true" net: debug lo refcnt From: Konstantin Khlebniko

Re: [PATCH] iptables: ip6t_MASQUERADE: add dependency on conntrack module

2017-12-15 Thread Konstantin Khlebnikov
On 11.12.2017 18:47, Pablo Neira Ayuso wrote: On Mon, Dec 11, 2017 at 06:19:33PM +0300, Konstantin Khlebnikov wrote: After commit 4d3a57f23dec ("netfilter: conntrack: do not enable connection tracking unless needed") conntrack is disabled by default unless some module explicitl

[PATCH] iptables: ip6t_MASQUERADE: add dependency on conntrack module

2017-12-11 Thread Konstantin Khlebnikov
After commit 4d3a57f23dec ("netfilter: conntrack: do not enable connection tracking unless needed") conntrack is disabled by default unless some module explicitly declares dependency in particular network namespace. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-t

[PATCH] tcp_nv: use do_div() instead of expensive div64_u64()

2017-11-02 Thread Konstantin Khlebnikov
Average RTT is 32-bit thus full 64-bit division is redundant. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Suggested-by: Stephen Hemminger <step...@networkplumber.org> Suggested-by: Eric Dumazet <eric.duma...@gmail.com> --- net/ipv4/tcp_nv.c |7 ---

[PATCH] tcp_nv: fix division by zero in tcpnv_acked()

2017-11-01 Thread Konstantin Khlebnikov
Average RTT could become zero. This happened in real life at least twice. This patch treats zero as 1us. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- net/ipv4/tcp_nv.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_nv.c b/ne
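The idea of treating a zero average RTT as 1us can be sketched as follows. This is a minimal userspace illustration, not the tcp_nv code: the helper name, its parameters and the bytes-per-second formula are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from tcp_nv.c): estimate a rate in bytes per
 * second from the bytes in flight and the average RTT in microseconds.
 */
uint64_t rate_bytes_per_sec(uint32_t bytes_in_flight, uint32_t avg_rtt_us)
{
    /* Treat a zero average RTT as 1us so the divisor is never zero. */
    uint32_t rtt_us = avg_rtt_us ? avg_rtt_us : 1;

    return (uint64_t)bytes_in_flight * 1000000u / rtt_us;
}
```

Since the divisor fits in 32 bits, a 64-by-32 division is enough here, which is the point made in the follow-up do_div() patch.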

[BUG] division by zero in tcpnv_acked()

2017-10-30 Thread Konstantin Khlebnikov
I've got this on two different machines: [ 24.405015] divide error: [#1] SMP [ 24.405403] Modules linked in: nf_log_ipv6 nf_log_common xt_LOG xt_u32 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables xt_tcpudp

[PATCH] net_sched/hfsc: fix curve activation in hfsc_change_class()

2017-09-20 Thread Konstantin Khlebnikov
If real-time or fair-share curves are enabled in hfsc_change_class() class isn't inserted into rb-trees yet. Thus init_ed() and init_vf() must be called in place of update_ed() and update_vf(). Remove isn't required because for now curves cannot be disabled. Signed-off-by: Konstantin Khlebnikov

[PATCH] net_sched: always reset qdisc backlog in qdisc_reset()

2017-09-20 Thread Konstantin Khlebnikov
SKB stored in qdisc->gso_skb is also counted into backlog. Some qdiscs don't reset backlog to zero in ->reset(), for example sfq just dequeues and frees all queued skbs. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 2f5fb43f ("net_sched: update hierarc

Re: [PATCH] mm/vmstats: add counters for the page frag cache

2017-09-04 Thread Konstantin Khlebnikov
pen and would remove pgfrag_alloc_calls and pgfrag_free_calls. Thanks, Kyeongdon Kim On 2017-09-01 6:12 PM, Konstantin Khlebnikov wrote: IMHO that's too many counters. Per-node NR_FRAGMENT_PAGES should be enough for guessing what's going on. Perf probes provide enough features for further debugging.

[PATCH RFC] net_sched/codel: do not defer queue length update

2017-08-21 Thread Konstantin Khlebnikov
roblem in HFSC - now operation peek could fail and deactivate parent class. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109581 --- net/sched/sch_codel.c| 14 ++ net/sched/sch_fq_co

[PATCH] net_sched/hhf: update hierarchical backlog when drop packet

2017-08-21 Thread Konstantin Khlebnikov
When hhf_enqueue() drops a packet from another bucket it has to update backlog at upper qdiscs too. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too") --- net/sched/sch_hhf.c |5 - 1

Re: [PATCH] net_sched: fix order of queue length updates in qdisc_replace()

2017-08-19 Thread Konstantin Khlebnikov
:37, Konstantin Khlebnikov wrote: It is important to call qdisc_tree_reduce_backlog() after changing queue length. Parent qdisc should deactivate class in ->qlen_notify() called from qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen is zero. Missed class deactivations leads t

[PATCH] net_sched: fix order of queue length updates in qdisc_replace()

2017-08-19 Thread Konstantin Khlebnikov
ackets from empty qdisc and corrupting state at reactivating this class in future. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 86a7996cc8a0 ("net_sched: introduce qdisc_replace() helper") Cc: Stable <sta...@vger.kernel.org> --- include/net/sch_ge

Re: [PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty

2017-08-16 Thread Konstantin Khlebnikov
On 16.08.2017 20:22, Cong Wang wrote: On Tue, Aug 15, 2017 at 6:39 AM, Konstantin Khlebnikov <khlebni...@yandex-team.ru> wrote: This callback is used for deactivating class in parent qdisc. It is cheaper to test queue length right here. Also this allows to catch draining screwed b

Re: [PATCH] net_sched/sfq: update hierarchical backlog when drop packet

2017-08-15 Thread Konstantin Khlebnikov
On 15.08.2017 17:09, Eric Dumazet wrote: On Tue, 2017-08-15 at 16:37 +0300, Konstantin Khlebnikov wrote: When sfq_enqueue() drops head packet or packet from another queue it has to update backlog at upper qdiscs too. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>

Re: [PATCH] net/sched: reset block pointer in tcf_block_put()

2017-08-15 Thread Konstantin Khlebnikov
On 15.08.2017 00:15, Cong Wang wrote: On Mon, Aug 14, 2017 at 5:59 AM, Konstantin Khlebnikov <khlebni...@yandex-team.ru> wrote: This should work, I suppose. But this approach requires careful review for all qdisc, mine is completely mechanical. Well, we don't have many classful q

[PATCH 2/2] net_sched/hfsc: opencode trivial set_active() and set_passive()

2017-08-15 Thread Konstantin Khlebnikov
And move comment about update_vf() into right place. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- net/sched/sch_hfsc.c | 45 - 1 file changed, 16 insertions(+), 29 deletions(-) diff --git a/net/sched/sch_hfsc.c b/net

[PATCH] net_sched: remove warning from qdisc_hash_add

2017-08-15 Thread Konstantin Khlebnikov
ful qdisc is added to inactive device because default qdiscs are added before switching root qdisc. Anyway after commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") duplicates are filtered right in dumper. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>

[PATCH] net_sched/sfq: update hierarchical backlog when drop packet

2017-08-15 Thread Konstantin Khlebnikov
When sfq_enqueue() drops head packet or packet from another queue it has to update backlog at upper qdiscs too. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too") --- net/sched/sch_sfq.c |

[PATCH 1/2] net_sched: call qlen_notify only if child qdisc is empty

2017-08-15 Thread Konstantin Khlebnikov
at destruction of child qdisc where no packets but backlog is not zero. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- net/sched/sch_api.c | 10 +- net/sched/sch_cbq.c |3 +-- net/sched/sch_drr.c |3 +-- net/sched/sch_hfsc.c |6 ++ net/sched/sch

[PATCH] net_sched: reset pointers to tcf blocks in classful qdiscs' destructors

2017-08-15 Thread Konstantin Khlebnikov
be called second time. This patch sets class->block to NULL after first tcf_block_put() and turns second call into no-op. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") --- net/sched/sch_atm.

Re: [PATCH] net/sched: reset block pointer in tcf_block_put()

2017-08-14 Thread Konstantin Khlebnikov
On 12.08.2017 00:38, Cong Wang wrote: On Fri, Aug 11, 2017 at 1:36 PM, Konstantin Khlebnikov <khlebni...@yandex-team.ru> wrote: On 11.08.2017 23:18, Cong Wang wrote: On Thu, Aug 10, 2017 at 2:31 AM, Konstantin Khlebnikov <khlebni...@yandex-team.ru> wrote: In

Re: [PATCH] net/sched: reset block pointer in tcf_block_put()

2017-08-11 Thread Konstantin Khlebnikov
On 11.08.2017 23:18, Cong Wang wrote: On Thu, Aug 10, 2017 at 2:31 AM, Konstantin Khlebnikov <khlebni...@yandex-team.ru> wrote: In previous API tcf_destroy_chain() could be called several times and some schedulers like hfsc and atm use that. In new API tcf_block_put() frees block but

[PATCH] net/sched/hfsc: allocate tcf block for hfsc root class

2017-08-10 Thread Konstantin Khlebnikov
Without this filters cannot be attached. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") --- net/sched/sch_hfsc.c |8 1 file changed, 8 insertions(+) diff --git a/net/sched/

[PATCH] net/sched: reset block pointer in tcf_block_put()

2017-08-10 Thread Konstantin Khlebnikov
In previous API tcf_destroy_chain() could be called several times and some schedulers like hfsc and atm use that. In new API tcf_block_put() frees block but leaves stale pointer, second call will free it once again. This patch fixes such double-frees. Signed-off-by: Konstantin Khlebnikov
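One way to make a repeated put harmless is to clear the stale pointer after the first call, as sketched below in userspace C. All names here (struct block, struct klass, block_put(), class_destroy_filters(), blocks_freed) are hypothetical stand-ins, not the kernel's tcf API, and this may differ from what the patch itself does.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a filter block and a class that owns one. */
struct block { int nfilters; };
struct klass { struct block *block; };

/* Counts real frees so the effect of a repeated call can be observed. */
int blocks_freed;

/* Frees the block; a NULL pointer is accepted and ignored. */
void block_put(struct block *b)
{
    if (!b)
        return;
    free(b);
    blocks_freed++;
}

/* May be called more than once for the same class. */
void class_destroy_filters(struct klass *cl)
{
    block_put(cl->block);
    cl->block = NULL;   /* no stale pointer left for a second call */
}
```

Calling class_destroy_filters() twice on the same class then frees the block exactly once.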

[PATCH] e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll()

2017-05-19 Thread Konstantin Khlebnikov
Replace disable_irq() which waits for threaded irq handlers with disable_hardirq() which waits only for hardirq part. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 311191297125 ("e1000: use disable_hardirq() for e1000_netpoll()") --- drivers/net/ether

Re: [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx

2017-01-18 Thread Konstantin Khlebnikov
On 18.01.2017 17:23, Eric Dumazet wrote: On Wed, 2017-01-18 at 12:31 +0300, Konstantin Khlebnikov wrote: On 18.01.2017 07:14, Eric Dumazet wrote: From: Eric Dumazet <eduma...@google.com> Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMAL

Re: [PATCH net-next] mlx4: support __GFP_MEMALLOC for rx

2017-01-18 Thread Konstantin Khlebnikov
s a straight way to depleting all reserves by flood from network. Note that this driver does not reuse pages (yet) so we do not have to add anything else. Signed-off-by: Eric Dumazet <eduma...@google.com> Cc: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Cc: Tariq Toukan <tar...

[PATCH] net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int

2016-07-16 Thread Konstantin Khlebnikov
hus tool 'tc' prints them as signed. Big values lose higher bits and/or become negative. This patch clamps tokens in xstat into range from INT_MIN to INT_MAX. In this way it's easier to understand what's going on here. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> ---
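The clamping described above can be sketched as a small saturating conversion. The function name is hypothetical and the snippet is a userspace illustration rather than the sch_htb code.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: squeeze a 64-bit token count into an int.
 * Out-of-range values saturate instead of wrapping around.
 */
int clamp_tokens(int64_t tokens)
{
    if (tokens > INT_MAX)
        return INT_MAX;
    if (tokens < INT_MIN)
        return INT_MIN;
    return (int)tokens;
}
```

A plain cast would silently drop the upper bits, so a large positive value could be reported as a small or negative one; saturation at least keeps the sign and signals "very large".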

[PATCH] cls_cgroup: get sk_classid only from full sockets

2016-04-18 Thread Konstantin Khlebnikov
skb->sk could point to timewait or request socket which has no sk_classid. Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify". Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- include/net/cls_cgroup.h |7 +-- 1 file changed,

[PATCH 1/2] net/ipv6/addrconf: simplify sysctl registration

2016-04-18 Thread Konstantin Khlebnikov
options are disabled in config. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- include/linux/ipv6.h |3 ++- net/ipv6/addrconf.c | 43 +-- 2 files changed, 19 insertions(+), 27 deletions(-) diff --git a/include/linux/ipv6.h b/i

[PATCH] net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC

2016-04-18 Thread Konstantin Khlebnikov
High order pages are optional here since commit 51151a16a60f ("mlx4: allow order-0 memory allocations in RX path"), so there is no reason for depleting reserves. Generic __netdev_alloc_frag() implements the same logic. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>

[PATCH] net/mlx4_en: do batched put_page using atomic_sub

2016-04-18 Thread Konstantin Khlebnikov
This patch fixes a couple of error paths after allocation failures. Atomic set of page reference counter is safe only if it is zero, otherwise set can race with any speculative get_page_unless_zero. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- drivers/net/ethernet/me

[PATCH 2/2] net/ipv6/addrconf: fix sysctl table indentation

2016-04-18 Thread Konstantin Khlebnikov
Separated from previous patch for readability. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- net/ipv6/addrconf.c | 616 +-- 1 file changed, 307 insertions(+), 309 deletions(-) diff --git a/net/ipv6/addrconf.c b/ne

Re: [PATCH] ipv4: in new netns initialize sysctls in net.ipv4.conf.* with defaults

2016-02-23 Thread Konstantin Khlebnikov
On Wed, Feb 24, 2016 at 2:21 AM, David Miller <da...@davemloft.net> wrote: > From: Konstantin Khlebnikov <khlebni...@yandex-team.ru> > Date: Sun, 21 Feb 2016 10:11:02 +0300 > >> Currently initial net.ipv4.conf.all.* and net.ipv4.conf.default.* are >> copied fr

[PATCH] ipv4: in new netns initialize sysctls in net.ipv4.conf.* with defaults

2016-02-21 Thread Konstantin Khlebnikov
are enabled. Other sysctls in net.ipv4 and net.ipv6 already initialized with default values at namespace creation. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 752d14dc6aa9 ("[IPV4]: Move the devinet pointers on the struct net") --- net/ipv4/devinet.c |

[PATCH] tcp: convert cached rtt from usec to jiffies when feeding initial rto

2016-02-21 Thread Konstantin Khlebnikov
Currently it's converted into msecs, thus only HZ=1000 is unaffected. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution") --- net/ipv4/tcp_metrics.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
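The arithmetic behind this can be sketched in userspace C: a value in milliseconds equals a value in ticks only when the tick rate is 1000 per second. The helpers below are hypothetical (they are not the kernel's conversion functions), and the round-up behaviour is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds to timer ticks at `hz` ticks/second, rounded up. */
uint64_t usec_to_ticks(uint64_t usec, uint32_t hz)
{
    return (usec * hz + 999999u) / 1000000u;
}

/* Hypothetical helper: microseconds to whole milliseconds. */
uint64_t usec_to_msec(uint64_t usec)
{
    return usec / 1000u;
}
```

For a cached RTT of 200000us, the millisecond value is 200, which matches the tick count at HZ=1000 but is four times too large at HZ=250, where the correct tick count is 50.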

Re: [PATCH] ipv4: in new netns initialize sysctls in net.ipv4.conf.* with defaults

2016-02-21 Thread Konstantin Khlebnikov
. However, there is a corner case: module with sysctl can be loaded after creation of namespaces. In this case namespaces will get pre-compiled sysctl defaults, and are not able to adjust them even if they want to do it. Thank you, Vasily Averin On 21.02.2016 10:11, Konstantin Khlebnikov wrote

IPv4/IPv6 sysctl defaults in new namespace

2016-02-15 Thread Konstantin Khlebnikov
IPv6 initialized with default. That's ok. IPv4 makes a copy from init_net. Looks like a bug, here v2.6.24-2577-g752d14dc6aa9 root@zurg:~# sysctl net.ipv4.conf.all.forwarding=0 net.ipv6.conf.all.forwarding=0 net.ipv4.conf.all.forwarding = 0 net.ipv6.conf.all.forwarding = 0 root@zurg:~# unshare -n

[PATCH] mac80211: minstrel_ht: fix out-of-bound in minstrel_ht_set_best_prob_rate

2016-01-29 Thread Konstantin Khlebnikov
Patch fixes this splat BUG: KASAN: slab-out-of-bounds in minstrel_ht_update_stats.isra.7+0x6e1/0x9e0 [mac80211] at addr 8800cee640f4 Read of size 4 by task swapper/3/0 Signed-off-by: Konstantin Khlebnikov <koc...@gmail.com> Link: http://lkml.kernel

Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-07 Thread Konstantin Khlebnikov
On Thu, Jan 7, 2016 at 2:00 PM, Konstantin Khlebnikov <koc...@gmail.com> wrote: > On Thu, Jan 7, 2016 at 2:49 AM, Florian Westphal <f...@strlen.de> wrote: >> Florian Westphal <f...@strlen.de> wrote: >>> Thadeu Lima de Souza Cascardo <casca...@redhat.com>

[BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Konstantin Khlebnikov
I've got some of these: [84408.314676] BUG: unable to handle kernel NULL pointer dereference at (null) [84408.317324] IP: [] put_page+0x5/0x50 [84408.319985] PGD 0 [84408.322583] Oops: [#1] SMP [84408.325156] Modules linked in: ppp_mppe ppp_async ppp_generic slhc 8021q fuse nfsd

Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Konstantin Khlebnikov
On Wed, Jan 6, 2016 at 10:59 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote: > On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov <koc...@gmail.com> > wrote: >> Looks like this happens because ip_options_fragment() relies on >> correct ip options len

[PATCH] ip neigh: device is optional for proxy entries

2015-11-30 Thread Konstantin Khlebnikov
Though dumping such entries crashes present kernels. Signed-off-by: Konstantin Khlebnikov <koc...@gmail.com> --- ip/ipneigh.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/ip/ipneigh.c b/ip/ipneigh.c index 54655842ed38..92b7cd6f2a75 100644 --- a/ip/ipn

[PATCH] net/neighbour: fix crash at dumping device-agnostic proxy entries

2015-11-30 Thread Konstantin Khlebnikov
Proxy entries could have null pointer to net-device. Signed-off-by: Konstantin Khlebnikov <koc...@gmail.com> Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2") Cc: <sta...@vger.kernel.org> # v3.4 --- net/core/neighbour.c |4 +

[PATCH] ovs: do not allocate memory from offline numa node

2015-10-02 Thread Konstantin Khlebnikov
patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru> --- <4>[ 24.368805] [ cut here ] <2>[ 24.368846] kernel BUG at include/linux/gfp.h:325! <4>[ 24.368868] invalid opcode: [#1]

Re: net: Fix skb_set_peeked use-after-free bug

2015-08-05 Thread Konstantin Khlebnikov
...@gondor.apana.org.au Seems correct. You can add: Reviewed-by: Konstantin Khlebnikov khlebni...@yandex-team.ru Your skb_set_peeked() doesn't set prev/next to NULL when it unlinks the old skb from the queue, unlike __skb_unlink(). It isn't a big deal but nulling might be useful. diff --git a/net/core/datagram.c b

[PATCH v2] cgroup: net_cls: fix false-positive suspicious RCU usage

2015-07-22 Thread Konstantin Khlebnikov
] vfs_write+0xb8/0x190 [ 270.730236] [811fe8c2] SyS_write+0x52/0xb0 [ 270.730239] [817b6bae] entry_SYSCALL_64_fastpath+0x12/0x76 Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/core/netclassid_cgroup.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion

Re: [PATCH v2] cgroup: net_cls: fix false-positive suspicious RCU usage

2015-07-22 Thread Konstantin Khlebnikov
On 22.07.2015 14:56, Daniel Borkmann wrote: On 07/22/2015 11:23 AM, Konstantin Khlebnikov wrote: In dev_queue_xmit() net_cls protected with rcu-bh. ... Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/core/netclassid_cgroup.c |3 ++- 1 file changed, 2 insertions

[PATCH] cgroup: net_cls: fix false-positive suspicious RCU usage

2015-07-21 Thread Konstantin Khlebnikov
] vfs_write+0xb8/0x190 [ 270.730236] [811fe8c2] SyS_write+0x52/0xb0 [ 270.730239] [817b6bae] entry_SYSCALL_64_fastpath+0x12/0x76 Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/core/netclassid_cgroup.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion

[PATCH v2] net: ratelimit warnings about dst entry refcount underflow or overflow

2015-07-17 Thread Konstantin Khlebnikov
. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/core/dst.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/core/dst.c b/net/core/dst.c index e956ce6d1378..002144bea935 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -284,7 +284,9 @@ void

[PATCH] net: stop endless flood about dst entry refcount underflow or overflow

2015-07-14 Thread Konstantin Khlebnikov
fixed in upstream. Anyway, a flood of these warnings completely kills the machine and makes further debugging impossible. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/core/dst.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dst.c b/net/core

Re: [PATCH] net: stop endless flood about dst entry refcount underflow or overflow

2015-07-14 Thread Konstantin Khlebnikov
On 14.07.2015 15:04, Eric Dumazet wrote: On Tue, 2015-07-14 at 14:43 +0300, Konstantin Khlebnikov wrote: Kernel generates a lot of warnings when dst entry reference counter overflows and becomes negative. This patch prints address of dst entry, its refcount and then resets reference counter

[PATCH v3 5/5] ipvlan: ignore addresses from ipv6 autoconfiguration

2015-07-14 Thread Konstantin Khlebnikov
/r/20150703125840.24121.91556.stgit@buzz Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan_main.c |4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index e995bc501ee6

[PATCH v3 4/5] ipvlan: use rcu_deference_bh() in ipvlan_queue_xmit()

2015-07-14 Thread Konstantin Khlebnikov
xiyou.wangc...@gmail.com Acked-by: Mahesh Bandewar mahe...@google.com Acked-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan.h |5 + drivers/net/ipvlan/ipvlan_core.c |2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/ipvlan

[PATCH v3 0/5] ipvlan: cleanups and fixes

2015-07-14 Thread Konstantin Khlebnikov
v1: http://comments.gmane.org/gmane.linux.network/363346 v2: http://comments.gmane.org/gmane.linux.network/369086 v3 has reduced set of patches from ipvlan: fix ipv6 autoconfiguration. Here just cleanups and patch which ignores ipv6 notifications from RA. --- Konstantin Khlebnikov (4

[PATCH v3 2/5] ipvlan: plug memory leak in ipvlan_link_delete

2015-07-14 Thread Konstantin Khlebnikov
Add missing kfree_rcu(addr, rcu); Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan_main.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 048ecf0c76fb..7d81e37c3f76 100644

[PATCH v3 1/5] ipvlan: remove counters of ipv4 and ipv6 addresses

2015-07-14 Thread Konstantin Khlebnikov
They are unused after commit f631c44bbe15 (ipvlan: Always set broadcast bit in multicast filter). Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan.h |2 -- drivers/net/ipvlan/ipvlan_main.c | 33 - 2 files

[PATCH v3 3/5] ipvlan: unhash addresses without synchronize_rcu

2015-07-14 Thread Konstantin Khlebnikov
All structures used in traffic forwarding are rcu-protected: ipvl_addr, ipvl_dev and ipvl_port. Thus we can unhash addresses without synchronization. We'll anyway hash it back into the same bucket: in worst case lockless lookup will scan hash once again. Signed-off-by: Konstantin Khlebnikov

Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast

2015-07-13 Thread Konstantin Khlebnikov
On 13.07.2015 10:23, Herbert Xu wrote: On Fri, Jul 10, 2015 at 02:51:41PM +0300, Konstantin Khlebnikov wrote: This fixes race between non-atomic updates of adjacent bit-fields: skb->cloned could be lost because netlink broadcast clones skb after sending it to the first listener who sets skb

[PATCH] netlink: enable skb header refcounting before sending first broadcast

2015-07-10 Thread Konstantin Khlebnikov
it twice. Race leads to double-free in kmalloc-xxx. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru Fixes: b19372273164 (net: reorganize sk_buff for faster __copy_skb_header()) --- net/netlink/af_netlink.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/net/netlink

[PATCH] netlink: reset skb->peeked when reuse orphan skb for next broadcast

2015-07-10 Thread Konstantin Khlebnikov
This patch clears skb->peeked set by previous recipient of broadcast. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru Fixes: add05ad4e9f5 (unix/dgram: peek beyond 0-sized skbs) --- net/netlink/af_netlink.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/netlink

[PATCH v2] netlink: reset skb->peeked when reuse orphan skb for next broadcast

2015-07-10 Thread Konstantin Khlebnikov
This patch clears skb->peeked set by previous recipient of broadcast. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru Fixes: add05ad4e9f5 (unix/dgram: peek beyond 0-sized skbs) --- net/netlink/af_netlink.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/netlink

Re: [PATCH] netlink: reset skb->peeked when reuse orphan skb for next broadcast

2015-07-10 Thread Konstantin Khlebnikov
On 10.07.2015 14:51, Konstantin Khlebnikov wrote: This patch clears skb->peeked set by previous recipient of broadcast. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru Fixes: add05ad4e9f5 (unix/dgram: peek beyond 0-sized skbs) --- net/netlink/af_netlink.c |1 + 1 file

Re: [PATCH v2 4/5] ipvlan: protect addresses with internal spinlock

2015-07-10 Thread Konstantin Khlebnikov
On 08.07.2015 07:05, Mahesh Bandewar wrote: On Fri, Jul 3, 2015 at 5:58 AM, Konstantin Khlebnikov khlebni...@yandex-team.ru wrote: Inet6addr notifier is atomic and runs in bh context without RTNL when ipv6 receives router advertisement packet and performs autoconfiguration. This patch adds

Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast

2015-07-10 Thread Konstantin Khlebnikov
On 10.07.2015 16:49, Eric Dumazet wrote: On Fri, 2015-07-10 at 14:51 +0300, Konstantin Khlebnikov wrote: This fixes race between non-atomic updates of adjacent bit-fields: skb->cloned could be lost because netlink broadcast clones skb after sending it to the first listener who sets skb->peeked

[PATCH v2 1/5] ipvlan: remove counters of ipv4 and ipv6 addresses

2015-07-03 Thread Konstantin Khlebnikov
They are unused after commit f631c44bbe15 (ipvlan: Always set broadcast bit in multicast filter). Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan.h |2 - drivers/net/ipvlan/ipvlan_main.c | 65 +++--- 2 files

[PATCH v2 3/5] ipvlan: unhash addresses without synchronize_rcu

2015-07-03 Thread Konstantin Khlebnikov
All structures used in traffic forwarding are rcu-protected: ipvl_addr, ipvl_dev and ipvl_port. Thus we can unhash addresses without synchronization. We'll anyway hash it back into the same bucket, in worst case lockless lookup will scan hash once again. Signed-off-by: Konstantin Khlebnikov

[PATCH v2 4/5] ipvlan: protect addresses with internal spinlock

2015-07-03 Thread Konstantin Khlebnikov
-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan.h | 11 +++ drivers/net/ipvlan/ipvlan_core.c |2 -- drivers/net/ipvlan/ipvlan_main.c | 33 ++--- 3 files changed, 41 insertions(+), 5 deletions(-) diff --git a/drivers

[PATCH v2 0/5] ipvlan: fix ipv6 autoconfiguration

2015-07-03 Thread Konstantin Khlebnikov
with this: https://patchwork.ozlabs.org/patch/471481/ * new fix for trivial memory leak and patch which removes address counters --- Konstantin Khlebnikov (5): ipvlan: remove counters of ipv4 and ipv6 addresses ipvlan: plug memory leak in ipvlan_link_delete ipvlan: unhash

[PATCH v2 5/5] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses

2015-07-03 Thread Konstantin Khlebnikov
All ipvlan ports use one MAC address, this way ipv6 RA tries to assign one ipv6 address to all of them. This patch assigns unique dev_id to each ipvlan port. This field is used instead of common FF-FE in Modified EUI-64. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
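How a per-port dev_id can yield distinct interface identifiers from one shared MAC address can be sketched as below. This is a userspace illustration with a hypothetical function name, not the kernel's address-configuration code. In particular, flipping the universal/local bit only in the FF-FE case is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build a 64-bit interface identifier from a 48-bit MAC.
 * With dev_id == 0 the usual FF-FE filler is inserted (Modified EUI-64);
 * otherwise the two middle bytes carry dev_id, so ports that share a MAC
 * still get different identifiers.
 */
void make_ifid(uint8_t ifid[8], const uint8_t mac[6], uint16_t dev_id)
{
    memcpy(ifid, mac, 3);           /* first three bytes of the MAC */
    memcpy(ifid + 5, mac + 3, 3);   /* last three bytes of the MAC */

    if (dev_id) {
        ifid[3] = (uint8_t)(dev_id >> 8);
        ifid[4] = (uint8_t)(dev_id & 0xff);
    } else {
        ifid[3] = 0xff;
        ifid[4] = 0xfe;
        ifid[0] ^= 0x02;            /* assumption: flip U/L bit only here */
    }
}
```

Two ports with the same MAC but dev_id 1 and 2 differ in byte 4 of the identifier, so router advertisements no longer lead to the same autoconfigured address on both.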

[PATCH v2 2/5] ipvlan: plug memory leak in ipvlan_link_delete

2015-07-03 Thread Konstantin Khlebnikov
Add missing kfree_rcu(addr, rcu); Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- drivers/net/ipvlan/ipvlan_main.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 62577b3f01f2..4c3a0ac85381 100644

Re: [PATCH v3.17 .. v3.19] lib/rhashtable: fix race between rhashtable_lookup_compare and hashtable resize

2015-06-30 Thread Konstantin Khlebnikov
+CC Sasha Levin FYI: this patch fixes a race in netlink which leads to a hang in glibc function getaddrinfo() because it doesn't handle errors at all. On 26.06.2015 13:48, Konstantin Khlebnikov wrote: Hash value passed as argument into rhashtable_lookup_compare could be computed using different

Re: netlink rhashtable status

2015-06-26 Thread Konstantin Khlebnikov
On 14.05.2015 07:21, Herbert Xu wrote: On Thu, May 14, 2015 at 12:16:28PM +0800, Herbert Xu wrote: On Wed, May 13, 2015 at 09:13:38PM -0700, Eric Dumazet wrote: So it looks like we lost an skb or something OK that sounds reasonable. So my plan is to disable dynamic rehashing and then

[PATCH v3.17 .. v3.19] lib/rhashtable: fix race between rhashtable_lookup_compare and hashtable resize

2015-06-26 Thread Konstantin Khlebnikov
it adds comment for rhashtable_hashfn and rhashtable_obj_hashfn: user must prevent concurrent insert/remove otherwise returned hash value could be invalid. Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru Fixes: e341694e3eb5 (netlink: Convert netlink_lookup() to use RCU protected hash

[PATCH 3.10.y 1/2] ipv6: prevent fib6_run_gc() contention

2015-06-10 Thread Konstantin Khlebnikov
From: Michal Kubeček mkube...@suse.cz commit 2ac3ac8f86f2fe065d746d9a9abaca867adec577 upstream On a high-traffic router with many processors and many IPv6 dst entries, soft lockup in fib6_run_gc() can occur when number of entries reaches gc_thresh. This happens because fib6_run_gc() uses

[PATCH 3.10.y 2/2] ipv6: update ip6_rt_last_gc every time GC is run

2015-06-10 Thread Konstantin Khlebnikov
From: Michal Kubeček mkube...@suse.cz commit 49a18d86f66d33a20144ecb5a34bba0d1856b260 upstream As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should hold the last time garbage collector was run so that we should update it whenever fib6_run_gc() calls fib6_clean_all(), not only if we got

[PATCH 3.10.y 0/2] ipv6: avoid soft lockups in fib6_run_gc()

2015-06-10 Thread Konstantin Khlebnikov
Two patches from 3.11 which are missing in 3.10.y I've just seen livelock in 3.10.69+ where all cpus are stuck in fib6_run_gc() 4[2919865.977745] Call Trace: 4[2919865.977748] IRQ 4[2919865.977754] [8163b87e] _raw_spin_lock_bh+0x1e/0x30 4[2919865.977759] [815e4018]

Re: [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses

2015-05-21 Thread Konstantin Khlebnikov
On 20.05.2015 02:59, Mahesh Bandewar wrote: On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov khlebni...@yandex-team.ru wrote: All ipvlan ports use one MAC address, this way ipv6 RA tries to assign one ipv6 address to all of them. This patch assigns unique dev_id to each ipvlan port

Re: [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path

2015-05-21 Thread Konstantin Khlebnikov
On 20.05.2015 02:33, Mahesh Bandewar wrote: On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov khlebni...@yandex-team.ru wrote: ipvlan_start_xmit() is called with rcu_read_lock_bh() while its internal structures require normal rcu_read_lock(). Signed-off-by: Konstantin Khlebnikov khlebni

[PATCH RFC] openvswitch: add support for netpoll

2015-04-23 Thread Konstantin Khlebnikov
-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/openvswitch/vport-internal_dev.c | 74 ++ net/openvswitch/vport-netdev.c | 63 - net/openvswitch/vport-netdev.h | 15 +++ 3 files changed, 148 insertions(+), 4