[PATCH net] Revert "blackhole_netdev: fix syzkaller reported issue"

2019-10-16 Thread Mahesh Bandewar
evert this now and I'll send a better fix after analysing / fixing the weirdness observed. CC: Eric Dumazet CC: Wei Wang CC: David S. Miller Signed-off-by: Mahesh Bandewar --- net/ipv6/addrconf.c | 7 +-- net/ipv6/route.c| 15 +-- 2 files changed, 10 insertions(+), 12

[PATCHv3 next] blackhole_netdev: fix syzkaller reported issue

2019-10-11 Thread Mahesh Bandewar
DR6: fffe0ff0 DR7: 0400 Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Signed-off-by: Mahesh Bandewar CC: Eric Dumazet CC: Wei Wang --- v1->v2: fixed missing update in ip6_dst_ifdown() v2->v3: added idev cle

[PATCHv2 next] blackhole_netdev: fix syzkaller reported issue

2019-10-10 Thread Mahesh Bandewar
DR6: fffe0ff0 DR7: 0400 Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Signed-off-by: Mahesh Bandewar CC: Eric Dumazet --- v1->v2: fixed missing update in ip6_dst_ifdown() net/ipv6/addrconf.c | 6 +- net/ip

[PATCH next] ipvlan: consolidate TSO flags using NETIF_F_ALL_TSO

2019-10-09 Thread Mahesh Bandewar
This will ensure that any new TSO related flags added (which would be part of ALL_TSO mask) are covered, and the IPvlan driver doesn't need to be updated every time a new flag gets added. Signed-off-by: Mahesh Bandewar Suggested-by: Eric Dumazet --- drivers/net/ipvlan/ipvlan_main.c | 4 ++-- 1 file chang
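The idea of referring to one aggregate mask instead of listing individual flags can be sketched in plain C. The FEAT_* names and bit positions below are hypothetical stand-ins, not the kernel's NETIF_F_* definitions, and drv_fix_features() is an illustrative helper rather than the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the kernel's NETIF_F_* flags. */
#define FEAT_SG       (1u << 0)
#define FEAT_TSO      (1u << 1)
#define FEAT_TSO6     (1u << 2)
#define FEAT_TSO_ECN  (1u << 3)

/* One mask names every TSO flavour; a new flavour is added here only. */
#define FEAT_ALL_TSO  (FEAT_TSO | FEAT_TSO6 | FEAT_TSO_ECN)

/* The driver refers to the mask, never to the individual TSO bits. */
#define DRV_FEATURES  (FEAT_SG | FEAT_ALL_TSO)

/* Compile-time check that the driver advertises every TSO bit in the mask. */
static_assert((DRV_FEATURES & FEAT_ALL_TSO) == FEAT_ALL_TSO,
              "driver must cover the whole TSO mask");

/* Returns the subset of 'wanted' that the driver advertises. */
uint32_t drv_fix_features(uint32_t wanted)
{
    return wanted & DRV_FEATURES;
}
```

With this layout, extending FEAT_ALL_TSO changes what drv_fix_features() lets through without touching the driver-side definition.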

[PATCH next] blackhole_netdev: fix syzkaller reported issue

2019-10-09 Thread Mahesh Bandewar
DR6: fffe0ff0 DR7: 0400 Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Signed-off-by: Mahesh Bandewar --- net/ipv6/addrconf.c | 6 +- net/ipv6/route.c| 5 ++--- 2 files changed, 7 insertions(+), 4 deletions(-) di

[PATCH next] loopback: fix lockdep splat

2019-07-02 Thread Mahesh Bandewar
0x260/0x260 [3.855074] kernel_init+0xf/0x180 [3.855076] ? rest_init+0x260/0x260 [3.855078] ret_from_fork+0x24/0x30 Fixes: 4de83b88c66 ("loopback: create blackhole net device similar to loopack.") Reported-by: Geert Uytterhoeven Cc: Eric Dumazet Signed-off-by: Mahesh Bandewa

[PATCHv3 next 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst entries

2019-07-01 Thread Mahesh Bandewar
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar --- v1->v2->v3 no change net/core/dst.c | 2 +- net/ipv4/route.c | 3 +-- net/ipv6/route.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions

[PATCHv3 next 3/3] blackhole_dev: add a selftest

2019-07-01 Thread Mahesh Bandewar
Since this is not really a device with all capabilities, this test ensures that it has *enough* to make it through the data path without causing unwanted side-effects (read crash!). Signed-off-by: Mahesh Bandewar --- v1 -> v2 fixed the conflict resolution in selftests Makefile v2 -> v3

[PATCHv3 next 1/3] loopback: create blackhole net device similar to loopack.

2019-07-01 Thread Mahesh Bandewar
since it's not registered it won't have ifindex. Lower MTU effectively makes the device not pass the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar --- v1->v2->v3 no change drivers/net/loopback.c| 76 +

[PATCHv3 next 0/3] blackhole device to invalidate dst

2019-07-01 Thread Mahesh Bandewar
is device instead of 'lo' when marking the dst dead. First patch implements the blackhole device and second patch uses it in IPv4 and IPv6 stack while the third patch is the self test that ensures the sanity of this device. v1->v2 fixed the self-test patch to handle the conflict

[PATCHv2 next 3/3] blackhole_dev: add a selftest

2019-06-27 Thread Mahesh Bandewar
Since this is not really a device with all capabilities, this test ensures that it has *enough* to make it through the data path without causing unwanted side-effects (read crash!). Signed-off-by: Mahesh Bandewar --- v1 -> v2 fixed the conflict resolution in selftests Makefile

[PATCHv2 next 0/3] blackhole device to invalidate dst

2019-06-27 Thread Mahesh Bandewar
is device instead of 'lo' when marking the dst dead. First patch implements the blackhole device and second patch uses it in IPv4 and IPv6 stack while the third patch is the self test that ensures the sanity of this device. v1->v2 fixed the self-test patch to handle the conf

[PATCHv2 next 1/3] loopback: create blackhole net device similar to loopack.

2019-06-27 Thread Mahesh Bandewar
since it's not registered it won't have ifindex. Lower MTU effectively makes the device not pass the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar --- v1->v2 no change drivers/net/loopback.c| 76 ++

[PATCHv2 next 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst entries

2019-06-27 Thread Mahesh Bandewar
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar --- v1 -> v2 no change net/core/dst.c | 2 +- net/ipv4/route.c | 3 +-- net/ipv6/route.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff

[PATCH next 2/3] blackhole_netdev: use blackhole_netdev to invalidate dst entries

2019-06-21 Thread Mahesh Bandewar
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar --- net/core/dst.c | 2 +- net/ipv4/route.c | 3 +-- net/ipv6/route.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/net/core/dst.c b/

[PATCH next 3/3] blackhole_dev: add a selftest

2019-06-21 Thread Mahesh Bandewar
Since this is not really a device with all capabilities, this test ensures that it has *enough* to make it through the data path without causing unwanted side-effects (read crash!). Signed-off-by: Mahesh Bandewar --- lib/Kconfig.debug | 9 ++ lib/Makefile

[PATCH next 1/3] loopback: create blackhole net device similar to loopack.

2019-06-21 Thread Mahesh Bandewar
since it's not registered it won't have ifindex. Lower MTU effectively makes the device not pass the MTU check during the route check when a dst associated with the skb is dead. Signed-off-by: Mahesh Bandewar --- drivers/net/loopback.c| 76 ++- inclu

[PATCH next 0/3] blackhole device to invalidate dst

2019-06-21 Thread Mahesh Bandewar
is device instead of 'lo' when marking the dst dead. First patch implements the blackhole device and second patch uses it in IPv4 and IPv6 stack while the third patch is the self test that ensures the sanity of this device. Mahesh Bandewar (3): loopback: create blackhole net device si

[PATCH iproute2] ip6tunnel: fix 'ip -6 {show|change} dev ' cmds

2019-06-06 Thread Mahesh Bandewar
ow ip6tnl1 Signed-off-by: Mahesh Bandewar --- ip/ip6tunnel.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c index 999408ed801b..56fd3466ed06 100644 --- a/ip/ip6tunnel.c +++ b/ip/ip6tunnel.c @@ -298,6 +298,8 @@ static int parse_args(int argc, char **argv, int cm

[PATCH net] bonding: fix warning message

2018-10-02 Thread Mahesh Bandewar
From: Mahesh Bandewar RX queue config for bonding master could be different from its slave device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also."), the packet is reinjected into stack with skb->dev as bonding master. This potentially

[PATCH net] bonding: avoid possible dead-lock

2018-09-24 Thread Mahesh Bandewar
From: Mahesh Bandewar Syzkaller reported this on a slightly older kernel but it's still applicable to the current kernel - == WARNING: possible circular locking dependency detected 4.18.0-next-20180823+ #46 Not ta

[PATCH net] bonding: pass link-local packets to bonding master also.

2018-09-24 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCH iproute2] iproute2: fix use-after-free

2018-09-12 Thread Mahesh Bandewar
From: Mahesh Bandewar A local program using iproute2 lib pointed out the issue and looking at the code it is pretty obvious - a = (struct nlmsghdr *)b; ... free(b); if (a->nlmsg_seq == seq) ... Fixes: 86bf43c7c2fd ("lib/libnetlink: update rtnl_talk to support mal
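The pattern in the snippet above (reading a field through a pointer after the buffer it aliases has been freed) is usually avoided by copying the field before the free. The following is a minimal sketch of that ordering, not the actual libnetlink change; struct msg_hdr and both functions are hypothetical stand-ins for struct nlmsghdr and the surrounding code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nlmsghdr; only the field used here. */
struct msg_hdr {
    uint32_t seq;
};

/* Allocates a message carrying 'seq'; returns NULL on allocation failure. */
struct msg_hdr *msg_new(uint32_t seq)
{
    struct msg_hdr *h = malloc(sizeof(*h));

    if (h)
        h->seq = seq;
    return h;
}

/* Takes ownership of 'buf' and frees it.
 * Returns 1 if the message carried 'seq', 0 otherwise. */
int msg_consume_matches(void *buf, uint32_t seq)
{
    struct msg_hdr *h = buf;
    uint32_t got;

    assert(buf != NULL);
    got = h->seq;   /* copy the field while the buffer is still valid */
    free(buf);      /* 'h' dangles from here on and must not be read */
    return got == seq;
}
```

The comparison uses only the local copy, so nothing touches the freed memory.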

[PATCHv3 iproute2 0/2] clang + misc changes

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar The primary theme is to make clang compile the iproute2 package without warnings. Along with this there are two other misc patches in the series. First patch uses the preferred_family when operating with maddr feature. Prior to this patch, it would always open an AF_INET

[PATCHv3 iproute2 1/2] ipmaddr: use preferred_family when given

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar When creating socket() AF_INET is used irrespective of the family that is given at the command-line (with -4, -6, or -0). This change will open the socket with the preferred family. Signed-off-by: Mahesh Bandewar --- ip/ipmaddr.c | 13 - 1 file changed, 12
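Choosing the socket family from the command-line preference could look roughly like the sketch below. The function names are hypothetical, and the fallback to AF_INET when no supported family was requested is an assumption; the patch itself may handle more families or a different default.

```c
#include <assert.h>
#include <sys/socket.h>

/* Returns the family to open the socket with. 'preferred' is whatever
 * the command line selected; AF_UNSPEC means no choice was made.
 * Falling back to AF_INET is an assumption of this sketch. */
int pick_socket_family(int preferred)
{
    switch (preferred) {
    case AF_INET:
    case AF_INET6:
        return preferred;
    default:
        return AF_INET;
    }
}

/* Opens a datagram socket in the chosen family.
 * Returns the descriptor, or -1 with errno set by socket(). */
int open_maddr_socket(int preferred)
{
    int family = pick_socket_family(preferred);

    assert(family == AF_INET || family == AF_INET6);
    return socket(family, SOCK_DGRAM, 0);
}
```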

[PATCHv3 iproute2 2/2] iproute: make clang happy

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar These are primarily fixes for "string is not string literal" warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op change. I had to replace couple of print helper functions with the code they call as it was becoming harder to eliminate thes
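A common way to silence this class of warning without changing output is to pass the variable text as an argument to a literal "%s" format. The sketch below illustrates that general technique; format_label() is a hypothetical helper and is not taken from the iproute2 patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies 'label' into 'buf' verbatim. The label is passed as data to a
 * literal "%s" format, so any '%' inside it is never interpreted.
 * Returns what snprintf() returns: the length of the full label. */
int format_label(char *buf, size_t len, const char *label)
{
    assert(buf != NULL && label != NULL);
    /* snprintf(buf, len, label) would use a non-literal format string. */
    return snprintf(buf, len, "%s", label);
}
```

Because the format is a string literal, the compiler can check it, and a label such as "rate 100%d" is copied unchanged instead of consuming a missing argument.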

[PATCHv2 iproute2 3/3] iproute: make clang happy with iproute2 package

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar These are primarily fixes for "string is not string literal" warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op change. I had to replace couple of print helper functions with the code they call as it was becoming harder to eliminate thes

[PATCHv2 iproute2 0/3] clang + misc changes

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar The primary theme is to make clang compile the iproute2 package without warnings. Along with this there are two other misc patches in the series. First patch uses the preferred_family when operating with maddr feature. Prior to this patch, it would always open an AF_INET

[PATCHv2 iproute2 2/3] tc: remove extern from prototype declarations

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar Signed-off-by: Mahesh Bandewar --- tc/m_ematch.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tc/m_ematch.h b/tc/m_ematch.h index f634f19164fa..80b02cfad6cc 100644 --- a/tc/m_ematch.h +++ b/tc/m_ematch.h @@ -20,7 +20,7 @@ struct bstr

[PATCHv2 iproute2 1/3] ipmaddr: use preferred_family when given

2018-08-21 Thread Mahesh Bandewar
From: Mahesh Bandewar When creating socket() AF_INET is used irrespective of the family that is given at the command-line (with -4, -6, or -0). This change will open the socket with the preferred family. Signed-off-by: Mahesh Bandewar --- ip/ipmaddr.c | 13 - 1 file changed, 12

[PATCH iproute2] iproute: make clang happy

2018-08-20 Thread Mahesh Bandewar
From: Mahesh Bandewar These are primarily fixes for "string is not string literal" warnings / errors (with -Werror -Wformat-nonliteral). This should be a no-op change. I had to replace couple of print helper functions with the code they call as it was becoming harder to eliminate thes

[PATCH iproute2] ipmaddr: use preferred_family when given

2018-08-15 Thread Mahesh Bandewar
From: Mahesh Bandewar When creating socket() AF_INET is used irrespective of the family that is given at the command-line (with -4, -6, or -0). This change will open the socket with the preferred family. Signed-off-by: Mahesh Bandewar --- ip/ipmaddr.c | 13 - 1 file changed, 12

[PATCH net v2] bonding: pass link-local packets to bonding master also.

2018-07-18 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCH next v2] bonding: pass link-local packets to bonding master also.

2018-07-18 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCH next] bonding: pass link-local packets to bonding master also.

2018-07-15 Thread Mahesh Bandewar
From: Mahesh Bandewar Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets ar

[PATCHv4 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. Capability mask is stored in kernel as kernel_cap_t type (array of u32). This sysctl takes input as comma separated hex u32 words. For simplicity one could see this sysctl to operate on string inputs. However
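Parsing a string of comma separated hex u32 words can be sketched in userspace C as follows. This is an illustration of the input format only, not the kernel's sysctl handler; parse_hex_words() is a hypothetical name, and the order in which the words map onto kernel_cap_t is not specified here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses "w0,w1,..." where each word is a hex u32, into out[0..max-1].
 * Returns the number of words parsed, or -1 on malformed input or if
 * more than 'max' words are present. As a sketch it inherits strtoul()'s
 * leniency (leading whitespace and an optional "0x" prefix are accepted). */
int parse_hex_words(const char *s, uint32_t *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (n < max) {
        char *end;
        unsigned long v;

        errno = 0;
        v = strtoul(s, &end, 16);
        if (end == s || errno != 0 || v > 0xffffffffUL)
            return -1;          /* no digits, overflow, or too wide */
        out[n++] = (uint32_t)v;
        if (*end == '\0')
            return (int)n;      /* consumed the whole string */
        if (*end != ',')
            return -1;          /* unexpected separator */
        s = end + 1;
    }
    return -1;                  /* more words than the caller allows */
}
```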

[PATCHv4 0/2] capability controlled user-namespaces

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCHv4 2/2] userns: control capabilities of some user namespaces

2018-01-02 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCH next 0/2] ipvlan: packet scrub

2017-12-13 Thread Mahesh Bandewar
From: Mahesh Bandewar While crossing namespace boundary IPvlan aggressively scrubs packets. This is creating problems. First thing is that scrubbing changes the packet type in skb meta-data to PACKET_HOST. This causes erroneous packet delivery when dev_forward_skb() has already marked the

[PATCH next 1/2] Revert "ipvlan: add L2 check for packets arriving via virtual devices"

2017-12-13 Thread Mahesh Bandewar
From: Mahesh Bandewar This reverts commit 92ff42645028fa6f9b8aa767718457b9264316b4. Even though the check added is not that taxing, it's not really needed. First of all this will be per packet cost and second thing is that the eth_type_trans() already does this correctly. The exce

[PATCH next 2/2] ipvlan: remove excessive packet scrubbing

2017-12-13 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan currently scrubs packets at every location where packets may be crossing namespace boundary. Though this is desirable, currently IPvlan does it more than necessary. e.g. packets that are going to take dev_forward_skb() path will get scrubbed so no point in scrubbing

[PATCH next] ipvlan: add L2 check for packets arriving via virtual devices

2017-12-07 Thread Mahesh Bandewar
From: Mahesh Bandewar Packets that don't have dest mac as the mac of the master device should not be entertained by the IPvlan rx-handler. This is mostly true as the packet path mostly takes care of that, except when the master device is a virtual device. As demonstrated in the following

[PATCHv3 0/2] capability controlled user-namespaces

2017-12-05 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCHv3 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-12-05 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCHv3 2/2] userns: control capabilities of some user namespaces

2017-12-05 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCHv2 0/2] capability controlled user-namespaces

2017-11-09 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCHv2 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-09 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCHv2 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCH resend 0/2] capability controlled user-namespaces

2017-11-02 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-02 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-02 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCH] ip/ipvlan: enhance ability to add mode flags to existing modes

2017-10-30 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan supported bridge-only functionality prior to commits a190d04db937 ('ipvlan: introduce 'private' attribute for all existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode'). These two commits allow to configure the VEPA and priv

[PATCH next 0/2] add 'private' and 'vepa' attributes to ipvlan modes

2017-10-26 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan has always been operating in bridge-mode for its supported modes i.e. if the packets are destined to the adjacent neighbor dev, then IPvlan driver will switch the packet internally without needing the packets to hit the wire or get routed. However, there are

[PATCH next 2/2] ipvlan: implement VEPA mode

2017-10-26 Thread Mahesh Bandewar
From: Mahesh Bandewar This is very similar to the Macvlan VEPA mode, however, there is some difference. IPvlan uses the mac-address of the lower device, so the VEPA mode has implications of ICMP-redirects for packets destined for its immediate neighbors sharing same master since the packets will

[PATCH next 1/2] ipvlan: introduce 'private' attribute for all existing modes.

2017-10-26 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan has always operated in bridge mode. However, there are scenarios where each slave should be able to talk through the master device but not necessarily across each other. Think of an environment where each namespace is a private and independent customer. In this

[PATCH next] ipvlan: always use the current L2 addr of the master

2017-10-11 Thread Mahesh Bandewar
From: Mahesh Bandewar If the underlying master ever changes its L2 (e.g. bonding device), then make sure that the IPvlan slaves always emit packets with the current L2 of the master instead of the stale mac addr which was copied during the device creation. The problem can be seen with following

[PATCH 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-09-29 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCH 0/2] capability controlled user-namespaces

2017-09-29 Thread Mahesh Bandewar
From: Mahesh Bandewar [Same as the previous RFC series sent on 9/21] TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few

[PATCH 2/2] userns: control capabilities of some user namespaces

2017-09-29 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[PATCH next] bonding: speed/duplex update at NETDEV_UP event

2017-09-27 Thread Mahesh Bandewar
From: Mahesh Bandewar Some NIC drivers don't have correct speed/duplex settings at the time they send NETDEV_UP notification and that messes up the bonding state. Especially 802.3ad mode which is very sensitive to these settings. In the current implementation we invoke bond_update_speed_d

[RFC PATCH 0/2] capability controlled user-namespaces

2017-09-21 Thread Mahesh Bandewar
From: Mahesh Bandewar TL;DR version - Creating a sandbox environment with namespaces is challenging considering what these sandboxed processes can engage into. e.g. CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. Current form of user-namespaces, however, if changed

[RFC PATCH 2/2] userns: control capabilities of some user namespaces

2017-09-21 Thread Mahesh Bandewar
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and

[RFC PATCH 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-09-21 Thread Mahesh Bandewar
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes input as capability mask expressed as two comma separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be

[PATCH next] neigh: initialize neigh entry correctly during arp processing

2017-08-16 Thread Mahesh Bandewar
From: Mahesh Bandewar If the ARP processing creates a neigh entry, it's immediately marked as STALE without a timer and stays in that state as long as the host does not send traffic to that neighbour. I observed this on hosts which are in an IPv6 environment, where there is very little to no

[PATCH 1/3] ipv4: initialize fib_trie prior to register_netdev_notifier call.

2017-07-19 Thread Mahesh Bandewar
From: Mahesh Bandewar Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier() call. In fact fib_trie initialization needs to happen before first rtnl_register(). It does not cause any problem since there are no devices UP at this moment, but trying to

[PATCH 1/2] ipv4: initialize fib_trie prior to register_netdev_notifier call.

2017-07-04 Thread Mahesh Bandewar
From: Mahesh Bandewar Net stack initialization currently initializes fib-trie after the first call to netdevice_notifier() call. It does not cause any problem since there are no devices UP at this moment, but trying to bring 'lo' UP at initialization would make this assumption wron

[PATCH 2/2] loopback: bringup 'lo' by default at initialization

2017-07-04 Thread Mahesh Bandewar
From: Mahesh Bandewar Loopback devices are always brought up right after their initialization, including the case of network namespace creation, e.g. ip netns add foo; ip -netns foo link set lo up. This patch will eliminate the need to do that separately and would bring it up as part of the

[PATCH 0/2] bring UP loopback device at initialziation

2017-07-04 Thread Mahesh Bandewar
From: Mahesh Bandewar In almost every scenario the loopback device is brought UP after initialization. So there is no point in creating the device in the DOWN state and following that with a device UP operation. This change exposed another issue with fib-trie initialization, which is corrected in the first patch

[PATCH net] ipv6: avoid dad-failures for addresses with NODAD

2017-05-12 Thread Mahesh Bandewar
From: Mahesh Bandewar Every address gets added with TENTATIVE flag even for the addresses with IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process we realize it's an address with NODAD and complete the process without sending any probe. However the TENTATIVE flags

[PATCH] kmod: don't load module unless req process has CAP_SYS_MODULE

2017-05-12 Thread Mahesh Bandewar
From: Mahesh Bandewar A process inside random user-ns should not load a module, which is currently possible. As demonstrated in following scenario - Create namespaces; especially a user-ns and become root inside. $ unshare -rfUp -- unshare -unm -- bash Try to load the bridge module. It

[PATCHv2 next] bonding: fix wq initialization for links created via netlink

2017-04-20 Thread Mahesh Bandewar
From: Mahesh Bandewar Earlier patch 4493b81bea ("bonding: initialize work-queues during creation of bond") moved the work-queue initialization from bond_open() to bond_create(). However, this caused the links that are created using netlink 'create bond option' (ip link

[PATCH next] bonding: fix wq initialization for links created via netlink

2017-04-19 Thread Mahesh Bandewar
From: Mahesh Bandewar Earlier patch 4493b81bea ("bonding: initialize work-queues during creation of bond") moved the work-queue initialization from bond_open() to bond_create(). However, this caused the links that are created using netlink 'create bond option' (ip link

[PATCH next] bonding: handle link transition from FAIL to UP correctly

2017-04-11 Thread Mahesh Bandewar
From: Mahesh Bandewar When link transitions from LINK_FAIL to LINK_UP, the commit phase is not called. This leads to an erroneous state causing slave-link state to get stuck in "going down" state while its speed and duplex are perfectly fine. This issue is a side-effect of splittin

[PATCH next] bonding: fix active-backup transition

2017-04-03 Thread Mahesh Bandewar
From: Mahesh Bandewar Earlier patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") made an attempt to keep slave state consistent with speed and duplex settings. Unfortunately link-state transition is used to change the active link especially wh

[PATCH next 4/5] bonding: correctly update link status during mii-commit phase

2017-03-27 Thread Mahesh Bandewar
From: Mahesh Bandewar bond_miimon_commit() marks the link UP after attempting to get the speed and duplex settings for the link. There is a possibility that bond_update_speed_duplex() could fail. This is another place where it could result into an inconsistent bonding link state. With this

[PATCH next 5/5] bonding: avoid printing while holding a spinlock

2017-03-27 Thread Mahesh Bandewar
From: Mahesh Bandewar Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_3ad.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 508713b4e533..c5fd4259da33 100644 --- a/drivers/net/bonding

[PATCH next 2/5] bonding: improve link-status update in mii-monitoring

2017-03-27 Thread Mahesh Bandewar
From: Mahesh Bandewar The primary issue is that mii-inspect phase updates link-state and expects changes to be committed during the mii-commit phase. After the inspect phase if it fails to acquire rtnl-mutex, the commit phase (bond_mii_commit) doesn't get to run. This partially updated

[PATCH next 1/5] bonding: split bond_set_slave_link_state into two parts

2017-03-27 Thread Mahesh Bandewar
From: Mahesh Bandewar Split the function into two (a) propose (b) commit phase without changing the semantics for the original API. Signed-off-by: Mahesh Bandewar --- include/net/bonding.h | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/include/net

[PATCH next 0/5] link-status fixes for mii-monitoring

2017-03-27 Thread Mahesh Bandewar
From: Mahesh Bandewar The mii monitoring is divided into two phases - inspect and commit. The inspect phase technically should not make any changes to the state and defer it to the commit phase. However detected link state inconsistencies on several machines and discovered that it's the r

[PATCH next 3/5] bonding: make speed, duplex setting consistent with link state

2017-03-27 Thread Mahesh Bandewar
From: Mahesh Bandewar bond_update_speed_duplex() retrieves speed and duplex settings. There is a possibility of failure in retrieving these values but caller has to assume it's always successful. This leads to having inconsistent slave link settings. If these (speed, duplex) values cann

[PATCH next 0/5] bonding: winter cleanup

2017-03-08 Thread Mahesh Bandewar
From: Mahesh Bandewar A few cleanup patches that I have accumulated over some time now. (a) The first two patches move the work-queue initialization from every ndo_open / bond_open operation to a single initialization at port creation. Work-queue initialization is an

[PATCH next 2/5] bonding: initialize work-queues during creation of bond

2017-03-08 Thread Mahesh Bandewar
From: Mahesh Bandewar Initializing work-queues every time an ifup operation is performed is unnecessary; it can be done once when the port is created. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a

[PATCH next 1/5] bonding: restructure arp-monitor

2017-03-08 Thread Mahesh Bandewar
From: Mahesh Bandewar In preparation to move the work-queue initialization to port creation from current port_open phase. Work-queue initialization does not make sense every time we do 'ifup/ifdown'. So moving to port creation phase. Arp monitoring work depends on the bonding mode a

[PATCH next 4/5] bonding: remove "port-moved" state that was never implemented

2017-03-08 Thread Mahesh Bandewar
From: Mahesh Bandewar LACP state-machine defines "port-moved" state when the same ActorSystemID and Port are seen in a LACPDU received on different port. The state is never set since it's not implemented. However the state-machine attempts to clear that state occasionally. LACP s

[PATCH next 5/5] bonding: reduce scope of some global variables

2017-03-08 Thread Mahesh Bandewar
From: Mahesh Bandewar Many of the bond param variables are declared global although they do not need to be. Move them to where they are used. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_main.c | 11 +-- 1 file chang

[PATCH next 3/5] bonding: remove hardcoded value

2017-03-08 Thread Mahesh Bandewar
From: Mahesh Bandewar Eliminate hard-coded value and use the default that is set. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_main.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding

[PATCH next 3/4] bonding: remove hardcoded value

2017-02-21 Thread Mahesh Bandewar
From: Mahesh Bandewar Eliminate hard-coded value and use the default that is set. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_main.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding

[PATCH next 1/4] bonding: restructure arp-monitor

2017-02-21 Thread Mahesh Bandewar
From: Mahesh Bandewar This is in preparation for moving the work-queue initialization from the current port_open phase to port creation. Initializing work-queues on every 'ifup/ifdown' does not make sense, so it moves to the port creation phase. ARP monitoring work depends on the bonding mode a

[PATCH next 4/4] bonding: remove "port-moved" state that was never implemented

2017-02-21 Thread Mahesh Bandewar
From: Mahesh Bandewar The LACP state machine defines a "port-moved" state for when the same ActorSystemID and Port are seen in a LACPDU received on a different port. The state is never set since it's not implemented. However, the state machine attempts to clear that state occasionally. LACP s

[PATCH next 2/4] bonding: initialize work-queues during creation of bond

2017-02-21 Thread Mahesh Bandewar
From: Mahesh Bandewar Initializing work-queues every time an ifup operation is performed is unnecessary; it can be done once, when the port is created. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a

[PATCH next 0/4] bonding: winter cleanup

2017-02-21 Thread Mahesh Bandewar
From: Mahesh Bandewar A few cleanup patches that I have accumulated over some time. (a) The first two patches move the work-queue initialization from every ndo_open / bond_open operation to a single initialization at port creation. Work-queue initialization is an

[PATCH next 0/3] use netdev_is_rx_handler_busy() in few known cases

2017-01-18 Thread Mahesh Bandewar
From: Mahesh Bandewar netdev_rx_handler_register() was recently split into two parts: (a) check if the handler is used, (b) register the new handler. This is helpful in scenarios like bonding where at the time of registration there is too much state to unwind and it should check if the

[PATCH next 3/3] macvlan: use netdev_is_rx_handler_busy instead of checking specific type

2017-01-18 Thread Mahesh Bandewar
From: Mahesh Bandewar netdev_is_rx_handler_busy() check is a superset of netif_is_ipvlan_port() check and hence should be preferred. Signed-off-by: Mahesh Bandewar --- drivers/net/macvlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/macvlan.c b/drivers/net

[PATCH next 1/3] net: remove duplicate code.

2017-01-18 Thread Mahesh Bandewar
From: Mahesh Bandewar netdev_rx_handler_register() checks whether the handler is already busy, a check that was recently separated into netdev_is_rx_handler_busy(). So use the same function inside register() to avoid code duplication. Essentially, this change should be a no-op. Signed-off-by: Mahesh
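
The check/register split described in this series can be sketched in user space as follows. This is only a toy model under stated assumptions: `struct toy_netdev`, `toy_rx_handler_busy()` and `toy_rx_handler_register()` are hypothetical names, not kernel APIs, and the `-EBUSY` return value is an assumption about what a busy device would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device; only the fields the sketch needs. */
struct toy_netdev {
    void (*rx_handler)(void *pkt);
    void *rx_handler_data;
};

/* Part (a): report whether a receive handler is already attached. */
bool toy_rx_handler_busy(const struct toy_netdev *dev)
{
    assert(dev != NULL);
    return dev->rx_handler != NULL;
}

/* Part (b): register a handler, reusing the busy check instead of
 * duplicating it. Returns 0 on success, -EBUSY if one is attached. */
int toy_rx_handler_register(struct toy_netdev *dev,
                            void (*handler)(void *pkt), void *data)
{
    if (toy_rx_handler_busy(dev))
        return -EBUSY;
    dev->rx_handler_data = data;
    dev->rx_handler = handler;
    return 0;
}

/* A do-nothing handler used only to exercise the sketch. */
void toy_handler(void *pkt)
{
    (void)pkt;
}
```

A caller with a lot of state to set up can call the busy check first and bail out early, before there is anything to unwind; the register function still performs the same check on its own.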

[PATCH next 2/3] ipvlan: use netdev_is_rx_handler_busy instead of checking specific type

2017-01-18 Thread Mahesh Bandewar
From: Mahesh Bandewar IPvlan checks whether the master device is already in use by checking for a specific device type (here, a macvlan device). This is technically not sufficient; it should simply check whether the rx_handler is busy. This would be a superset check that includes macvlan and any other tha

[PATCH next] ipvlan: fix dev_id creation corner case.

2017-01-13 Thread Mahesh Bandewar
From: Mahesh Bandewar In the last patch da36e13cf65 ("ipvlan: improvise dev_id generation logic in IPvlan") I missed part of Dave's suggestion, and because of that the dev_id creation could fail in a corner case. This would happen when more or less 64k devices ha

[PATCH next v2] ipvlan: improvise dev_id generation logic in IPvlan

2017-01-09 Thread Mahesh Bandewar
From: Mahesh Bandewar The patch 009146d117b ("ipvlan: assign unique dev-id for each slave device.") used ida_simple_get() to generate dev_ids assigned to the slave devices. However (Eric has pointed out that) there is a shortcoming with that approach as it always uses the first av
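
The shortcoming mentioned above, that a first-fit allocator hands a just-released id straight back out, can be illustrated with a toy allocator. Everything here is hypothetical: `struct toy_ida`, the tiny id range, and in particular the cyclic variant, which is one assumed way to avoid immediate reuse and is not taken from the patch text.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ID_MIN 1
#define TOY_ID_MAX 8 /* exclusive upper bound; tiny range for illustration */

struct toy_ida {
    bool used[TOY_ID_MAX];
    int next_start; /* where the cyclic variant resumes its search */
};

/* First-fit: return the lowest free id in [start, end), or -1 if none. */
int toy_get_range(struct toy_ida *ida, int start, int end)
{
    assert(ida && start >= 0 && end <= TOY_ID_MAX);
    for (int id = start; id < end; id++) {
        if (!ida->used[id]) {
            ida->used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Cyclic (assumed alternative): search upward from the last allocation,
 * then wrap around to the start of the range. */
int toy_get_cyclic(struct toy_ida *ida)
{
    if (ida->next_start < TOY_ID_MIN)
        ida->next_start = TOY_ID_MIN;
    int id = toy_get_range(ida, ida->next_start, TOY_ID_MAX);
    if (id < 0)
        id = toy_get_range(ida, TOY_ID_MIN, ida->next_start);
    if (id >= 0)
        ida->next_start = (id + 1 < TOY_ID_MAX) ? id + 1 : TOY_ID_MIN;
    return id;
}

/* Release an id so it can be handed out again. */
void toy_put(struct toy_ida *ida, int id)
{
    assert(ida && id >= 0 && id < TOY_ID_MAX);
    ida->used[id] = false;
}
```

With first-fit, allocating ids 1 and 2, releasing 1, and allocating again returns 1 immediately. With the cyclic variant the same sequence returns 3, and 1 is only reused after the search wraps.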

[PATCH next v1] ipvlan: don't use IDR for generating dev_id

2017-01-06 Thread Mahesh Bandewar
From: Mahesh Bandewar The patch 009146d117b ("ipvlan: assign unique dev-id for each slave device.") used ida_simple_get() to generate dev_ids assigned to the slave devices. However (Eric has pointed out that) there is a shortcoming with that approach as it always uses the first av

[RFC PATCH next] ipv6: do not send RTM_DELADDR for tentative addresses

2017-01-04 Thread Mahesh Bandewar
From: Mahesh Bandewar An RTM_NEWADDR notification is sent when IFA_F_TENTATIVE is cleared from the address. So if the address is added and deleted before the DAD probes complete, an RTM_DELADDR will be sent for which there was no RTM_NEWADDR, causing asymmetry in notifications. However if the same

[PATCH next v1] ipvlan: assign unique dev-id for each slave device.

2017-01-03 Thread Mahesh Bandewar
From: Mahesh Bandewar An IPvlan setup uses one mac-address (that of the master). The IPv6 link-local addresses are derived using the mac-address on the link. Lack of dev-ids makes these link-local addresses the same for all slaves, including the master device. dev-ids are necessary to add differentiation
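
To see why a shared mac-address yields identical link-local addresses, and how a per-device id could break the tie, here is a minimal user-space sketch. It assumes the usual modified EUI-64 derivation (0xFFFE inserted in the middle, universal/local bit flipped) and assumes that a non-zero dev_id replaces the 0xFFFE filler; `toy_ifid_from_mac()` is a hypothetical helper, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the 8-byte IPv6 interface identifier from a 6-byte mac-address.
 * Assumption: with dev_id == 0 this is plain modified EUI-64; with a
 * non-zero dev_id, the two filler bytes carry the dev_id instead.
 */
void toy_ifid_from_mac(uint8_t eui[8], const uint8_t mac[6], uint16_t dev_id)
{
    assert(eui != NULL && mac != NULL);
    memcpy(eui, mac, 3);         /* first three mac bytes */
    memcpy(eui + 5, mac + 3, 3); /* last three mac bytes */
    if (dev_id) {
        eui[3] = (dev_id >> 8) & 0xFF;
        eui[4] = dev_id & 0xFF;
    } else {
        eui[3] = 0xFF;
        eui[4] = 0xFE;
        eui[0] ^= 2; /* flip the universal/local bit */
    }
}
```

Two devices that share a mac-address and both have dev_id 0 get the same identifier, and therefore the same fe80::/64 address; giving each slave a distinct non-zero dev_id makes the identifiers differ in bytes 3 and 4.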
