[PATCH net] ipv6: addrconf: increment ifp refcount before ipv6_del_addr()

2017-10-30 Thread Eric Dumazet
From: Eric Dumazet 

In the (unlikely) event fixup_permanent_addr() returns a failure,
addrconf_permanent_addr() calls ipv6_del_addr() without the
mandatory call to in6_ifa_hold(), leading to a refcount error,
spotted by syzkaller :

WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50
lib/refcount.c:227
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-20171009+ #33
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 panic+0x1e4/0x41c kernel/panic.c:181
 __warn+0x1c4/0x1e0 kernel/panic.c:544
 report_bug+0x211/0x2d0 lib/bug.c:183
 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
 do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
 do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227
RSP: 0018:8801ca49e680 EFLAGS: 00010286
RAX: 002c RBX: 8801d07cfcdc RCX: 
RDX: 002c RSI: 110039493c90 RDI: ed0039493cc4
RBP: 8801ca49e688 R08: 8801ca49dd70 R09: 
R10: 8801ca49df58 R11:  R12: 110039493cd9
R13: 8801ca49e6e8 R14: 8801ca49e7e8 R15: 8801d07cfcdc
 __in6_ifa_put include/net/addrconf.h:369 [inline]
 ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208
 addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline]
 addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393
 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697
 call_netdevice_notifiers net/core/dev.c:1715 [inline]
 __dev_notify_flags+0x15d/0x430 net/core/dev.c:6843
 dev_change_flags+0xf5/0x140 net/core/dev.c:6879
 do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113
 rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661
 rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301
 netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313
 netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline]
 netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299
 netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049
 __sys_sendmsg+0xe5/0x210 net/socket.c:2083
 SYSC_sendmsg net/socket.c:2094 [inline]
 SyS_sendmsg+0x2d/0x50 net/socket.c:2090
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x7fa9174d3320
RSP: 002b:7ffe302ae9e8 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 7ffe302b2ae0 RCX: 7fa9174d3320
RDX:  RSI: 7ffe302aea20 RDI: 0016
RBP: 0082 R08:  R09: 000f
R10:  R11: 0246 R12: 7ffe302b32a0
R13:  R14: 7ffe302b2ab8 R15: 7ffe302b32b8

Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Eric Dumazet 
Cc: David Ahern 
---
 net/ipv6/addrconf.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 
4a96ebbf8eda5f59a6ff88e836d666a404d2bf0d..8a1c846d3df949a4638589f187120db22a3525ba
 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3335,6 +3335,7 @@ static void addrconf_permanent_addr(struct net_device 
*dev)
if ((ifp->flags & IFA_F_PERMANENT) &&
fixup_permanent_addr(idev, ifp) < 0) {
write_unlock_bh(>lock);
+   in6_ifa_hold(ifp);
ipv6_del_addr(ifp);
write_lock_bh(>lock);
 




Re: [Patch net 00/16] net_sched: fix races with RCU callbacks

2017-10-30 Thread Cong Wang
On Mon, Oct 30, 2017 at 6:03 PM, Lucas Bates  wrote:
> On Oct 30, 2017 19:13, "Cong Wang"  wrote:
>>
>> On Mon, Oct 30, 2017 at 3:39 PM, Lucas Bates  wrote:
>> > e.On Thu, Oct 26, 2017 at 9:24 PM, Cong Wang 
>> > wrote:
>> >> Recently, the RCU callbacks used in TC filters and TC actions keep
>> >> drawing my attention, they introduce at least 4 race condition bugs:
>> > 
>> >> As suggested by Paul, we could defer the work to a workqueue and
>> >> gain the permission of holding RTNL again without any performance
>> >> impact, however, in tcf_block_put() we could have a deadlock when
>> >> flushing workqueue while hodling RTNL lock, the trick here is to
>> >> defer the work itself in workqueue and make it queued after all
>> >> other works so that we keep the same ordering to avoid any
>> >> use-after-free. Please see the first patch for details.
>> >
>> > Cong, I don't believe the problem's been resolved just yet  I have
>> > a new kernel, compiled just today and I'm still tripping over a kernel
>> > bug in this scenario when I run Chris' new test case.
>> >
>>
>> Without a stack trace, I can't do anything. "a kernel bug" could
>> be anything, why do you believe it is caused by this patchset?
>>
> The stack trace I saw touched on the same code that was affected by the
> patchset.  I've attached a photo of the trace - sorry, I should have sent it
> earlier.  I also apologize for having to send a photo but I was doing the
> testing on a small device lacking any kind of serial console access.


Can you try this patch? From your stack trace it is not clear where
the cause is, but we know that the crash is in __tcf_idr_release(),
this is how I came up with the following patch:


diff --git a/include/net/act_api.h b/include/net/act_api.h
index b944e0eb93be..5072446d5f06 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -122,7 +122,9 @@ void tcf_idrinfo_destroy(const struct tc_action_ops *ops,

 static inline void tc_action_net_exit(struct tc_action_net *tn)
 {
+   rtnl_lock();
tcf_idrinfo_destroy(tn->ops, tn->idrinfo);
+   rtnl_unlock();
kfree(tn->idrinfo);
 }



>
> If you need more information or need me to try something else, I'll be able
> to tomorrow.
>

I will look deeper tomorrow. It doesn't look like caused by this patchset
so far, probably yet another missing rtnl like what the above patch shows.


Thanks!


linux-next: Signed-off-by missing for commit in the net-next tree

2017-10-30 Thread Stephen Rothwell
Hi all,

Commit

  509708310cf9 ("r8169: Add support for interrupt coalesce tuning (ethtool -C)")

is missing a Signed-off-by from its author (or its author is wrong).

-- 
Cheers,
Stephen Rothwell


Re: [BUG ?] ipv6: addrconf: Adds a missing in6_ifa_hold()

2017-10-30 Thread Eric Dumazet
On Mon, 2017-10-30 at 22:49 -0600, David Ahern wrote:
> On 10/30/17 9:53 PM, Eric Dumazet wrote:
> > David, I was looking at addrconf_permanent_addr() and wondered
> > if there is not some problem with it.
> > 
> > It seems we need to increment ifp refcount before calling
> > ipv6_del_addr()
> > 
> > Could you double check if this patch is needed, I am guessing you have a
> > test suite exercising this code path ?
> 
> A lot has changed in 20 months since the patch that added the code. For
> instance, taking down the 'lo' device no longer affects host routes on
> other interfaces. Also, fixup_permanent_addr only fails on memory
> allocation. Did you hit this with a test case because I do not have a
> general one that causes the memory failure (hard coding a failure for an
> address is the only way).
> 
> > 
> > Thanks.
> > 
> > PS : Presumably CONFIG_REFCOUNT_FULL=y should have warned you of the
> > problem.
> 
> I have not run a debug kernel in a while -- and did not have this option
> set. Added it to my debug config.
> 
> > 
> > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> > index 
> > 4a96ebbf8eda5f59a6ff88e836d666a404d2bf0d..8a1c846d3df949a4638589f187120db22a3525ba
> >  100644
> > --- a/net/ipv6/addrconf.c
> > +++ b/net/ipv6/addrconf.c
> > @@ -3335,6 +3335,7 @@ static void addrconf_permanent_addr(struct net_device 
> > *dev)
> > if ((ifp->flags & IFA_F_PERMANENT) &&
> > fixup_permanent_addr(idev, ifp) < 0) {
> > write_unlock_bh(>lock);
> > +   in6_ifa_hold(ifp);
> > ipv6_del_addr(ifp);
> > write_lock_bh(>lock);
> >  
> 
> Yes, forcing a failure here does trigger refcnt warning, but then you
> knew that. ;-)
> 

Okay ;)

> 
> PS. is the following a known failure? I triggered it looking into your
> report
> 
> [  170.385741] ==
> [  170.387490] WARNING: possible circular locking dependency detected
> [  170.389214] 4.14.0-rc5+ #338 Not tainted
> [  170.390323] --
> [  170.392017] swapper/0/0 is trying to acquire lock:
> [  170.393408]  (slock-AF_INET){+.-.}, at: []
> tcp_delack_timer+0x29/0xb1
> [  170.395622]
> but task is already holding lock:
> [  170.396943]  ((timer)){+.-.}, at: []
> call_timer_fn+0x5/0x36b
> [  170.397912]
> which lock already depends on the new lock.
> 
> [  170.398986]
> the existing dependency chain (in reverse order) is:
> [  170.399965]
> -> #1 ((timer)){+.-.}:
> [  170.400629]lock_acquire+0x154/0x220
> [  170.401198]del_timer_sync+0x47/0xbd
> [  170.401760]inet_csk_reqsk_queue_drop+0x109/0x141
> [  170.402464]inet_csk_complete_hashdance+0x3b/0x68
> [  170.403173]tcp_check_req+0x517/0x5f1
> [  170.403746]tcp_v4_rcv+0x6ad/0xce7
> [  170.404287]ip_local_deliver_finish+0x1d4/0x281
> [  170.404985]ip_local_deliver+0xaf/0xcf
> [  170.405571]ip_rcv_finish+0x632/0x6ff
> [  170.406140]ip_rcv+0x45d/0x4a6
> ...

Yeah, please look at 
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=timers/core=52f737c2da40259ac9962170ce608b6fb1b55ee4




Re: [BUG ?] ipv6: addrconf: Adds a missing in6_ifa_hold()

2017-10-30 Thread David Ahern
On 10/30/17 9:53 PM, Eric Dumazet wrote:
> David, I was looking at addrconf_permanent_addr() and wondered
> if there is not some problem with it.
> 
> It seems we need to increment ifp refcount before calling
> ipv6_del_addr()
> 
> Could you double check if this patch is needed, I am guessing you have a
> test suite exercising this code path ?

A lot has changed in 20 months since the patch that added the code. For
instance, taking down the 'lo' device no longer affects host routes on
other interfaces. Also, fixup_permanent_addr only fails on memory
allocation. Did you hit this with a test case because I do not have a
general one that causes the memory failure (hard coding a failure for an
address is the only way).

> 
> Thanks.
> 
> PS : Presumably CONFIG_REFCOUNT_FULL=y should have warned you of the
> problem.

I have not run a debug kernel in a while -- and did not have this option
set. Added it to my debug config.

> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 
> 4a96ebbf8eda5f59a6ff88e836d666a404d2bf0d..8a1c846d3df949a4638589f187120db22a3525ba
>  100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3335,6 +3335,7 @@ static void addrconf_permanent_addr(struct net_device 
> *dev)
>   if ((ifp->flags & IFA_F_PERMANENT) &&
>   fixup_permanent_addr(idev, ifp) < 0) {
>   write_unlock_bh(>lock);
> + in6_ifa_hold(ifp);
>   ipv6_del_addr(ifp);
>   write_lock_bh(>lock);
>  

Yes, forcing a failure here does trigger refcnt warning, but then you
knew that. ;-)


PS. is the following a known failure? I triggered it looking into your
report

[  170.385741] ==
[  170.387490] WARNING: possible circular locking dependency detected
[  170.389214] 4.14.0-rc5+ #338 Not tainted
[  170.390323] --
[  170.392017] swapper/0/0 is trying to acquire lock:
[  170.393408]  (slock-AF_INET){+.-.}, at: []
tcp_delack_timer+0x29/0xb1
[  170.395622]
but task is already holding lock:
[  170.396943]  ((timer)){+.-.}, at: []
call_timer_fn+0x5/0x36b
[  170.397912]
which lock already depends on the new lock.

[  170.398986]
the existing dependency chain (in reverse order) is:
[  170.399965]
-> #1 ((timer)){+.-.}:
[  170.400629]lock_acquire+0x154/0x220
[  170.401198]del_timer_sync+0x47/0xbd
[  170.401760]inet_csk_reqsk_queue_drop+0x109/0x141
[  170.402464]inet_csk_complete_hashdance+0x3b/0x68
[  170.403173]tcp_check_req+0x517/0x5f1
[  170.403746]tcp_v4_rcv+0x6ad/0xce7
[  170.404287]ip_local_deliver_finish+0x1d4/0x281
[  170.404985]ip_local_deliver+0xaf/0xcf
[  170.405571]ip_rcv_finish+0x632/0x6ff
[  170.406140]ip_rcv+0x45d/0x4a6
...


[PATCH net-next 1/2] net: phy: Cosmetic fixes to phylink/sfp/sfp-bus.c

2017-10-30 Thread Florian Fainelli
Perform a number of stylistic changes to phylink.c, sfp.c and sfp-bus.c:

- align with netdev-style comments
- align function arguments to the opening parenthesis
- remove blank lines
- fixup a few lines over 80 columns

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/phylink.c | 13 ++---
 drivers/net/phy/sfp-bus.c | 11 +++
 drivers/net/phy/sfp.c | 27 +--
 3 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index bcb4755bcd95..05c8f1c10e36 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -357,7 +357,7 @@ static void phylink_get_fixed_state(struct phylink *pl, 
struct phylink_link_stat
  *1 1   0 1 TX
  */
 static void phylink_resolve_flow(struct phylink *pl,
-   struct phylink_link_state *state)
+struct phylink_link_state *state)
 {
int new_pause = 0;
 
@@ -506,7 +506,8 @@ static int phylink_register_sfp(struct phylink *pl, struct 
device_node *np)
 }
 
 struct phylink *phylink_create(struct net_device *ndev, struct device_node *np,
-   phy_interface_t iface, const struct phylink_mac_ops *ops)
+  phy_interface_t iface,
+  const struct phylink_mac_ops *ops)
 {
struct phylink *pl;
int ret;
@@ -585,7 +586,7 @@ void phylink_phy_change(struct phy_device *phydev, bool up, 
bool do_carrier)
phylink_run_resolve(pl);
 
netdev_dbg(pl->netdev, "phy link %s %s/%s/%s\n", up ? "up" : "down",
-  phy_modes(phydev->interface),
+  phy_modes(phydev->interface),
   phy_speed_to_str(phydev->speed),
   phy_duplex_to_str(phydev->duplex));
 }
@@ -823,7 +824,7 @@ static void phylink_get_ksettings(const struct 
phylink_link_state *state,
 }
 
 int phylink_ethtool_ksettings_get(struct phylink *pl,
-   struct ethtool_link_ksettings *kset)
+ struct ethtool_link_ksettings *kset)
 {
struct phylink_link_state link_state;
 
@@ -870,7 +871,7 @@ int phylink_ethtool_ksettings_get(struct phylink *pl,
 EXPORT_SYMBOL_GPL(phylink_ethtool_ksettings_get);
 
 int phylink_ethtool_ksettings_set(struct phylink *pl,
-   const struct ethtool_link_ksettings *kset)
+ const struct ethtool_link_ksettings *kset)
 {
struct ethtool_link_ksettings our_kset;
struct phylink_link_state config;
@@ -1337,8 +1338,6 @@ int phylink_mii_ioctl(struct phylink *pl, struct ifreq 
*ifr, int cmd)
 }
 EXPORT_SYMBOL_GPL(phylink_mii_ioctl);
 
-
-
 static int phylink_sfp_module_insert(void *upstream,
 const struct sfp_eeprom_id *id)
 {
diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index 5cb5384697ea..8a1b1f4c1b7c 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -26,7 +26,6 @@ struct sfp_bus {
bool started;
 };
 
-
 int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id,
   unsigned long *support)
 {
@@ -208,7 +207,6 @@ void sfp_parse_support(struct sfp_bus *bus, const struct 
sfp_eeprom_id *id,
 }
 EXPORT_SYMBOL_GPL(sfp_parse_support);
 
-
 static LIST_HEAD(sfp_buses);
 static DEFINE_MUTEX(sfp_mutex);
 
@@ -295,7 +293,6 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
bus->registered = false;
 }
 
-
 int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo)
 {
if (!bus->registered)
@@ -305,7 +302,7 @@ int sfp_get_module_info(struct sfp_bus *bus, struct 
ethtool_modinfo *modinfo)
 EXPORT_SYMBOL_GPL(sfp_get_module_info);
 
 int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee,
-   u8 *data)
+ u8 *data)
 {
if (!bus->registered)
return -ENOIOCTLCMD;
@@ -330,8 +327,8 @@ void sfp_upstream_stop(struct sfp_bus *bus)
 EXPORT_SYMBOL_GPL(sfp_upstream_stop);
 
 struct sfp_bus *sfp_register_upstream(struct device_node *np,
-   struct net_device *ndev, void *upstream,
-   const struct sfp_upstream_ops *ops)
+ struct net_device *ndev, void *upstream,
+ const struct sfp_upstream_ops *ops)
 {
struct sfp_bus *bus = sfp_bus_get(np);
int ret = 0;
@@ -368,7 +365,6 @@ void sfp_unregister_upstream(struct sfp_bus *bus)
 }
 EXPORT_SYMBOL_GPL(sfp_unregister_upstream);
 
-
 /* Socket driver entry points */
 int sfp_add_phy(struct sfp_bus *bus, struct phy_device *phydev)
 {
@@ -395,7 +391,6 @@ void sfp_remove_phy(struct sfp_bus *bus)
 }
 EXPORT_SYMBOL_GPL(sfp_remove_phy);
 
-
 void sfp_link_up(struct sfp_bus *bus)
 {
const struct sfp_upstream_ops *ops = sfp_get_upstream_ops(bus);
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index baee371bf767..448465da0422 100644
--- 

[PATCH net-next 0/2] PHYLINK cosmetic and build fixes

2017-10-30 Thread Florian Fainelli
Hi David, Russell,

Please find two small "fixes" one that corrects some stylistic changes and
another one that fixes an actual build failure in sfp.c. Since PHYLINK is
not directly visible to user, and there are no in-tree users yet (coming)
this is not targeted at "net" but "net-next" instead.

Thanks

Florian Fainelli (2):
  net: phy: Cosmetic fixes to phylink/sfp/sfp-bus.c
  net: phy: Fix sfp.c build against GPIO definitions

 drivers/net/phy/phylink.c | 13 ++---
 drivers/net/phy/sfp-bus.c | 11 +++
 drivers/net/phy/sfp.c | 29 ++---
 3 files changed, 23 insertions(+), 30 deletions(-)

-- 
2.14.1



[PATCH net-next 2/2] net: phy: Fix sfp.c build against GPIO definitions

2017-10-30 Thread Florian Fainelli
include/gpio.h does not contain the references we want, we should be including
linux/gpio/consumer.h instead.

Fixes: 73970055450e ("sfp: add SFP module support")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/sfp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 448465da0422..e381811e5f11 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -1,5 +1,5 @@
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
-- 
2.14.1



Re: [PATCH] net: caif: chnl_net: remove unnecessary null check in chnl_recv_cb

2017-10-30 Thread Gustavo A. R. Silva


Quoting "Gustavo A. R. Silva" :


Quoting "Gustavo A. R. Silva" :


Hi,

Quoting "Gustavo A. R. Silva" :


container_of is never null, so this null check is unnecessary.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
net/caif/chnl_net.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 922ac1d..489298d 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -77,8 +77,6 @@ static int chnl_recv_cb(struct cflayer *layr,  
struct cfpkt *pkt)

u8 buf;

priv = container_of(layr, struct chnl_net, chnl);
-   if (!priv)
-   return -EINVAL;


But the driver only passes in to container_of the type of driver  
structure registered with it. So this type of manipulation must be  
completely safe. Right?


Now it seems to me (again) that the patch is fine.

I'm sorry, I'm a little bit confused here.



skb = (struct sk_buff *) cfpkt_tonative(pkt);

--
2.7.4


Please, ignore this patch.

I just realized that function chnl_recv_cb is being called only  
during initialization:


chnl_init_module() -> rtnl_link_register() -> ipcaif_net_setup() ->  
chnl_recv_cb():




Well, here ipcaif_net_setup stores a pointer to chnl_recv_cb in a  
structure. It doesn't call it.



static void ipcaif_net_setup(struct net_device *dev)
{
[...]
priv = netdev_priv(dev);
priv->chnl.receive = chnl_recv_cb;

[...]
}

static struct rtnl_link_ops ipcaif_link_ops __read_mostly = {
.kind   = "caif",
.priv_size  = sizeof(struct chnl_net),
.setup  = ipcaif_net_setup,
.maxtype= IFLA_CAIF_MAX,
.policy = ipcaif_policy,
.newlink= ipcaif_newlink,
.changelink = ipcaif_changelink,
.get_size   = ipcaif_get_size,
.fill_info  = ipcaif_fill_info,

};

static int __init chnl_init_module(void)
{
return rtnl_link_register(_link_ops);
}





--
Gustavo A. R. Silva








Re: Using the aesni generic gcm(aes) aead in atomic context

2017-10-30 Thread Herbert Xu
On Mon, Oct 30, 2017 at 03:18:21PM +, Ilya Lesokhin wrote:
> Hi,
> I've tried using the aesni generic gcm(aes) aead to implement TLS SW fallback 
> and
> I'm getting 
> [ 3356.839506] BUG: sleeping function called from invalid context at 
> ./include/crypto/algapi.h:417
> 
> The warning is coming from a ___might_sleep() macro that is called if 
> CRYPTO_TFM_REQ_MAY_SLEEP is set.
> I'm getting the warning regardless of if pass CRYPTO_ALG_ASYNC or 0 as flags 
> to crypto_alloc_aead("gcm(aes)", 0, flags).
> 
> I've also noticed that rfc4106_encrypt() includes a irq_fpu_usable() check 
> while generic_gcmaes_encrypt() doesn't. 
> Is the generic gcm(aes) aead unsafe in atomic context?
> And if so which aead should I use?
> 
> Finally, out of curiosity, doesn't macsec crypto run in atomic context?

Are you allocating the tfm from atomic context? That is not allowed.

Normally you would allocate the tfm in process context, e.g., when
the connection is setup.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[BUG ?] ipv6: addrconf: Adds a missing in6_ifa_hold()

2017-10-30 Thread Eric Dumazet
David, I was looking at addrconf_permanent_addr() and wondered
if there is not some problem with it.

It seems we need to increment ifp refcount before calling
ipv6_del_addr()

Could you double check if this patch is needed, I am guessing you have a
test suite exercising this code path ?

Thanks.

PS : Presumably CONFIG_REFCOUNT_FULL=y should have warned you of the
problem.

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 
4a96ebbf8eda5f59a6ff88e836d666a404d2bf0d..8a1c846d3df949a4638589f187120db22a3525ba
 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3335,6 +3335,7 @@ static void addrconf_permanent_addr(struct net_device 
*dev)
if ((ifp->flags & IFA_F_PERMANENT) &&
fixup_permanent_addr(idev, ifp) < 0) {
write_unlock_bh(>lock);
+   in6_ifa_hold(ifp);
ipv6_del_addr(ifp);
write_lock_bh(>lock);
 




Re: [PATCH net v4 2/3] dt-binding: net: sfp binding documentation

2017-10-30 Thread Florian Fainelli
Hi Baruch,

On 09/07/2017 02:25 AM, Baruch Siach wrote:
> Add device-tree binding documentation SFP transceivers. Support for SFP
> transceivers has been recently introduced (drivers/net/phy/sfp.c).
> 
> Signed-off-by: Baruch Siach 
> ---
> v4:
>   Remove redundant 'single' from the gpio specifier
>   Rename 'moddef0-gpios' property to 'mod-def0-gpios'
>   Remove 'phy-mode' property from the example; SFP determines the mode
> 
> v3:
>   Mention gpios phandle and specifier
>   Mention the polarity of each gpio
>   Fix example property names
> 
> v2:
>   Rename -gpio properties to -gpios
>   Rename the rate-select-gpio property to rate-select0-gpios
>   Add the rate-select1-gpios property
>   Add examples
> ---
>  Documentation/devicetree/bindings/net/sff,sfp.txt | 76 
> +++
>  1 file changed, 76 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/sff,sfp.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/sff,sfp.txt 
> b/Documentation/devicetree/bindings/net/sff,sfp.txt
> new file mode 100644
> index ..60e970ce10ee
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/sff,sfp.txt
> @@ -0,0 +1,76 @@
> +Small Form Factor (SFF) Committee Small Form-factor Pluggable (SFP)
> +Transceiver
> +
> +Required properties:
> +
> +- compatible : must be "sff,sfp"
> +
> +Optional Properties:
> +
> +- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial
> +  interface

What was the reasoning behind using this property instead of making the
SFP a child of the i2c bus directly? Were you thinking that there could
be systems where the SFP is not i2c-addresable, but another 2-wire
protocol is used instead? This is not particularly wrong per-se I guess,
but usually, the parent/child relationship should make that more obvious.

Right now, we have to have a series of platform devices matching
sff,sfp, so that puts some constraints on where these devices can be
within a Device Tree.

Sorry for catching up on this so late...

Thanks!
-- 
Florian


Re: [PATCH v2 net-next] ipv6: Implement limits on Hop-by-Hop and Destination options

2017-10-30 Thread Eric Dumazet
On Tue, 2017-10-31 at 11:10 +0900, David Miller wrote:
> From: Tom Herbert 
> Date: Mon, 30 Oct 2017 14:16:00 -0700
> 
> > I wrote a quick test program that floods a whole bunch of these
> > packets to a host and sure enough there is substantial time spent
> > in ip6_parse_tlv.
>  ...
> >   25.38%  [kernel][k] __fib6_clean_all
> >   21.63%  [kernel][k] ip6_parse_tlv
> 
> Yet the routing code still dominates the cost.

I am guessing the ip6_rt_max_size defaulting to 4096 needs to be
revisited, after all per-cpu added stuff.






[PATCH net-next] bpf: document answers to common questions about BPF

2017-10-30 Thread Alexei Starovoitov
to address common misconceptions about what BPF is and what it's not
add short BPF Q that clarifies core BPF design principles and
answers some common questions.

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: John Fastabend 
Acked-by: Jakub Kicinski 
---
 Documentation/bpf/bpf_design_QA.txt | 156 
 MAINTAINERS |   1 +
 2 files changed, 157 insertions(+)
 create mode 100644 Documentation/bpf/bpf_design_QA.txt

diff --git a/Documentation/bpf/bpf_design_QA.txt 
b/Documentation/bpf/bpf_design_QA.txt
new file mode 100644
index ..f3e458a0bb2f
--- /dev/null
+++ b/Documentation/bpf/bpf_design_QA.txt
@@ -0,0 +1,156 @@
+BPF extensibility and applicability to networking, tracing, security
+in the linux kernel and several user space implementations of BPF
+virtual machine led to a number of misunderstanding on what BPF actually is.
+This short QA is an attempt to address that and outline a direction
+of where BPF is heading long term.
+
+Q: Is BPF a generic instruction set similar to x64 and arm64?
+A: NO.
+
+Q: Is BPF a generic virtual machine ?
+A: NO.
+
+BPF is generic instruction set _with_ C calling convention.
+
+Q: Why C calling convention was chosen?
+A: Because BPF programs are designed to run in the linux kernel
+   which is written in C, hence BPF defines instruction set compatible
+   with two most used architectures x64 and arm64 (and takes into
+   consideration important quirks of other architectures) and
+   defines calling convention that is compatible with C calling
+   convention of the linux kernel on those architectures.
+
+Q: can multiple return values be supported in the future?
+A: NO. BPF allows only register R0 to be used as return value.
+
+Q: can more than 5 function arguments be supported in the future?
+A: NO. BPF calling convention only allows registers R1-R5 to be used
+   as arguments. BPF is not a standalone instruction set.
+   (unlike x64 ISA that allows msft, cdecl and other conventions)
+
+Q: can BPF programs access instruction pointer or return address?
+A: NO.
+
+Q: can BPF programs access stack pointer ?
+A: NO. Only frame pointer (register R10) is accessible.
+   From compiler point of view it's necessary to have stack pointer.
+   For example LLVM defines register R11 as stack pointer in its
+   BPF backend, but it makes sure that generated code never uses it.
+
+Q: Does C-calling convention diminishes possible use cases?
+A: YES. BPF design forces addition of major functionality in the form
+   of kernel helper functions and kernel objects like BPF maps with
+   seamless interoperability between them. It lets kernel call into
+   BPF programs and programs call kernel helpers with zero overhead.
+   As all of them were native C code. That is particularly the case
+   for JITed BPF programs that are indistinguishable from
+   native kernel C code.
+
+Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
+A: Soft yes. At least for now until BPF core has support for
+   bpf-to-bpf calls, indirect calls, loops, global variables,
+   jump tables, read only sections and all other normal constructs
+   that C code can produce.
+
+Q: Can loops be supported in a safe way?
+A: It's not clear yet. BPF developers are trying to find a way to
+   support bounded loops where the verifier can guarantee that
+   the program terminates in less than 4096 instructions.
+
+Q: How come LD_ABS and LD_IND instruction are present in BPF whereas
+   C code cannot express them and has to use builtin intrinsics?
+A: This is artifact of compatibility with classic BPF. Modern
+   networking code in BPF performs better without them.
+   See 'direct packet access'.
+
+Q: It seems not all BPF instructions are one-to-one to native CPU.
+   For example why BPF_JNE and other compare and jumps are not cpu-like?
+A: This was necessary to avoid introducing flags into ISA which are
+   impossible to make generic and efficient across CPU architectures.
+
+Q: why BPF_DIV instruction doesn't map to x64 div?
+A: Because if we picked one-to-one relationship to x64 it would have made
+   it more complicated to support on arm64 and other archs. Also it
+   needs div-by-zero runtime check.
+
+Q: why there is no BPF_SDIV for signed divide operation?
+A: Because it would be rarely used. llvm errors in such case and
+   prints a suggestion to use unsigned divide instead
+
+Q: Why BPF has implicit prologue and epilogue?
+A: Because architectures like sparc have register windows and in general
+   there are enough subtle differences between architectures, so naive
+   store return address into stack won't work. Another reason is BPF has
+   to be safe from division by zero (and legacy exception path
+   of LD_ABS insn). Those instructions need to invoke epilogue and
+   return implicitly.
+
+Q: Why BPF_JLT and BPF_JLE 

Re: [PATCH v2 net-next] ipv6: Implement limits on Hop-by-Hop and Destination options

2017-10-30 Thread Tom Herbert
On Mon, Oct 30, 2017 at 7:10 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Mon, 30 Oct 2017 14:16:00 -0700
>
>> I wrote a quick test program that floods a whole bunch of these
>> packets to a host and sure enough there is substantial time spent
>> in ip6_parse_tlv.
>  ...
>>   25.38%  [kernel][k] __fib6_clean_all
>>   21.63%  [kernel][k] ip6_parse_tlv
>
> Yet the routing code still dominates the cost.

I wouldn't read too much into that. This was unconnected UDP on VMs
and the only purpose here was to demonstrate that ip6_parse_tlv does
get a lot of work with a lot of TLVs. Martin's results listed in the
tested section are probably a more accurate gauge of the impact and
potential to mitigate DOS.

Tom


Re: [PATCH] net: caif: chnl_net: remove unnecessary null check in chnl_recv_cb

2017-10-30 Thread Gustavo A. R. Silva


Quoting "Gustavo A. R. Silva" :


Hi,

Quoting "Gustavo A. R. Silva" :


container_of is never null, so this null check is unnecessary.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
net/caif/chnl_net.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 922ac1d..489298d 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -77,8 +77,6 @@ static int chnl_recv_cb(struct cflayer *layr,  
struct cfpkt *pkt)

u8 buf;

priv = container_of(layr, struct chnl_net, chnl);
-   if (!priv)
-   return -EINVAL;

skb = (struct sk_buff *) cfpkt_tonative(pkt);

--
2.7.4


Please, ignore this patch.

I just realized that function chnl_recv_cb is being called only  
during initialization:


chnl_init_module() -> rtnl_link_register() -> ipcaif_net_setup() ->  
chnl_recv_cb():




Well, here ipcaif_net_setup stores a pointer to chnl_recv_cb in a  
structure. It doesn't call it.



static void ipcaif_net_setup(struct net_device *dev)
{
[...]
priv = netdev_priv(dev);
priv->chnl.receive = chnl_recv_cb;

[...]
}

static struct rtnl_link_ops ipcaif_link_ops __read_mostly = {
.kind   = "caif",
.priv_size  = sizeof(struct chnl_net),
.setup  = ipcaif_net_setup,
.maxtype= IFLA_CAIF_MAX,
.policy = ipcaif_policy,
.newlink= ipcaif_newlink,
.changelink = ipcaif_changelink,
.get_size   = ipcaif_get_size,
.fill_info  = ipcaif_fill_info,

};

static int __init chnl_init_module(void)
{
return rtnl_link_register(_link_ops);
}



--
Gustavo A. R. Silva








Re: [PATCH net] tc-testing: fix arg to ip command: -s -> -n

2017-10-30 Thread David Miller
From: "Brenda J. Butler" 
Date: Mon, 30 Oct 2017 17:59:22 -0400

> Fixes: 31c2611b66e0 ("selftests: Introduce a new test case to tc testsuite")
> Fixes: 76b903ee198d ("selftests: Introduce tc testsuite")
> Signed-off-by: Brenda J. Butler 

Applied, thanks Brenda.


Re: [PATCH v2 net-next] ipv6: Implement limits on Hop-by-Hop and Destination options

2017-10-30 Thread David Miller
From: Tom Herbert 
Date: Mon, 30 Oct 2017 14:16:00 -0700

> I wrote a quick test program that floods a whole bunch of these
> packets to a host and sure enough there is substantial time spent
> in ip6_parse_tlv.
 ...
>   25.38%  [kernel][k] __fib6_clean_all
>   21.63%  [kernel][k] ip6_parse_tlv

Yet the routing code still dominates the cost.


Re: [Patch net] net_sched: remove tcf_block_put_deferred()

2017-10-30 Thread David Miller
From: Cong Wang 
Date: Mon, 30 Oct 2017 11:10:09 -0700

> In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks 
> of tc filter")
> I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
> because qdisc is already destroyed after we queue this work.
> 
> The tcf_block_put_deferred() is no longer necessary after we get RTNL
> for each tc filter destroy work, no others could jump in at this point.
> Same for tcf_chain_hold(), we are fully serialized now.
> 
> This also reduces one indirection therefore makes the code more
> readable. Note this brings back a rcu_barrier(), however comparing
> to the code prior to commit 7aa0045dadb6 we still reduced one
> rcu_barrier(). For net-next, we can consider to refcnt tcf block to
> avoid it.
> 
> Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of 
> tc filter")
> Cc: Daniel Borkmann 
> Cc: Jiri Pirko 
> Cc: John Fastabend 
> Cc: Jamal Hadi Salim 
> Cc: "Paul E. McKenney" 
> Cc: Eric Dumazet 
> Signed-off-by: Cong Wang 

Applied, thanks for fixing this use-after-free so quickly.

This will be another fun merge into net-next :-)


Re: [PATCH net] l2tp: hold tunnel in pppol2tp_connect()

2017-10-30 Thread David Miller
From: Guillaume Nault 
Date: Mon, 30 Oct 2017 17:58:58 +0100

> Use l2tp_tunnel_get() in pppol2tp_connect() to ensure the tunnel isn't
> going to disappear while processing the rest of the function.
> 
> Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp 
> parts")
> Signed-off-by: Guillaume Nault 
> ---
> 
> Note: in case of backporting to -stable, this patch only makes sense if
> f3c66d4e144a ("l2tp: prevent creation of sessions on terminated tunnels")
> is already present in the tree, as we need this issue to be fixed
> before fixing the current one.
> 
> The reason is that when connecting a session, we don't only depend on
> the tunnel, but also on its socket. Therefore, holding a reference on
> the tunnel is not enough, we also have to make sure that it's not going
> to drop its socket before the session is registered.

Applied and thank you for the detailed -stable explanation, it really
helps.


Re: [PATCH net-next] tc-testing: correction to docstring in get_unique_item

2017-10-30 Thread Lucas Bates
On Mon, Oct 30, 2017 at 8:37 PM, Brenda J. Butler  wrote:
> The docstring says the function returns a "set" but it returns
> a "list".  These are both python data types, so we should refer to
> the right one that is being returned.
>
> Signed-off-by: Brenda J. Butler 

Acked-by: Lucas Bates 


Re: [Patch net 00/16] net_sched: fix races with RCU callbacks

2017-10-30 Thread Lucas Bates
On Mon, Oct 30, 2017 at 7:12 PM, Cong Wang  wrote:
> On Mon, Oct 30, 2017 at 3:39 PM, Lucas Bates  wrote:
>> e.On Thu, Oct 26, 2017 at 9:24 PM, Cong Wang  
>> wrote:
>>> Recently, the RCU callbacks used in TC filters and TC actions keep
>>> drawing my attention, they introduce at least 4 race condition bugs:
>> 
>>> As suggested by Paul, we could defer the work to a workqueue and
>>> gain the permission of holding RTNL again without any performance
>>> impact, however, in tcf_block_put() we could have a deadlock when
>>> flushing workqueue while hodling RTNL lock, the trick here is to
>>> defer the work itself in workqueue and make it queued after all
>>> other works so that we keep the same ordering to avoid any
>>> use-after-free. Please see the first patch for details.
>>
>> Cong, I don't believe the problem's been resolved just yet  I have
>> a new kernel, compiled just today and I'm still tripping over a kernel
>> bug in this scenario when I run Chris' new test case.
>>
>
> Without a stack trace, I can't do anything. "a kernel bug" could
> be anything, why do you believe it is caused by this patchset?
>
(Sending again due to email format issues)

The stack trace I saw touched on the same code that was affected by
the patchset. I've sent a photo of the trace to you directly - I'm
sorry, I should have sent it earlier.  I also apologize for having to
send the photo but I was doing the testing on a small device lacking
any kind of serial console access.


>> I'm doing this on a machine where I don't have a spare device to use
>> on the run. Instead I created a veth pair that will have one end
>> migrated into the container.
>>
>> The bug isn't consistent. I'm running into it anywhere between one and
>> four runs of the d052 test case.
>>
>> Steps to reproduce:
>>
>> sudo ip li add foo type veth
>> sudo ./tdc.py -d foo -c flower
>> [repeat until kernel bug encountered]
>
> I tested with basic and u32 filter, since it is not specific to any type
> of filters, all filters had the same problem. I will try flower filter too
> before you provide a stack trace or a detailed report.
If you need more information or need me to try something else, I'll be
able to tomorrow.


Re: [PATCH 2/2] samples/bpf: add a test for bpf_override_return

2017-10-30 Thread Alexei Starovoitov

On 10/30/17 2:19 PM, Josef Bacik wrote:

+
+rm -f testfile.img
+dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
+DEVICE=$(losetup --show -f testfile.img)
+mkfs.btrfs -f $DEVICE
+mkdir tmpmnt
+./tracex7 $DEVICE
+if [ $? -eq 0 ]
+then
+   echo "SUCCESS!"
+else
+   echo "FAILED!"
+fi
+losetup -d $DEVICE


nice test!


diff --git a/samples/bpf/tracex7_kern.c b/samples/bpf/tracex7_kern.c
new file mode 100644
index ..a2f74f736e66
--- /dev/null
+++ b/samples/bpf/tracex7_kern.c
@@ -0,0 +1,15 @@
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+SEC("kprobe/open_ctree")
+int bpf_prog1(struct pt_regs *ctx)
+{
+   unsigned long rc = -12;


pls add empty line here.


+   bpf_override_return(ctx, rc);
+   return 0;
+}




Re: [PATCH 1/2] bpf: add a bpf_override_function helper

2017-10-30 Thread Alexei Starovoitov

On 10/30/17 2:19 PM, Josef Bacik wrote:

From: Josef Bacik 

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Signed-off-by: Josef Bacik 
---
 arch/Kconfig |  3 +++
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/kprobes.h   |  4 
 arch/x86/include/asm/ptrace.h|  5 +
 arch/x86/kernel/kprobes/ftrace.c | 14 
 include/uapi/linux/bpf.h |  7 +-
 kernel/trace/Kconfig | 11 ++
 kernel/trace/bpf_trace.c | 47 +++-
 kernel/trace/trace.h |  6 +
 kernel/trace/trace_kprobe.c  | 23 ++--
 10 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d789a89cb32c..4fb618082259 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -195,6 +195,9 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
bool

+config HAVE_KPROBE_OVERRIDE
+   bool
+
 config HAVE_NMI
bool

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 971feac13506..5126d2750dd0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -152,6 +152,7 @@ config X86
select HAVE_KERNEL_XZ
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
+   select HAVE_KPROBE_OVERRIDE
select HAVE_KRETPROBES
select HAVE_KVM
select HAVE_LIVEPATCH   if X86_64
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 6cf65437b5e5..c6c3b1f4306a 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -67,6 +67,10 @@ extern const int kretprobe_blacklist_size;
 void arch_remove_kprobe(struct kprobe *p);
 asmlinkage void kretprobe_trampoline(void);

+#ifdef CONFIG_KPROBES_ON_FTRACE
+extern void arch_ftrace_kprobe_override_function(struct pt_regs *regs);
+#endif
+
 /* Architecture specific copy of original instruction*/
 struct arch_specific_insn {
/* copy of the original instruction */
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..f04e71800c2f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -108,6 +108,11 @@ static inline unsigned long regs_return_value(struct 
pt_regs *regs)
return regs->ax;
 }

+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long 
rc)
+{
+   regs->ax = rc;
+}
+
 /*
  * user_mode(regs) determines whether a register set came from user
  * mode.  On x86_32, this is true if V8086 mode was enabled OR if the
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 041f7b6dfa0f..3c455bf490cb 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
p->ainsn.boostable = false;
return 0;
 }
+
+asmlinkage void override_func(void);
+asm(
+   ".type override_func, @function\n"
+   "override_func:\n"
+   "  ret\n"
+   ".size override_func, .-override_func\n"
+);
+
+void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
+{
+   regs->ip = (unsigned long)_func;
+}
+NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b7b54d898bd..1ad5b87a42f6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -673,6 +673,10 @@ union bpf_attr {
  * @buf: buf to fill
  * @buf_size: size of the buf
  * Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -732,7 +736,8 @@ union bpf_attr {
FN(xdp_adjust_meta),\
FN(perf_event_read_value),  \
FN(perf_prog_read_value),   \
-   FN(getsockopt),
+   FN(getsockopt), \
+   FN(override_return),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 434c840e2d82..9dc0deeaad2b 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -518,6 +518,17 @@ config FUNCTION_PROFILER

  If in doubt, say N.

+config BPF_KPROBE_OVERRIDE
+   bool "Enable BPF programs to override a 

[PATCH net-next] tc-testing: correction to docstring in get_unique_item

2017-10-30 Thread Brenda J. Butler
The docstring says the function returns a "set" but it returns
a "list".  These are both python data types, so we should refer to
the right one that is being returned.

Signed-off-by: Brenda J. Butler 
---
This patch was formerly sent as part of a 15-patch series.  It is being
split off as a stand-alone patch.

 tools/testing/selftests/tc-testing/tdc_helper.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/tc-testing/tdc_helper.py 
b/tools/testing/selftests/tc-testing/tdc_helper.py
index c3254f861fb2..6f99a4efe761 100644
--- a/tools/testing/selftests/tc-testing/tdc_helper.py
+++ b/tools/testing/selftests/tc-testing/tdc_helper.py
@@ -15,7 +15,7 @@ def get_categorized_testlist(alltests, ucat):
 
 
 def get_unique_item(lst):
-""" For a list, return a set of the unique items in the list. """
+""" For a list, return a list of the unique items in the list. """
 return list(set(lst))
 
 
-- 
2.15.0.rc0



Re: [PATCH net-next] net: filter: remove unused variable and fix warning

2017-10-30 Thread David Miller
From: Jakub Kicinski 
Date: Mon, 30 Oct 2017 13:46:47 -0700

> bpf_getsockopt bpf call sets the ret variable to zero and
> never changes it.  What's worse in case CONFIG_INET is
> not selected the variable is completely unused generating
> a warning.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 

Applied.


Re: [PATCH] net: caif: chnl_net: remove unnecessary null check in chnl_recv_cb

2017-10-30 Thread Gustavo A. R. Silva

Hi,

Quoting "Gustavo A. R. Silva" :


container_of is never null, so this null check is unnecessary.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 net/caif/chnl_net.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 922ac1d..489298d 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -77,8 +77,6 @@ static int chnl_recv_cb(struct cflayer *layr,  
struct cfpkt *pkt)

u8 buf;

priv = container_of(layr, struct chnl_net, chnl);
-   if (!priv)
-   return -EINVAL;

skb = (struct sk_buff *) cfpkt_tonative(pkt);

--
2.7.4


Please, ignore this patch.

I just realized that function chnl_recv_cb is being called only during  
initialization:


chnl_init_module() -> rtnl_link_register() -> ipcaif_net_setup() ->  
chnl_recv_cb():


static void ipcaif_net_setup(struct net_device *dev)
{
[...]
priv = netdev_priv(dev);
priv->chnl.receive = chnl_recv_cb;

[...]
}

static struct rtnl_link_ops ipcaif_link_ops __read_mostly = {
.kind   = "caif",
.priv_size  = sizeof(struct chnl_net),
.setup  = ipcaif_net_setup,
.maxtype= IFLA_CAIF_MAX,
.policy = ipcaif_policy,
.newlink= ipcaif_newlink,
.changelink = ipcaif_changelink,
.get_size   = ipcaif_get_size,
.fill_info  = ipcaif_fill_info,

};

static int __init chnl_init_module(void)
{
return rtnl_link_register(_link_ops);
}

Thanks
--
Gustavo A. R. Silva






Re: [PATCH v2 net-next] net: display hw address of source machine during ipv6 DAD failure

2017-10-30 Thread David Ahern
On 10/30/17 5:38 PM, Vishwanath Pai wrote:
> This patch updates the error messages displayed in kernel log to include
> hwaddress of the source machine that caused ipv6 duplicate address
> detection failures.
> 
> Examples:
> 
> a) When we receive a NA packet from another machine advertising our
> address:
> 
> ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2001:db8:: on eth0!
> 
> b) When we detect DAD failure during address assignment to an interface:
> 
> IPv6: eth0: IPv6 duplicate address 2001:db8:: used by 34:ab:cd:56:11:e8
> detected!
> 
> v2:
> Changed %pI6 to %pI6c in ndisc_recv_na()
> Chaged the v6 address in the commit message to 2001:db8::
> 
> Suggested-by: Igor Lubashev 
> Signed-off-by: Vishwanath Pai 
> ---
>  include/net/addrconf.h | 2 +-
>  net/ipv6/addrconf.c| 6 +++---
>  net/ipv6/ndisc.c   | 9 +
>  3 files changed, 9 insertions(+), 8 deletions(-)

LGTM

Acked-by: David Ahern 


[PATCH v2 net-next] net: display hw address of source machine during ipv6 DAD failure

2017-10-30 Thread Vishwanath Pai
This patch updates the error messages displayed in kernel log to include
hwaddress of the source machine that caused ipv6 duplicate address
detection failures.

Examples:

a) When we receive a NA packet from another machine advertising our
address:

ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2001:db8:: on eth0!

b) When we detect DAD failure during address assignment to an interface:

IPv6: eth0: IPv6 duplicate address 2001:db8:: used by 34:ab:cd:56:11:e8
detected!

v2:
Changed %pI6 to %pI6c in ndisc_recv_na()
Chaged the v6 address in the commit message to 2001:db8::

Suggested-by: Igor Lubashev 
Signed-off-by: Vishwanath Pai 
---
 include/net/addrconf.h | 2 +-
 net/ipv6/addrconf.c| 6 +++---
 net/ipv6/ndisc.c   | 9 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 15b5ffd..2a616ea 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -208,7 +208,7 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr 
*mc_addr,
 void ipv6_mc_init_dev(struct inet6_dev *idev);
 void ipv6_mc_destroy_dev(struct inet6_dev *idev);
 int ipv6_mc_check_mld(struct sk_buff *skb, struct sk_buff **skb_trimmed);
-void addrconf_dad_failure(struct inet6_ifaddr *ifp);
+void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp);
 
 bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
 const struct in6_addr *src_addr);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5a8a102..cfa374c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1987,7 +1987,7 @@ static int addrconf_dad_end(struct inet6_ifaddr *ifp)
return err;
 }
 
-void addrconf_dad_failure(struct inet6_ifaddr *ifp)
+void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp)
 {
struct inet6_dev *idev = ifp->idev;
struct net *net = dev_net(ifp->idev->dev);
@@ -1997,8 +1997,8 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
return;
}
 
-   net_info_ratelimited("%s: IPv6 duplicate address %pI6c detected!\n",
-ifp->idev->dev->name, >addr);
+   net_info_ratelimited("%s: IPv6 duplicate address %pI6c used by %pM 
detected!\n",
+ifp->idev->dev->name, >addr, 
eth_hdr(skb)->h_source);
 
spin_lock_bh(>lock);
 
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 266a530..f9c3ffe 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -46,6 +46,7 @@
 #endif
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -822,7 +823,7 @@ static void ndisc_recv_ns(struct sk_buff *skb)
 * who is doing DAD
 * so fail our DAD process
 */
-   addrconf_dad_failure(ifp);
+   addrconf_dad_failure(skb, ifp);
return;
} else {
/*
@@ -975,7 +976,7 @@ static void ndisc_recv_na(struct sk_buff *skb)
if (ifp) {
if (skb->pkt_type != PACKET_LOOPBACK
&& (ifp->flags & IFA_F_TENTATIVE)) {
-   addrconf_dad_failure(ifp);
+   addrconf_dad_failure(skb, ifp);
return;
}
/* What should we make now? The advertisement
@@ -989,8 +990,8 @@ static void ndisc_recv_na(struct sk_buff *skb)
 */
if (skb->pkt_type != PACKET_LOOPBACK)
ND_PRINTK(1, warn,
- "NA: someone advertises our address %pI6 on 
%s!\n",
- >addr, ifp->idev->dev->name);
+ "NA: %pM advertised our address %pI6c on 
%s!\n",
+ eth_hdr(skb)->h_source, >addr, 
ifp->idev->dev->name);
in6_ifa_put(ifp);
return;
}
-- 
1.9.1



Re: [Patch net 00/16] net_sched: fix races with RCU callbacks

2017-10-30 Thread Cong Wang
On Mon, Oct 30, 2017 at 3:39 PM, Lucas Bates  wrote:
> e.On Thu, Oct 26, 2017 at 9:24 PM, Cong Wang  wrote:
>> Recently, the RCU callbacks used in TC filters and TC actions keep
>> drawing my attention, they introduce at least 4 race condition bugs:
> 
>> As suggested by Paul, we could defer the work to a workqueue and
>> gain the permission of holding RTNL again without any performance
>> impact, however, in tcf_block_put() we could have a deadlock when
>> flushing workqueue while hodling RTNL lock, the trick here is to
>> defer the work itself in workqueue and make it queued after all
>> other works so that we keep the same ordering to avoid any
>> use-after-free. Please see the first patch for details.
>
> Cong, I don't believe the problem's been resolved just yet  I have
> a new kernel, compiled just today and I'm still tripping over a kernel
> bug in this scenario when I run Chris' new test case.
>

Without a stack trace, I can't do anything. "a kernel bug" could
be anything, why do you believe it is caused by this patchset?

Note, there is a use-after-free bug caused by this patchset, the
fix is already on netdev in case you hit the same bug.

> I'm doing this on a machine where I don't have a spare device to use
> on the run. Instead I created a veth pair that will have one end
> migrated into the container.
>
> The bug isn't consistent. I'm running into it anywhere between one and
> four runs of the d052 test case.
>
> Steps to reproduce:
>
> sudo ip li add foo type veth
> sudo ./tdc.py -d foo -c flower
> [repeat until kernel bug encountered]

I tested with basic and u32 filter, since it is not specific to any type
of filters, all filters had the same problem. I will try flower filter too
before you provide a stack trace or a detailed report.


Re: [PATCH net] tun/tap: sanitize TUNSETSNDBUF input

2017-10-30 Thread Eric Dumazet
On Mon, 2017-10-30 at 18:50 -0400, Craig Gallek wrote:
> From: Craig Gallek 
> 
> Syzkaller found several variants of the lockup below by setting negative
> values with the TUNSETSNDBUF ioctl.  This patch adds a sanity check
> to both the tun and tap versions of this ioctl.
> 

Reviewed-by: Eric Dumazet 

Thanks !




[PATCH net] tun/tap: sanitize TUNSETSNDBUF input

2017-10-30 Thread Craig Gallek
From: Craig Gallek 

Syzkaller found several variants of the lockup below by setting negative
values with the TUNSETSNDBUF ioctl.  This patch adds a sanity check
to both the tun and tap versions of this ioctl.

  watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389]
  Modules linked in:
  irq event stamp: 329692056
  hardirqs last  enabled at (329692055): [] 
_raw_spin_unlock_irqrestore+0x31/0x75
  hardirqs last disabled at (329692056): [] 
apic_timer_interrupt+0x98/0xb0
  softirqs last  enabled at (35659740): [] 
__do_softirq+0x328/0x48c
  softirqs last disabled at (35659731): [] irq_exit+0xbc/0xd0
  CPU: 0 PID: 2389 Comm: repro Not tainted 4.14.0-rc7 #23
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: 880009452140 task.stack: 880006a2
  RIP: 0010:_raw_spin_lock_irqsave+0x11/0x80
  RSP: 0018:880006a27c50 EFLAGS: 0282 ORIG_RAX: ff10
  RAX: 880009ac68d0 RBX: 880006a27ce0 RCX: 
  RDX: 0001 RSI: 880006a27ce0 RDI: 880009ac6900
  RBP: 880006a27c60 R08:  R09: 
  R10: 0001 R11: 0063ff00 R12: 880009ac6900
  R13: 880006a27cf8 R14: 0001 R15: 880006a27cf8
  FS:  7f4be4838700() GS:88000cc0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 20101000 CR3: 09616000 CR4: 06f0
  Call Trace:
   prepare_to_wait+0x26/0xc0
   sock_alloc_send_pskb+0x14e/0x270
   ? remove_wait_queue+0x60/0x60
   tun_get_user+0x2cc/0x19d0
   ? __tun_get+0x60/0x1b0
   tun_chr_write_iter+0x57/0x86
   __vfs_write+0x156/0x1e0
   vfs_write+0xf7/0x230
   SyS_write+0x57/0xd0
   entry_SYSCALL_64_fastpath+0x1f/0xbe
  RIP: 0033:0x7f4be4356df9
  RSP: 002b:7ffc18101c08 EFLAGS: 0293 ORIG_RAX: 0001
  RAX: ffda RBX:  RCX: 7f4be4356df9
  RDX: 0046 RSI: 20101000 RDI: 0005
  RBP: 7ffc18101c40 R08: 0001 R09: 0001
  R10: 0001 R11: 0293 R12: 559c75f64780
  R13: 7ffc18101d30 R14:  R15: 

Fixes: 33dccbb050bb ("tun: Limit amount of queued packets per device")
Fixes: 20d29d7a916a ("net: macvtap driver")
Signed-off-by: Craig Gallek 
---
 drivers/net/tap.c | 2 ++
 drivers/net/tun.c | 4 
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 1b10fcc6a58d..6c0c84c33e1f 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1032,6 +1032,8 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
case TUNSETSNDBUF:
if (get_user(s, sp))
return -EFAULT;
+   if (s <= 0)
+   return -EINVAL;
 
q->sk.sk_sndbuf = s;
return 0;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 5550f56cb895..42bb820a56c9 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2429,6 +2429,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
int cmd,
ret = -EFAULT;
break;
}
+   if (sndbuf <= 0) {
+   ret = -EINVAL;
+   break;
+   }
 
tun->sndbuf = sndbuf;
tun_set_sndbuf(tun);
-- 
2.15.0.rc2.357.g7e34df9404-goog



[PATCH] net: caif: chnl_net: remove unnecessary null check in chnl_recv_cb

2017-10-30 Thread Gustavo A. R. Silva
container_of is never null, so this null check is unnecessary.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 net/caif/chnl_net.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/caif/chnl_net.c b/net/caif/chnl_net.c
index 922ac1d..489298d 100644
--- a/net/caif/chnl_net.c
+++ b/net/caif/chnl_net.c
@@ -77,8 +77,6 @@ static int chnl_recv_cb(struct cflayer *layr, struct cfpkt 
*pkt)
u8 buf;
 
priv = container_of(layr, struct chnl_net, chnl);
-   if (!priv)
-   return -EINVAL;
 
skb = (struct sk_buff *) cfpkt_tonative(pkt);
 
-- 
2.7.4



Re: [Patch net 00/16] net_sched: fix races with RCU callbacks

2017-10-30 Thread Lucas Bates
e.On Thu, Oct 26, 2017 at 9:24 PM, Cong Wang  wrote:
> Recently, the RCU callbacks used in TC filters and TC actions keep
> drawing my attention, they introduce at least 4 race condition bugs:

> As suggested by Paul, we could defer the work to a workqueue and
> gain the permission of holding RTNL again without any performance
> impact, however, in tcf_block_put() we could have a deadlock when
> flushing workqueue while hodling RTNL lock, the trick here is to
> defer the work itself in workqueue and make it queued after all
> other works so that we keep the same ordering to avoid any
> use-after-free. Please see the first patch for details.

Cong, I don't believe the problem's been resolved just yet  I have
a new kernel, compiled just today and I'm still tripping over a kernel
bug in this scenario when I run Chris' new test case.

I'm doing this on a machine where I don't have a spare device to use
on the run. Instead I created a veth pair that will have one end
migrated into the container.

The bug isn't consistent. I'm running into it anywhere between one and
four runs of the d052 test case.

Steps to reproduce:

sudo ip li add foo type veth
sudo ./tdc.py -d foo -c flower
[repeat until kernel bug encountered]


[PATCH] netfilter: ipset: use swap macro instead of _manually_ swapping values

2017-10-30 Thread Gustavo A. R. Silva
Make use of the swap macro and remove unnecessary variables tmp.
This makes the code easier to read and maintain.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 net/netfilter/ipset/ip_set_bitmap_ip.c| 8 ++--
 net/netfilter/ipset/ip_set_bitmap_ipmac.c | 8 ++--
 net/netfilter/ipset/ip_set_bitmap_port.c  | 8 ++--
 3 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c 
b/net/netfilter/ipset/ip_set_bitmap_ip.c
index d8975a0..488d6d0 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ip.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ip.c
@@ -263,12 +263,8 @@ bitmap_ip_create(struct net *net, struct ip_set *set, 
struct nlattr *tb[],
ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], _ip);
if (ret)
return ret;
-   if (first_ip > last_ip) {
-   u32 tmp = first_ip;
-
-   first_ip = last_ip;
-   last_ip = tmp;
-   }
+   if (first_ip > last_ip)
+   swap(first_ip, last_ip);
} else if (tb[IPSET_ATTR_CIDR]) {
u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]);
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c 
b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index 4c279fb..c00b6a2 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -337,12 +337,8 @@ bitmap_ipmac_create(struct net *net, struct ip_set *set, 
struct nlattr *tb[],
ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], _ip);
if (ret)
return ret;
-   if (first_ip > last_ip) {
-   u32 tmp = first_ip;
-
-   first_ip = last_ip;
-   last_ip = tmp;
-   }
+   if (first_ip > last_ip)
+   swap(first_ip, last_ip);
} else if (tb[IPSET_ATTR_CIDR]) {
u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]);
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c 
b/net/netfilter/ipset/ip_set_bitmap_port.c
index 7f9bbd7..b561ca8 100644
--- a/net/netfilter/ipset/ip_set_bitmap_port.c
+++ b/net/netfilter/ipset/ip_set_bitmap_port.c
@@ -238,12 +238,8 @@ bitmap_port_create(struct net *net, struct ip_set *set, 
struct nlattr *tb[],
 
first_port = ip_set_get_h16(tb[IPSET_ATTR_PORT]);
last_port = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]);
-   if (first_port > last_port) {
-   u16 tmp = first_port;
-
-   first_port = last_port;
-   last_port = tmp;
-   }
+   if (first_port > last_port)
+   swap(first_port, last_port);
 
elements = last_port - first_port + 1;
set->dsize = ip_set_elem_len(set, tb, 0, 0);
-- 
2.7.4



[PATCH net-next] net: dsa: b53: Have b53_hdr_setup() enable/disable tagging

2017-10-30 Thread Florian Fainelli
Have b53_hdr_setup() check what kind of tagging protocol is configured
(Broadcom or none) and apply the correct settings in both cases.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index fa37f501f10b..a7ca62ba27b7 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -545,6 +545,7 @@ EXPORT_SYMBOL(b53_disable_port);
 
 void b53_brcm_hdr_setup(struct dsa_switch *ds, int port)
 {
+   bool tag_en = !!(ds->ops->get_tag_protocol(ds) == DSA_TAG_PROTO_BRCM);
struct b53_device *dev = ds->priv;
u8 hdr_ctl, val;
u16 reg;
@@ -567,7 +568,10 @@ void b53_brcm_hdr_setup(struct dsa_switch *ds, int port)
 
/* Enable Broadcom tags for IMP port */
b53_read8(dev, B53_MGMT_PAGE, B53_BRCM_HDR, _ctl);
-   hdr_ctl |= val;
+   if (tag_en)
+   hdr_ctl |= val;
+   else
+   hdr_ctl &= ~val;
b53_write8(dev, B53_MGMT_PAGE, B53_BRCM_HDR, hdr_ctl);
 
/* Registers below are only accessible on newer devices */
@@ -578,14 +582,20 @@ void b53_brcm_hdr_setup(struct dsa_switch *ds, int port)
 * allow us to tag outgoing frames
 */
b53_read16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_RX_DIS, );
-   reg &= ~BIT(port);
+   if (tag_en)
+   reg &= ~BIT(port);
+   else
+   reg |= BIT(port);
b53_write16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_RX_DIS, reg);
 
/* Enable transmission of Broadcom tags from the switch (CPU RX) to
 * allow delivering frames to the per-port net_devices
 */
b53_read16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_TX_DIS, );
-   reg &= ~BIT(port);
+   if (tag_en)
+   reg &= ~BIT(port);
+   else
+   reg |= BIT(port);
b53_write16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_TX_DIS, reg);
 }
 EXPORT_SYMBOL(b53_brcm_hdr_setup);
-- 
2.9.3



Re: [PATCH net-next v5 01/10] devlink: Add permanent config parameter get/set operations

2017-10-30 Thread Jakub Kicinski
On Mon, 30 Oct 2017 18:03:01 +0100, Jiri Pirko wrote:
> >+cfgparam_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIG);
> >+/* Update restart reqd - if any param needs restart, should be set */
> >+if (need_restart) {  
> 
> You should propagate this out so the caller would fill it to the
> message. This is a global thing, not per-param, shoult not be nested.

Right, I think Jiri has already asked for this but I feel like we
should provide the ability to query running/pending configurations
independently.  I don't see it in this patch set?

I'm not sure what the status of the reconfig trigger patches for mlxsw
is, but we actually may need 3 config sets:
 - current/runtime configurable, 
 - requiring soft reset of the device/driver reinit;
 - requiring hard reset/set on boot.

Secondly, IMHO calling set/get parameters "permanent" is a bit
backwards.  One device may not be able to change max VF counts or MSIX
allocation without full reinit of PCIe blocks, but for others soft
reset is more than enough.  Port splitting is another example.  For 
NICs port splitting at runtime is usually not a priority in HW/FW
development, so some form of reset is generally required, while
switches can split a port at runtime.  IOW we should define parameters
without assigning them to config sets in the ABI itself.  And also we
should make it in a way which will allow existing parameters to be
reused in permanent/sort reset required/runtime modes.

Does that make sense?


Re: [PATCH v2 net-next] tcp: add tracepoint trace_tcp_retransmit_synack()

2017-10-30 Thread Eric Dumazet
On Mon, 2017-10-30 at 21:43 +, Song Liu wrote:
> CCing key audience of the patch.
> 

Please stop doing this for every patch you are sending.

Include all relevant CC to your patch submission, and that's enough.

Thanks.




Re: WARNING in strp_data_ready

2017-10-30 Thread Tom Herbert
On Mon, Oct 30, 2017 at 2:44 PM, John Fastabend
 wrote:
> On 10/24/2017 08:20 AM, syzbot wrote:
>> Hello,
>>
>> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
>> include/net/sock.h:1505 [inline]
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
>> include/net/sock.h:1511 [inline]
>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
>> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> Call Trace:
>>  
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>  panic+0x1e4/0x417 kernel/panic.c:181
>>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>>  report_bug+0x211/0x2d0 lib/bug.c:183
>>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>> RSP: 0018:8801db206b18 EFLAGS: 00010206
>> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
>> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
>> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
>> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
>> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>
> Looks like KCM is calling sk_data_ready() without first taking the
> sock lock.
>
> /* Called with lower sock held */
> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
> {
>  [...]
> if (kcm_queue_rcv_skb(>sk, skb)) {
>
> In this case kcm->sk is not the same lock the comment is referring to.
> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>
> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
> I don't have anything better in mind immediately.
>
The sock locks are taken in reverse order in the send path so so
grabbing kcm sock lock with lower lock held to call sk_data_ready may
lead to deadlock like I think.

It might be possible to change the order in the send path to do this.
Something like:

trylock on lower socket lock
-if trylock fails
  - release kcm sock lock
  - lock lower sock
  - lock kcm sock
- call sendpage locked function

I admit that dealing with two levels of socket locks in the data path
is quite a pain :-)

Tom

> Thanks,
> John


[PATCH net] tc-testing: fix arg to ip command: -s -> -n

2017-10-30 Thread Brenda J. Butler
Fixes: 31c2611b66e0 ("selftests: Introduce a new test case to tc testsuite")
Fixes: 76b903ee198d ("selftests: Introduce tc testsuite")
Signed-off-by: Brenda J. Butler 
---

We want to run the command inside the container, not in the host.

 tools/testing/selftests/tc-testing/tdc.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tdc.py 
b/tools/testing/selftests/tc-testing/tdc.py
index 5f11f5d7456e..a8981c5d0aaf 100755
--- a/tools/testing/selftests/tc-testing/tdc.py
+++ b/tools/testing/selftests/tc-testing/tdc.py
@@ -152,11 +152,11 @@ def ns_create():
 exec_cmd(cmd, False)
 cmd = 'ip link set $DEV0 up'
 exec_cmd(cmd, False)
-cmd = 'ip -s $NS link set $DEV1 up'
+cmd = 'ip -n $NS link set $DEV1 up'
 exec_cmd(cmd, False)
 cmd = 'ip link set $DEV2 netns $NS'
 exec_cmd(cmd, False)
-cmd = 'ip -s $NS link set $DEV2 up'
+cmd = 'ip -n $NS link set $DEV2 up'
 exec_cmd(cmd, False)
 
 
-- 
2.15.0.rc0



Re: [lkp-robot] [bpf] 76cdd39f41: WARNING:trinity-c0_still_has_locks_held

2017-10-30 Thread Yonghong Song


I am not able to reproduce the bug with latest net-next.

Thanks Daniel to point it out, looks like your test uses an old-version
unapplied patchset:
https://github.com/0day-ci/linux/commit/76cdd39f4117a6cbd520b5d09993ac87acbdcfd8
which yes, there is a bug to leak the mutext lock and the bug
is fixed in subsequent patch set.



FYI, we noticed the following commit (built with gcc-4.8):

commit: 76cdd39f4117a6cbd520b5d09993ac87acbdcfd8 ("bpf: permit multiple bpf 
attachments for a single perf event")
url: 
https://github.com/0day-ci/linux/commits/Yonghong-Song/bpf-permit-multiple-bpf-attachments-for-a-single-perf-tracepoint-event/20171024-080608


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


++++
|| 052bd6a4fb | 76cdd39f41 |
++++
| boot_successes | 64 | 66 |
++++



[   50.247341] WARNING: trinity-c0/3371 still has locks held!
[main] Random reseed: 4199236289
[   50.258948] 4.14.0-rc5-01671-g76cdd39 #1 Not tainted
[   50.260888] 
[child0:3373] io_getevents (208) returned ENOSYS, marking as inactive.
[child0:3373] sysfs (139) returned ENOSYS, marking as inactive.
[   50.273066] 1 lock held by trinity-c0/3371:
[   50.274247]  #0:  (bpf_event_mutex){}, at: [] 
perf_event_detach_bpf_prog+0x17/0xd0
[child0:3373] fanotify_mark (301) returned ENOSYS, marking as inactive.
*** glibc detected *** /trinity: double free or corruption (out): 
0x01d51000 ***
[   50.288585]
[   50.288585] stack backtrace:
[   50.290123] CPU: 1 PID: 3371 Comm: trinity-c0 Not tainted 
4.14.0-rc5-01671-g76cdd39 #1
[   50.292804] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   50.295763] Call Trace:
[   50.296791]  dump_stack+0xcb/0x13f
[   50.298244]  do_exit+0xf1f/0x1380
[   50.299525]  ? syscall_trace_enter+0x3ce/0x490
[   50.301186]  do_group_exit+0x9d/0x120
[   50.302531]  SyS_exit_group+0xb/0x10
[   50.303746]  do_syscall_64+0xa6/0x240
[   50.305256]  entry_SYSCALL64_slow_path+0x25/0x25
[   50.306847] RIP: 0033:0x7f60b8dae408
[   50.307944] RSP: 002b:7fff020656e8 EFLAGS: 0206 ORIG_RAX: 
00e7
[   50.310457] RAX: ffda RBX:  RCX: 7f60b8dae408
[   50.312482] RDX:  RSI: 003c RDI: 
[   50.314526] RBP: 7fff020665a0 R08: 00e7 R09: ffa0
[   50.316981] R10: 7fff02065480 R11: 0206 R12: 0299
[   50.319413] R13: 0059 R14: 7fff02065d10 R15: 0002
=== Backtrace: =

To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp qemu -k  job-script  # job-script is attached in this 
email



Thanks,
Xiaolong



Re: [PATCH net-next] net: filter: remove unused variable and fix warning

2017-10-30 Thread Alexei Starovoitov
On Mon, Oct 30, 2017 at 01:46:47PM -0700, Jakub Kicinski wrote:
> bpf_getsockopt bpf call sets the ret variable to zero and
> never changes it.  What's worse in case CONFIG_INET is
> not selected the variable is completely unused generating
> a warning.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 

Acked-by: Alexei Starovoitov 



Re: [PATCH net-next] net: filter: remove unused variable and fix warning

2017-10-30 Thread Daniel Borkmann

On 10/30/2017 09:46 PM, Jakub Kicinski wrote:

bpf_getsockopt bpf call sets the ret variable to zero and
never changes it.  What's worse in case CONFIG_INET is
not selected the variable is completely unused generating
a warning.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 


Acked-by: Daniel Borkmann 


Re: [PATCH] netfilter: ipset: ip_set_bitmap_ipmac: use swap macro in bitmap_ipmac_create

2017-10-30 Thread Gustavo A. R. Silva

Hi Jozsef,

Quoting Jozsef Kadlecsik :


Hi,

On Sat, 28 Oct 2017, Gustavo A. R. Silva wrote:


Make use of the swap macro and remove unnecessary variable tmp.
This makes the code easier to read and maintain.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 


Please resubmit the tree patches as a single one, they do the same thing.
Thanks!



OK. I'll do that.

Thanks
--
Gustavo A. R. Silva






[PATCH net-next] bpf: reduce verifier memory consumption

2017-10-30 Thread Alexei Starovoitov
the verifier got progressively smarter over time and size of its internal
state grew as well. Time to reduce the memory consumption.

Before:
sizeof(struct bpf_verifier_state) = 6520
After:
sizeof(struct bpf_verifier_state) = 896

It's done by observing that majority of BPF programs use little to
no stack whereas verifier kept all of 512 stack slots ready always.
Instead dynamically reallocate struct verifier state when stack
access is detected.
Besides memory savings such approach gives few % runtime perf
improvement as well and small reduction in number of processed insns:
 before  after
bpf_lb-DLB_L3.o2285   2043
bpf_lb-DLB_L4.o3723   3570
bpf_lb-DUNKNOWN.o  1110   1109
bpf_lxc-DDROP_ALL.o   27954  27849
bpf_lxc-DUNKNOWN.o38954  38724
bpf_netdev.o  16943  16131
bpf_overlay.o  7929   7733

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 drivers/net/ethernet/netronome/nfp/bpf/verifier.c |   8 +-
 include/linux/bpf_verifier.h  |  18 +-
 kernel/bpf/verifier.c | 390 ++
 3 files changed, 268 insertions(+), 148 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c 
b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index 3d3dcac1c942..a8c7615546a9 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -76,9 +76,9 @@ nfp_bpf_goto_meta(struct nfp_prog *nfp_prog, struct 
nfp_insn_meta *meta,
 
 static int
 nfp_bpf_check_exit(struct nfp_prog *nfp_prog,
-  const struct bpf_verifier_env *env)
+  struct bpf_verifier_env *env)
 {
-   const struct bpf_reg_state *reg0 = >cur_state.regs[0];
+   const struct bpf_reg_state *reg0 = cur_regs(env) + BPF_REG_0;
u64 imm;
 
if (nfp_prog->act == NN_ACT_XDP)
@@ -144,9 +144,9 @@ nfp_bpf_check_stack_access(struct nfp_prog *nfp_prog,
 
 static int
 nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
- const struct bpf_verifier_env *env, u8 reg_no)
+ struct bpf_verifier_env *env, u8 reg_no)
 {
-   const struct bpf_reg_state *reg = >cur_state.regs[reg_no];
+   const struct bpf_reg_state *reg = cur_regs(env) + reg_no;
int err;
 
if (reg->type != PTR_TO_CTX &&
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index feeaea93d959..c1430701742c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -88,20 +88,25 @@ enum bpf_stack_slot_type {
 
 #define BPF_REG_SIZE 8 /* size of eBPF register in bytes */
 
+struct bpf_verifier_stack {
+   struct bpf_reg_state spilled_ptr;
+   u8 slot_type[BPF_REG_SIZE];
+};
+
 /* state of the program:
  * type of all registers and stack info
  */
 struct bpf_verifier_state {
struct bpf_reg_state regs[MAX_BPF_REG];
-   u8 stack_slot_type[MAX_BPF_STACK];
-   struct bpf_reg_state spilled_regs[MAX_BPF_STACK / BPF_REG_SIZE];
struct bpf_verifier_state *parent;
+   int allocated_stack;
+   struct bpf_verifier_stack stack[0];
 };
 
 /* linked list of verifier states used to prune search */
 struct bpf_verifier_state_list {
-   struct bpf_verifier_state state;
struct bpf_verifier_state_list *next;
+   struct bpf_verifier_state state;
 };
 
 struct bpf_insn_aux_data {
@@ -145,7 +150,7 @@ struct bpf_verifier_env {
struct bpf_verifier_stack_elem *head; /* stack of verifier states to be 
processed */
int stack_size; /* number of states to be processed */
bool strict_alignment;  /* perform strict pointer alignment 
checks */
-   struct bpf_verifier_state cur_state; /* current verifier state */
+   struct bpf_verifier_state *cur_state; /* current verifier state */
struct bpf_verifier_state_list **explored_states; /* search pruning 
optimization */
const struct bpf_ext_analyzer_ops *analyzer_ops; /* external analyzer 
ops */
void *analyzer_priv; /* pointer to external analyzer's private data */
@@ -159,6 +164,11 @@ struct bpf_verifier_env {
struct bpf_verifer_log log;
 };
 
+static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+{
+   return env->cur_state->regs;
+}
+
 int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops,
 void *priv);
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d906775e12c1..a9fbcf9812d0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -145,10 +145,10 @@ struct bpf_verifier_stack_elem {
 * before processing instruction 'insn_idx'
 * and after processing instruction 'prev_insn_idx'
 */
-   struct bpf_verifier_state st;
int insn_idx;
int prev_insn_idx;
struct bpf_verifier_stack_elem *next;
+   struct 

Re: WARNING in strp_data_ready

2017-10-30 Thread John Fastabend
On 10/24/2017 08:20 AM, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me 
> include/net/sock.h:1505 [inline]
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user 
> include/net/sock.h:1511 [inline]
> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 
> strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  panic+0x1e4/0x417 kernel/panic.c:181
>  __warn+0x1c4/0x1d9 kernel/panic.c:542
>  report_bug+0x211/0x2d0 lib/bug.c:183
>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
> RSP: 0018:8801db206b18 EFLAGS: 00010206
> RAX: 8801d1e02080 RBX: 8801dad74c48 RCX: 
> RDX: 0100 RSI: 8801d29fa0a0 RDI: 85cbede0
> RBP: 8801db206b38 R08: 0005 R09: 10ce0bcd
> R10: 8801db206a00 R11: dc00 R12: 8801d29fa000
> R13: 8801dad74c50 R14: 8801d4350a92 R15: 0001
>  psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353

Looks like KCM is calling sk_data_ready() without first taking the
sock lock.

/* Called with lower sock held */
static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
{
 [...]
if (kcm_queue_rcv_skb(>sk, skb)) {

In this case kcm->sk is not the same lock the comment is referring to.
And kcm_queue_rcv_skb() will eventually call sk_data_ready().

@Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
I don't have anything better in mind immediately.

Thanks,
John


Re: [PATCH v2 net-next] tcp: add tracepoint trace_tcp_retransmit_synack()

2017-10-30 Thread Song Liu
CCing key audience of the patch.

Thanks,
Song

> On Oct 30, 2017, at 2:41 PM, Song Liu  wrote:
> 
> This tracepoint can be used to trace synack retransmits. It maintains
> pointer to struct request_sock.
> 
> We cannot simply reuse trace_tcp_retransmit_skb() here, because the
> sk here is the LISTEN socket. The IP addresses and ports should be
> extracted from struct request_sock.
> 
> Note that, like many other tracepoints, this patch uses IS_ENABLED
> in TP_fast_assign macro, which triggers sparse warning like:
> 
> ./include/trace/events/tcp.h:274:1: error: directive in argument list
> ./include/trace/events/tcp.h:281:1: error: directive in argument list
> 
> However, there is no good solution to avoid these warnings. To the
> best of our knowledge, these warnings are harmless.
> 
> Signed-off-by: Song Liu 
> Acked-by: Alexei Starovoitov 
> Acked-by: Martin KaFai Lau 
> ---
> include/trace/events/tcp.h | 56 ++
> net/ipv4/tcp_output.c  |  1 +
> 2 files changed, 57 insertions(+)
> 
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 03699ba..07a 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -237,6 +237,62 @@ TRACE_EVENT(tcp_set_state,
> show_tcp_state_name(__entry->newstate))
> );
> 
> +TRACE_EVENT(tcp_retransmit_synack,
> +
> + TP_PROTO(const struct sock *sk, const struct request_sock *req),
> +
> + TP_ARGS(sk, req),
> +
> + TP_STRUCT__entry(
> + __field(const void *, skaddr)
> + __field(const void *, req)
> + __field(__u16, sport)
> + __field(__u16, dport)
> + __array(__u8, saddr, 4)
> + __array(__u8, daddr, 4)
> + __array(__u8, saddr_v6, 16)
> + __array(__u8, daddr_v6, 16)
> + ),
> +
> + TP_fast_assign(
> + struct inet_request_sock *ireq = inet_rsk(req);
> + struct in6_addr *pin6;
> + __be32 *p32;
> +
> + __entry->skaddr = sk;
> + __entry->req = req;
> +
> + __entry->sport = ireq->ir_num;
> + __entry->dport = ntohs(ireq->ir_rmt_port);
> +
> + p32 = (__be32 *) __entry->saddr;
> + *p32 = ireq->ir_loc_addr;
> +
> + p32 = (__be32 *) __entry->daddr;
> + *p32 = ireq->ir_rmt_addr;
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (sk->sk_family == AF_INET6) {
> + pin6 = (struct in6_addr *)__entry->saddr_v6;
> + *pin6 = ireq->ir_v6_loc_addr;
> + pin6 = (struct in6_addr *)__entry->daddr_v6;
> + *pin6 = ireq->ir_v6_rmt_addr;
> + } else
> +#endif
> + {
> + pin6 = (struct in6_addr *)__entry->saddr_v6;
> + ipv6_addr_set_v4mapped(ireq->ir_loc_addr, pin6);
> + pin6 = (struct in6_addr *)__entry->daddr_v6;
> + ipv6_addr_set_v4mapped(ireq->ir_rmt_addr, pin6);
> + }
> + ),
> +
> + TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c 
> daddrv6=%pI6c",
> +   __entry->sport, __entry->dport,
> +   __entry->saddr, __entry->daddr,
> +   __entry->saddr_v6, __entry->daddr_v6)
> +);
> +
> #endif /* _TRACE_TCP_H */
> 
> /* This part must be outside protection */
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a69a34f..74d1c4d 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3781,6 +3781,7 @@ int tcp_rtx_synack(const struct sock *sk, struct 
> request_sock *req)
>   __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
>   if (unlikely(tcp_passive_fastopen(sk)))
>   tcp_sk(sk)->total_retrans++;
> + trace_tcp_retransmit_synack(sk, req);
>   }
>   return res;
> }
> -- 
> 2.9.5
> 



[PATCH v2 net-next] tcp: add tracepoint trace_tcp_retransmit_synack()

2017-10-30 Thread Song Liu
This tracepoint can be used to trace synack retransmits. It maintains
pointer to struct request_sock.

We cannot simply reuse trace_tcp_retransmit_skb() here, because the
sk here is the LISTEN socket. The IP addresses and ports should be
extracted from struct request_sock.

Note that, like many other tracepoints, this patch uses IS_ENABLED
in TP_fast_assign macro, which triggers sparse warning like:

./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list

However, there is no good solution to avoid these warnings. To the
best of our knowledge, these warnings are harmless.

Signed-off-by: Song Liu 
Acked-by: Alexei Starovoitov 
Acked-by: Martin KaFai Lau 
---
 include/trace/events/tcp.h | 56 ++
 net/ipv4/tcp_output.c  |  1 +
 2 files changed, 57 insertions(+)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 03699ba..07a 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -237,6 +237,62 @@ TRACE_EVENT(tcp_set_state,
  show_tcp_state_name(__entry->newstate))
 );
 
+TRACE_EVENT(tcp_retransmit_synack,
+
+   TP_PROTO(const struct sock *sk, const struct request_sock *req),
+
+   TP_ARGS(sk, req),
+
+   TP_STRUCT__entry(
+   __field(const void *, skaddr)
+   __field(const void *, req)
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __array(__u8, saddr, 4)
+   __array(__u8, daddr, 4)
+   __array(__u8, saddr_v6, 16)
+   __array(__u8, daddr_v6, 16)
+   ),
+
+   TP_fast_assign(
+   struct inet_request_sock *ireq = inet_rsk(req);
+   struct in6_addr *pin6;
+   __be32 *p32;
+
+   __entry->skaddr = sk;
+   __entry->req = req;
+
+   __entry->sport = ireq->ir_num;
+   __entry->dport = ntohs(ireq->ir_rmt_port);
+
+   p32 = (__be32 *) __entry->saddr;
+   *p32 = ireq->ir_loc_addr;
+
+   p32 = (__be32 *) __entry->daddr;
+   *p32 = ireq->ir_rmt_addr;
+
+#if IS_ENABLED(CONFIG_IPV6)
+   if (sk->sk_family == AF_INET6) {
+   pin6 = (struct in6_addr *)__entry->saddr_v6;
+   *pin6 = ireq->ir_v6_loc_addr;
+   pin6 = (struct in6_addr *)__entry->daddr_v6;
+   *pin6 = ireq->ir_v6_rmt_addr;
+   } else
+#endif
+   {
+   pin6 = (struct in6_addr *)__entry->saddr_v6;
+   ipv6_addr_set_v4mapped(ireq->ir_loc_addr, pin6);
+   pin6 = (struct in6_addr *)__entry->daddr_v6;
+   ipv6_addr_set_v4mapped(ireq->ir_rmt_addr, pin6);
+   }
+   ),
+
+   TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c 
daddrv6=%pI6c",
+ __entry->sport, __entry->dport,
+ __entry->saddr, __entry->daddr,
+ __entry->saddr_v6, __entry->daddr_v6)
+);
+
 #endif /* _TRACE_TCP_H */
 
 /* This part must be outside protection */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a69a34f..74d1c4d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3781,6 +3781,7 @@ int tcp_rtx_synack(const struct sock *sk, struct 
request_sock *req)
__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
if (unlikely(tcp_passive_fastopen(sk)))
tcp_sk(sk)->total_retrans++;
+   trace_tcp_retransmit_synack(sk, req);
}
return res;
 }
-- 
2.9.5



[PATCH v2 net-next] tcp: add tracepoint trace_tcp_retransmit_synack

2017-10-30 Thread Song Liu
Change from v1: Updated commit message to highlight potential sparse warning.

Song Liu (1):
  tcp: add tracepoint trace_tcp_retransmit_synack()

 include/trace/events/tcp.h | 56 ++
 net/ipv4/tcp_output.c  |  1 +
 2 files changed, 57 insertions(+)

--
2.9.5


[PATCH 0/2] Add the ability to do BPF directed error injection

2017-10-30 Thread Josef Bacik
A lot of our error paths are not well tested because we have no good way of
injecting errors generically.  Some subystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproduceable results.

With BPF we can add determinism to our error injection.  We can use kprobes and
other things to verify we are injecting errors at the exact case we are trying
to test.  This patch gives us the tool to actual do the error injection part.
It is very simple, we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.

Right now this only works on x86, but it would be simple enough to expand to
other architectures.  Thanks,

Josef


[PATCH 2/2] samples/bpf: add a test for bpf_override_return

2017-10-30 Thread Josef Bacik
From: Josef Bacik 

This adds a basic test for bpf_override_return to verify it works.  We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.

Signed-off-by: Josef Bacik 
---
 samples/bpf/Makefile  |  4 
 samples/bpf/test_override_return.sh   | 15 +++
 samples/bpf/tracex7_kern.c| 15 +++
 samples/bpf/tracex7_user.c| 28 
 tools/include/uapi/linux/bpf.h|  7 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 ++-
 6 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100755 samples/bpf/test_override_return.sh
 create mode 100644 samples/bpf/tracex7_kern.c
 create mode 100644 samples/bpf/tracex7_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ea2b9e6135f3..83d06bc1f710 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -14,6 +14,7 @@ hostprogs-y += tracex3
 hostprogs-y += tracex4
 hostprogs-y += tracex5
 hostprogs-y += tracex6
+hostprogs-y += tracex7
 hostprogs-y += test_probe_write_user
 hostprogs-y += trace_output
 hostprogs-y += lathist
@@ -58,6 +59,7 @@ tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
 tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
 tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
 tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
 load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
 test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
 trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
@@ -100,6 +102,7 @@ always += tracex3_kern.o
 always += tracex4_kern.o
 always += tracex5_kern.o
 always += tracex6_kern.o
+always += tracex7_kern.o
 always += sock_flags_kern.o
 always += test_probe_write_user_kern.o
 always += trace_output_kern.o
@@ -153,6 +156,7 @@ HOSTLOADLIBES_tracex3 += -lelf
 HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
 HOSTLOADLIBES_tracex6 += -lelf
+HOSTLOADLIBES_tracex7 += -lelf
 HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
 HOSTLOADLIBES_load_sock_ops += -lelf
 HOSTLOADLIBES_test_probe_write_user += -lelf
diff --git a/samples/bpf/test_override_return.sh 
b/samples/bpf/test_override_return.sh
new file mode 100755
index ..e68b9ee6814b
--- /dev/null
+++ b/samples/bpf/test_override_return.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+rm -f testfile.img
+dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
+DEVICE=$(losetup --show -f testfile.img)
+mkfs.btrfs -f $DEVICE
+mkdir tmpmnt
+./tracex7 $DEVICE
+if [ $? -eq 0 ]
+then
+   echo "SUCCESS!"
+else
+   echo "FAILED!"
+fi
+losetup -d $DEVICE
diff --git a/samples/bpf/tracex7_kern.c b/samples/bpf/tracex7_kern.c
new file mode 100644
index ..a2f74f736e66
--- /dev/null
+++ b/samples/bpf/tracex7_kern.c
@@ -0,0 +1,15 @@
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+SEC("kprobe/open_ctree")
+int bpf_prog1(struct pt_regs *ctx)
+{
+   unsigned long rc = -12;
+   bpf_override_return(ctx, rc);
+   return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex7_user.c b/samples/bpf/tracex7_user.c
new file mode 100644
index ..8a52ac492e8b
--- /dev/null
+++ b/samples/bpf/tracex7_user.c
@@ -0,0 +1,28 @@
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+
+int main(int argc, char **argv)
+{
+   FILE *f;
+   char filename[256];
+   char command[256];
+   int ret;
+
+   snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+   if (load_bpf_file(filename)) {
+   printf("%s", bpf_log_buf);
+   return 1;
+   }
+
+   snprintf(command, 256, "mount %s tmpmnt/", argv[1]);
+   f = popen(command, "r");
+   ret = pclose(f);
+
+   return ret ? 0 : 1;
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4a4b6e78c977..3756dde69834 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -673,6 +673,10 @@ union bpf_attr {
  * @buf: buf to fill
  * @buf_size: size of the buf
  * Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -732,7 +736,8 @@ union bpf_attr {
FN(xdp_adjust_meta),\
FN(perf_event_read_value),  \
FN(perf_prog_read_value),   \
-   FN(getsockopt),
+   FN(getsockopt), \
+   FN(override_return),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git 

[PATCH 1/2] bpf: add a bpf_override_function helper

2017-10-30 Thread Josef Bacik
From: Josef Bacik 

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Signed-off-by: Josef Bacik 
---
 arch/Kconfig |  3 +++
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/kprobes.h   |  4 
 arch/x86/include/asm/ptrace.h|  5 +
 arch/x86/kernel/kprobes/ftrace.c | 14 
 include/uapi/linux/bpf.h |  7 +-
 kernel/trace/Kconfig | 11 ++
 kernel/trace/bpf_trace.c | 47 +++-
 kernel/trace/trace.h |  6 +
 kernel/trace/trace_kprobe.c  | 23 ++--
 10 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d789a89cb32c..4fb618082259 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -195,6 +195,9 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
bool
 
+config HAVE_KPROBE_OVERRIDE
+   bool
+
 config HAVE_NMI
bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 971feac13506..5126d2750dd0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -152,6 +152,7 @@ config X86
select HAVE_KERNEL_XZ
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
+   select HAVE_KPROBE_OVERRIDE
select HAVE_KRETPROBES
select HAVE_KVM
select HAVE_LIVEPATCH   if X86_64
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 6cf65437b5e5..c6c3b1f4306a 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -67,6 +67,10 @@ extern const int kretprobe_blacklist_size;
 void arch_remove_kprobe(struct kprobe *p);
 asmlinkage void kretprobe_trampoline(void);
 
+#ifdef CONFIG_KPROBES_ON_FTRACE
+extern void arch_ftrace_kprobe_override_function(struct pt_regs *regs);
+#endif
+
 /* Architecture specific copy of original instruction*/
 struct arch_specific_insn {
/* copy of the original instruction */
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..f04e71800c2f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -108,6 +108,11 @@ static inline unsigned long regs_return_value(struct 
pt_regs *regs)
return regs->ax;
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long 
rc)
+{
+   regs->ax = rc;
+}
+
 /*
  * user_mode(regs) determines whether a register set came from user
  * mode.  On x86_32, this is true if V8086 mode was enabled OR if the
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 041f7b6dfa0f..3c455bf490cb 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
p->ainsn.boostable = false;
return 0;
 }
+
+asmlinkage void override_func(void);
+asm(
+   ".type override_func, @function\n"
+   "override_func:\n"
+   "   ret\n"
+   ".size override_func, .-override_func\n"
+);
+
+void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
+{
+   regs->ip = (unsigned long)_func;
+}
+NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b7b54d898bd..1ad5b87a42f6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -673,6 +673,10 @@ union bpf_attr {
  * @buf: buf to fill
  * @buf_size: size of the buf
  * Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -732,7 +736,8 @@ union bpf_attr {
FN(xdp_adjust_meta),\
FN(perf_event_read_value),  \
FN(perf_prog_read_value),   \
-   FN(getsockopt),
+   FN(getsockopt), \
+   FN(override_return),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 434c840e2d82..9dc0deeaad2b 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -518,6 +518,17 @@ config FUNCTION_PROFILER
 
  If in doubt, say N.
 
+config BPF_KPROBE_OVERRIDE
+   bool "Enable BPF programs to override a kprobed function"
+   

Re: [PATCH] ravb: Use common error handling code in ravb_probe()

2017-10-30 Thread Sergei Shtylyov

On 10/30/2017 11:51 PM, Sergei Shtylyov wrote:


From: Markus Elfring 
Date: Sat, 28 Oct 2017 19:10:08 +0200

Add a jump target so that a bit of exception handling can be better reused
at the end of this function.


Have you actually tried to see if there's any change in the resulting 
object code? I've looked at ARM gcc 4.8.5's output and it didn't really change 
-- gcc was able to optimize these repetitive assignments pretty well...



This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
  drivers/net/ethernet/renesas/ravb_main.c | 32 


  1 file changed, 16 insertions(+), 16 deletions(-)


Diffstat also speaks about the futility of this patch.


   Ah, you've added empty lines where there was none... Please don't do this.
I'll ACK the v2 patch, given my comments are addressed.

[...]

MBR, Sergei


Re: [PATCH net-next 1/1] net sched qdisc: pass netlink message flags in event notification

2017-10-30 Thread Roman Mashak
Cong Wang  writes:

> On Mon, Oct 30, 2017 at 11:07 AM, Roman Mashak  wrote:
>> Cong Wang  writes:
>>
>>> On Sat, Oct 28, 2017 at 8:36 PM, Roman Mashak  wrote:
 Cong Wang  writes:
>>>
>>> Hmm, I thought you use RTM_NEWQDISC+RTM_DELQDISC to
>>> determine it is replacement, no?
>>
>> Create is RTM_NEWQDISC and NLM_F_EXCL|NLM_F_CREATE, replacement is
>> RTM_NEWQDISC and NLM_F_REPLACE in netlink flags.
>
> Is there any reason we can't use RTM_NEWQDISC+RTM_DELQDISC
> rather than NLM_F_REPLACE to determine it is replacement?
>

I'm not sure this would be valid semantics for replace operation, look at
the rfc3549:

Additional flag bits for NEW requests
  NLM_F_REPLACE   Replace existing matching config object with
  this request.

> Note, RTM_NEWQDISC+RTM_DELQDISC are put in a same
> message not two.

Hmm, could you clarify how do you expect to put two event IDs in nlmsg_type?


[PATCH v2 net-next] ipv6: Implement limits on Hop-by-Hop and Destination options

2017-10-30 Thread Tom Herbert
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).

By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.

  25.38%  [kernel][k] __fib6_clean_all
  21.63%  [kernel][k] ip6_parse_tlv
   4.21%  [kernel][k] __local_bh_enable_ip
   2.18%  [kernel][k] ip6_pol_route.isra.39
   1.98%  [kernel][k] fib6_walk_continue
   1.88%  [kernel][k] _raw_write_lock_bh
   1.65%  [kernel][k] dst_release

This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
  - Limit the number of options in a Hop-by-Hop or Destination options
extension header.
  - Limit the byte length of a Hop-by-Hop or Destination options
extension header.
  - Disallow unrecognized options in a Hop-by-Hop or Destination
options extension header.

The limits are set in corresponding sysctls:

  ipv6.sysctl.max_dst_opts_cnt
  ipv6.sysctl.max_hbh_opts_cnt
  ipv6.sysctl.max_dst_opts_len
  ipv6.sysctl.max_hbh_opts_len

If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.

If a limit is exceeded when processing an extension header the packet is
dropped.

Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).

These limits have being proposed in draft-ietf-6man-rfc6434-bis.

Tested (by Martin Lau)

I tested out 1 thread (i.e. one raw_udp process).

I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.

v2;
  - Code and documention cleanup.
  - Change references of RFC2460 to be RFC8200.
  - Add reference to RFC6434-bis where the limits will be in standard.

Signed-off-by: Tom Herbert 
---
 Documentation/networking/ip-sysctl.txt | 24 
 include/net/ipv6.h | 40 
 include/net/netns/ipv6.h   |  4 ++
 net/ipv6/af_inet6.c|  4 ++
 net/ipv6/exthdrs.c | 67 --
 net/ipv6/sysctl_net_ipv6.c | 32 
 6 files changed, 159 insertions(+), 12 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 77f4de59dc9c..e6661b205f72 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1385,6 +1385,30 @@ mld_qrv - INTEGER
Default: 2 (as specified by RFC3810 9.1)
Minimum: 1 (as specified by RFC6636 4.5)
 
+max_dst_opts_cnt - INTEGER
+   Maximum number of non-padding TLVs allowed in a Destination
+   options extension header. If this value is less than zero
+   then unknown options are disallowed and the number of known
+   TLVs allowed is the absolute value of this number.
+   Default: 8
+
+max_hbh_opts_cnt - INTEGER
+   Maximum number of non-padding TLVs allowed in a Hop-by-Hop
+   options extension header. If this value is less than zero
+   then unknown options are disallowed and the number of known
+   TLVs allowed is the absolute value of this number.
+   Default: 8
+
+max dst_opts_len - INTEGER
+   Maximum length allowed for a Destination options extension
+   header.
+   Default: INT_MAX (unlimited)
+
+max hbh_opts_len - INTEGER
+   Maximum length allowed for a Hop-by-Hop options extension
+   header.
+   Default: INT_MAX (unlimited)
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3cda3b521c36..fb6d67012de6 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -51,6 +51,46 @@
 #define 

Re: [PATCH net-next] net: filter: remove unused variable and fix warning

2017-10-30 Thread Lawrence Brakmo
Good catch, thanks!

On 10/30/17, 1:47 PM, "Jakub Kicinski"  wrote:

bpf_getsockopt bpf call sets the ret variable to zero and
never changes it.  What's worse in case CONFIG_INET is
not selected the variable is completely unused generating
a warning.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 

Acked-by: Lawrence Brakmo 
---
 net/core/filter.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 721c30889033..a0112168d6f9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3288,7 +3288,6 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern 
*, bpf_sock,
   int, level, int, optname, char *, optval, int, optlen)
 {
struct sock *sk = bpf_sock->sk;
-   int ret = 0;
 
if (!sk_fullsock(sk))
goto err_clear;
@@ -3308,7 +3307,7 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern 
*, bpf_sock,
} else {
goto err_clear;
}
-   return ret;
+   return 0;
 #endif
 err_clear:
memset(optval, 0, optlen);
-- 
2.14.1





Re: [PATCH] ravb: Use common error handling code in ravb_probe()

2017-10-30 Thread Sergei Shtylyov

On 10/29/2017 02:00 PM, Geert Uytterhoeven wrote:


From: Markus Elfring 
Date: Sat, 28 Oct 2017 19:10:08 +0200

Add a jump target so that a bit of exception handling can be better reused
at the end of this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
  drivers/net/ethernet/renesas/ravb_main.c | 32 
  1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index a8822a756e08..62dbdf7de6cd 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2069,10 +2069,9 @@ static int ravb_probe(struct platform_device *pdev)
 irq = platform_get_irq_byname(pdev, "ch22");
 else
 irq = platform_get_irq(pdev, 0);
-   if (irq < 0) {
-   error = irq;
-   goto out_release;
-   }
+   if (irq < 0)
+   goto failure_indication;


IMHO, it's really confusing that "irq" contains the error code, not "error".


   I think it would have been equally confusing if 'error' was assigned to 
'ndev->irq', etc. It's just the duality of the result of these functions that 
makes them confusing...



Especially when jumping to a meaningless label named "failure_indication"
("irq_failure" would be more intuitive).


   Yeah, the label sucks. :-)


So I prefer the original code, regardless of the label name.


   On the 2nd thought, the patch can be fixed up and then merged.


Gr{oetje,eeting}s,

 Geert


MBR, Sergei


Re: [PATCH] net: tipc: Convert timers to use timer_setup()

2017-10-30 Thread Kees Cook
On Mon, Oct 30, 2017 at 2:57 AM, Jon Maloy  wrote:
>
>
>> -Original Message-
>> From: keesc...@google.com [mailto:keesc...@google.com] On Behalf Of
>> Kees Cook
>> Sent: Friday, October 27, 2017 06:58
>> To: Jon Maloy 
>> Cc: David S. Miller ; Ying Xue
>> ; netdev@vger.kernel.org; tipc-
>> discuss...@lists.sourceforge.net; linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH] net: tipc: Convert timers to use timer_setup()
>>
>> On Tue, Oct 24, 2017 at 11:44 AM, Jon Maloy 
>> wrote:
>> > NAK. It doesn't sound like a good idea to send this to net. Especially 
>> > since
>> one of these timers has already been refactored in net-next.
>>
>> Hi! I'm not sure what you mean about the one timer issue. I don't see any
>> use of timer_setup() in net/tipc (and no recent conversions to the older
>> setup_timer() API). What's the preferred path for landing this API conversion
>> in net/tipc/?
>>
>> And, just to note, these changes are almost entirely mechanical. The only
>> "special" case is in tipc_sk_timeout() where the argument needs to be
>> slightly adjusted to fetch the tsk from the sk again.
>
> I am referring to commit 0d5fcebf3c37 ("tipc: refactor tipc_sk_timeout() 
> function") which is in "net-next" but not yet in "net".
> I guess David M is the one who decides whether you deliver to net or 
> net-next, but in the former case you will have to wait until David merges in 
> the above commit, and then add your change again.

Ah, thanks, I see the change now. These patches passed in the night,
it seems. I've sent a v2 now for net-next.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security


[PATCH v2] net: tipc: Convert timers to use timer_setup()

2017-10-30 Thread Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Jon Maloy 
Cc: Ying Xue 
Cc: "David S. Miller" 
Cc: netdev@vger.kernel.org
Cc: tipc-discuss...@lists.sourceforge.net
Signed-off-by: Kees Cook 
---
Rebased on commit 0d5fcebf3c37 ("tipc: refactor tipc_sk_timeout() function").
---
 net/tipc/discover.c |  6 +++---
 net/tipc/monitor.c  |  6 +++---
 net/tipc/node.c |  8 
 net/tipc/socket.c   | 10 +-
 net/tipc/subscr.c   |  6 +++---
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 02462d67d191..92e4828c6b09 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -224,9 +224,9 @@ void tipc_disc_remove_dest(struct tipc_link_req *req)
  *
  * Called whenever a link setup request timer associated with a bearer expires.
  */
-static void disc_timeout(unsigned long data)
+static void disc_timeout(struct timer_list *t)
 {
-   struct tipc_link_req *req = (struct tipc_link_req *)data;
+   struct tipc_link_req *req = from_timer(req, t, timer);
struct sk_buff *skb;
int max_delay;
 
@@ -292,7 +292,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
req->num_nodes = 0;
req->timer_intv = TIPC_LINK_REQ_INIT;
spin_lock_init(>lock);
-   setup_timer(>timer, disc_timeout, (unsigned long)req);
+   timer_setup(>timer, disc_timeout, 0);
mod_timer(>timer, jiffies + req->timer_intv);
b->link_req = req;
*skb = skb_clone(req->buf, GFP_ATOMIC);
diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
index 9e109bb1a207..b9c32557d73c 100644
--- a/net/tipc/monitor.c
+++ b/net/tipc/monitor.c
@@ -578,9 +578,9 @@ void tipc_mon_get_state(struct net *net, u32 addr,
read_unlock_bh(>lock);
 }
 
-static void mon_timeout(unsigned long m)
+static void mon_timeout(struct timer_list *t)
 {
-   struct tipc_monitor *mon = (void *)m;
+   struct tipc_monitor *mon = from_timer(mon, t, timer);
struct tipc_peer *self;
int best_member_cnt = dom_size(mon->peer_cnt) - 1;
 
@@ -623,7 +623,7 @@ int tipc_mon_create(struct net *net, int bearer_id)
self->is_up = true;
self->is_head = true;
INIT_LIST_HEAD(>list);
-   setup_timer(>timer, mon_timeout, (unsigned long)mon);
+   timer_setup(>timer, mon_timeout, 0);
mon->timer_intv = msecs_to_jiffies(MON_TIMEOUT + (tn->random & 0x));
mod_timer(>timer, jiffies + mon->timer_intv);
return 0;
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 89f8ac73bf65..009a81631280 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -153,7 +153,7 @@ static void tipc_node_link_down(struct tipc_node *n, int 
bearer_id,
bool delete);
 static void node_lost_contact(struct tipc_node *n, struct sk_buff_head 
*inputq);
 static void tipc_node_delete(struct tipc_node *node);
-static void tipc_node_timeout(unsigned long data);
+static void tipc_node_timeout(struct timer_list *t);
 static void tipc_node_fsm_evt(struct tipc_node *n, int evt);
 static struct tipc_node *tipc_node_find(struct net *net, u32 addr);
 static void tipc_node_put(struct tipc_node *node);
@@ -361,7 +361,7 @@ struct tipc_node *tipc_node_create(struct net *net, u32 
addr, u16 capabilities)
goto exit;
}
tipc_node_get(n);
-   setup_timer(>timer, tipc_node_timeout, (unsigned long)n);
+   timer_setup(>timer, tipc_node_timeout, 0);
n->keepalive_intv = U32_MAX;
hlist_add_head_rcu(>hash, >node_htable[tipc_hashfn(addr)]);
list_for_each_entry_rcu(temp_node, >node_list, list) {
@@ -500,9 +500,9 @@ void tipc_node_remove_conn(struct net *net, u32 dnode, u32 
port)
 
 /* tipc_node_timeout - handle expiration of node timer
  */
-static void tipc_node_timeout(unsigned long data)
+static void tipc_node_timeout(struct timer_list *t)
 {
-   struct tipc_node *n = (struct tipc_node *)data;
+   struct tipc_node *n = from_timer(n, t, timer);
struct tipc_link_entry *le;
struct sk_buff_head xmitq;
int bearer_id;
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index ea61c32f6b80..5d18c0caa92b 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -125,7 +125,7 @@ static void tipc_sock_destruct(struct sock *sk);
 static int tipc_release(struct socket *sock);
 static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags,
   bool kern);
-static void tipc_sk_timeout(unsigned long data);
+static void tipc_sk_timeout(struct timer_list *t);
 static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
   struct tipc_name_seq const *seq);
 static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
@@ -464,7 

[PATCH] drivers/net: tundra: Convert timers to use timer_setup()

2017-10-30 Thread Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "David S. Miller" 
Cc: Philippe Reynes 
Cc: "yuval.sh...@oracle.com" 
Cc: Eric Dumazet 
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 drivers/net/ethernet/tundra/tsi108_eth.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/tundra/tsi108_eth.c 
b/drivers/net/ethernet/tundra/tsi108_eth.c
index c2d15d9c0c33..0624b71ab5d4 100644
--- a/drivers/net/ethernet/tundra/tsi108_eth.c
+++ b/drivers/net/ethernet/tundra/tsi108_eth.c
@@ -164,7 +164,7 @@ static struct platform_driver tsi_eth_driver = {
},
 };
 
-static void tsi108_timed_checker(unsigned long dev_ptr);
+static void tsi108_timed_checker(struct timer_list *t);
 
 #ifdef DEBUG
 static void dump_eth_one(struct net_device *dev)
@@ -1370,7 +1370,7 @@ static int tsi108_open(struct net_device *dev)
 
napi_enable(>napi);
 
-   setup_timer(>timer, tsi108_timed_checker, (unsigned long)dev);
+   timer_setup(>timer, tsi108_timed_checker, 0);
mod_timer(>timer, jiffies + 1);
 
tsi108_restart_rx(data, dev);
@@ -1666,10 +1666,10 @@ tsi108_init_one(struct platform_device *pdev)
  * Thus, we have to do it using a timer.
  */
 
-static void tsi108_timed_checker(unsigned long dev_ptr)
+static void tsi108_timed_checker(struct timer_list *t)
 {
-   struct net_device *dev = (struct net_device *)dev_ptr;
-   struct tsi108_prv_data *data = netdev_priv(dev);
+   struct tsi108_prv_data *data = from_timer(data, t, timer);
+   struct net_device *dev = data->dev;
 
tsi108_check_phy(dev);
tsi108_check_rxring(dev);
-- 
2.7.4


-- 
Kees Cook
Pixel Security


[PATCH] drivers/net: ntb_netdev: Convert timers to use timer_setup()

2017-10-30 Thread Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
Cc: linux-...@googlegroups.com
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 drivers/net/ntb_netdev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index 0250aa9ae2cb..9f6f7ccd44f7 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -230,10 +230,10 @@ static netdev_tx_t ntb_netdev_start_xmit(struct sk_buff 
*skb,
return NETDEV_TX_BUSY;
 }
 
-static void ntb_netdev_tx_timer(unsigned long data)
+static void ntb_netdev_tx_timer(struct timer_list *t)
 {
-   struct net_device *ndev = (struct net_device *)data;
-   struct ntb_netdev *dev = netdev_priv(ndev);
+   struct ntb_netdev *dev = from_timer(dev, t, tx_timer);
+   struct net_device *ndev = dev->ndev;
 
if (ntb_transport_tx_free_entry(dev->qp) < tx_stop) {
mod_timer(>tx_timer, jiffies + msecs_to_jiffies(tx_time));
@@ -269,7 +269,7 @@ static int ntb_netdev_open(struct net_device *ndev)
}
}
 
-   setup_timer(>tx_timer, ntb_netdev_tx_timer, (unsigned long)ndev);
+   timer_setup(>tx_timer, ntb_netdev_tx_timer, 0);
 
netif_carrier_off(ndev);
ntb_transport_link_up(dev->qp);
-- 
2.7.4


-- 
Kees Cook
Pixel Security


Re: [PATCH net-next] net: display hw address of source machine during ipv6 DAD failure

2017-10-30 Thread David Ahern
On 10/30/17 2:50 PM, Vishwanath Pai wrote:
> It does print out the full uncompressed IPv6 address. I modified the
> message manually by hand while copying it out to the commit message in
> order to hide the real IP address of my machines, but did not realize
> that it was different from what the kernel would actually print out.

use 2001:db8::/32 (https://tools.ietf.org/html/rfc3849)


[PATCH] drivers/net: cris: Convert timers to use timer_setup()

2017-10-30 Thread Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "David S. Miller" 
Cc: Kalle Valo 
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Cc: "yuval.sh...@oracle.com" 
Cc: Paul Gortmaker 
Cc: Philippe Reynes 
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
This patch depends on the tip/timers/core DEFINE_TIMER() cleanups, so
I intend to carry this in the timer tree. An Ack or Review would be
appreciated. Thanks!
---
 drivers/net/cris/eth_v10.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/net/cris/eth_v10.c b/drivers/net/cris/eth_v10.c
index 1fcc86fa4e05..7193dfafa577 100644
--- a/drivers/net/cris/eth_v10.c
+++ b/drivers/net/cris/eth_v10.c
@@ -164,9 +164,16 @@ static unsigned int network_rec_config_shadow = 0;
 
 static unsigned int network_tr_ctrl_shadow = 0;
 
+/* Timers */
+static void e100_check_speed(struct timer_list *unused);
+static void e100_clear_network_leds(struct timer_list *unused);
+static void e100_check_duplex(struct timer_list *unused);
+static DEFINE_TIMER(speed_timer, e100_check_speed);
+static DEFINE_TIMER(clear_led_timer, e100_clear_network_leds);
+static DEFINE_TIMER(duplex_timer, e100_check_duplex);
+static struct net_device *timer_dev;
+
 /* Network speed indication. */
-static DEFINE_TIMER(speed_timer, NULL);
-static DEFINE_TIMER(clear_led_timer, NULL);
 static int current_speed; /* Speed read from transceiver */
 static int current_speed_selection; /* Speed selected by user */
 static unsigned long led_next_time;
@@ -174,7 +181,6 @@ static int led_active;
 static int rx_queue_len;
 
 /* Duplex */
-static DEFINE_TIMER(duplex_timer, NULL);
 static int full_duplex;
 static enum duplex current_duplex;
 
@@ -199,9 +205,7 @@ static void update_rx_stats(struct net_device_stats *);
 static void update_tx_stats(struct net_device_stats *);
 static int e100_probe_transceiver(struct net_device* dev);
 
-static void e100_check_speed(unsigned long priv);
 static void e100_set_speed(struct net_device* dev, unsigned long speed);
-static void e100_check_duplex(unsigned long priv);
 static void e100_set_duplex(struct net_device* dev, enum duplex);
 static void e100_negotiate(struct net_device* dev);
 
@@ -213,7 +217,6 @@ static void e100_send_mdio_bit(unsigned char bit);
 static unsigned char e100_receive_mdio_bit(void);
 static void e100_reset_transceiver(struct net_device* net);
 
-static void e100_clear_network_leds(unsigned long dummy);
 static void e100_set_network_leds(int active);
 
 static const struct ethtool_ops e100_ethtool_ops;
@@ -380,17 +383,12 @@ etrax_ethernet_init(void)
current_speed = 10;
current_speed_selection = 0; /* Auto */
speed_timer.expires = jiffies + NET_LINK_UP_CHECK_INTERVAL;
-   speed_timer.data = (unsigned long)dev;
-   speed_timer.function = e100_check_speed;
-
-   clear_led_timer.function = e100_clear_network_leds;
-   clear_led_timer.data = (unsigned long)dev;
 
full_duplex = 0;
current_duplex = autoneg;
duplex_timer.expires = jiffies + NET_DUPLEX_CHECK_INTERVAL;
-duplex_timer.data = (unsigned long)dev;
-   duplex_timer.function = e100_check_duplex;
+
+   timer_dev = dev;
 
 /* Initialize mii interface */
np->mii_if.phy_id_mask = 0x1f;
@@ -679,9 +677,9 @@ intel_check_speed(struct net_device* dev)
 }
 #endif
 static void
-e100_check_speed(unsigned long priv)
+e100_check_speed(struct timer_list *unused)
 {
-   struct net_device* dev = (struct net_device*)priv;
+   struct net_device *dev = timer_dev;
struct net_local *np = netdev_priv(dev);
static int led_initiated = 0;
unsigned long data;
@@ -798,9 +796,9 @@ e100_set_speed(struct net_device* dev, unsigned long speed)
 }
 
 static void
-e100_check_duplex(unsigned long priv)
+e100_check_duplex(struct timer_list *unused)
 {
-   struct net_device *dev = (struct net_device *)priv;
+   struct net_device *dev = timer_dev;
struct net_local *np = netdev_priv(dev);
int old_duplex;
 
@@ -1668,9 +1666,9 @@ e100_hardware_send_packet(struct net_local *np, char 
*buf, int length)
 }
 
 static void
-e100_clear_network_leds(unsigned long dummy)
+e100_clear_network_leds(struct timer_list *unused)
 {
-   struct net_device *dev = (struct net_device *)dummy;
+   struct net_device *dev = timer_dev;
struct net_local *np = netdev_priv(dev);
 
spin_lock(>led_lock);
-- 
2.7.4


-- 
Kees Cook
Pixel Security


[PATCH] ip/ipvlan: enhance ability to add mode flags to existing modes

2017-10-30 Thread Mahesh Bandewar
From: Mahesh Bandewar 

IPvlan supported bridge-only functionality prior to commits
a190d04db937 ('ipvlan: introduce 'private' attribute for all
existing modes.') and fe89aa6b250c ('ipvlan: implement VEPA mode').
These two commits allow to configure the VEPA and private modes now.
This patch adds those options in ip command.

e.g.
  bash:~# ip link add link eth0 name ipvl0 type ipvlan mode l2 private
  -or-
  bash:~# ip link add link eth0 type ipvl0 type ipvlan mode l2 vepa

Also the output will reflect the mode and the mode-flag accordingly.
e.g.
  bash:~# ip -details link show ipvl0
  4: ipvl0@eth0:  mtu 1500 qdisc ...
 link/ether 00:1a:11:44:a5:3e brd ff:ff:ff:ff:ff:ff promiscuity 0
 ipvlan  mode l2 private addrgenmode eui64 numtxqueues 1 ...

Signed-off-by: Mahesh Bandewar 
---
 include/uapi/linux/if_link.h |  4 
 ip/iplink_ipvlan.c   | 35 ++-
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index dafe0a6e0421..35bc598566e0 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -463,6 +463,7 @@ enum macsec_validation_type {
 enum {
IFLA_IPVLAN_UNSPEC,
IFLA_IPVLAN_MODE,
+   IFLA_IPVLAN_FLAGS,
__IFLA_IPVLAN_MAX
 };
 
@@ -475,6 +476,9 @@ enum ipvlan_mode {
IPVLAN_MODE_MAX
 };
 
+#define IPVLAN_F_PRIVATE   0x01
+#define IPVLAN_F_VEPA  0x02
+
 /* VXLAN section */
 enum {
IFLA_VXLAN_UNSPEC,
diff --git a/ip/iplink_ipvlan.c b/ip/iplink_ipvlan.c
index 9f48309ee030..8889808508fe 100644
--- a/ip/iplink_ipvlan.c
+++ b/ip/iplink_ipvlan.c
@@ -20,12 +20,21 @@
 
 static void ipvlan_explain(FILE *f)
 {
-   fprintf(f, "Usage: ... ipvlan [ mode { l2 | l3  | l3s } ]\n");
+   fprintf(f,
+   "Usage: ... ipvlan [ mode MODE ] [ FLAGS ]\n"
+   "\n"
+   "MODE: l3 | l3s | l2\n"
+   "FLAGS: bridge | private | vepa\n"
+   "(first values are the defaults if nothing is specified).\n"
+   );
 }
 
 static int ipvlan_parse_opt(struct link_util *lu, int argc, char **argv,
struct nlmsghdr *n)
 {
+   __u16 flags = 0;
+   bool mflag_given = false;
+
while (argc > 0) {
if (matches(*argv, "mode") == 0) {
__u16 mode = 0;
@@ -43,6 +52,14 @@ static int ipvlan_parse_opt(struct link_util *lu, int argc, 
char **argv,
return -1;
}
addattr16(n, 1024, IFLA_IPVLAN_MODE, mode);
+   } else if (matches(*argv, "private") == 0 && !mflag_given) {
+   flags |= IPVLAN_F_PRIVATE;
+   mflag_given = true;
+   } else if (matches(*argv, "vepa") == 0 && !mflag_given) {
+   flags |= IPVLAN_F_VEPA;
+   mflag_given = true;
+   } else if (matches(*argv, "bridge") == 0 && !mflag_given) {
+   mflag_given = true;
} else if (matches(*argv, "help") == 0) {
ipvlan_explain(stderr);
return -1;
@@ -55,6 +72,7 @@ static int ipvlan_parse_opt(struct link_util *lu, int argc, 
char **argv,
argc--;
argv++;
}
+   addattr16(n, 1024, IFLA_IPVLAN_FLAGS, flags);
 
return 0;
 }
@@ -75,6 +93,21 @@ static void ipvlan_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
print_string(PRINT_ANY, "mode", " mode %s ", mode_str);
}
}
+   if (tb[IFLA_IPVLAN_FLAGS]) {
+   if (RTA_PAYLOAD(tb[IFLA_IPVLAN_FLAGS]) == sizeof(__u16)) {
+   __u16 flags = rta_getattr_u16(tb[IFLA_IPVLAN_FLAGS]);
+
+   if (flags & IPVLAN_F_PRIVATE)
+   print_bool(PRINT_ANY, "private", "private ",
+  true);
+   else if (flags & IPVLAN_F_VEPA)
+   print_bool(PRINT_ANY, "vepa", "vepa ",
+  true);
+   else
+   print_bool(PRINT_ANY, "bridge", "bridge ",
+  true);
+   }
+   }
 }
 
 static void ipvlan_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.15.0.rc2.357.g7e34df9404-goog



Re: [net-next] tcp_delack_timer circular locking dependancy

2017-10-30 Thread Eric Dumazet
On Mon, 2017-10-30 at 15:20 -0400, Dave Jones wrote:
> [  105.316650] ==
> [  105.316818] WARNING: possible circular locking dependency detected
> [  105.316986] 4.14.0-rc7-think+ #1 Not tainted
> [  105.317108] --
> [  105.317273] swapper/2/0 is trying to acquire lock:
> [  105.317407]  (
> [  105.317476] slock-AF_INET6
> [  105.317564] ){+.-.}
> [  105.317642] , at: [] tcp_delack_timer+0x26/0x130
> [  105.317807] 
>but task is already holding lock:
> [  105.317961]  (
> [  105.318024] (timer)
> [  105.318097] #5
> [  105.318168] ){+.-.}
> [  105.318241] , at: [] call_timer_fn+0x5/0x5e0
> [  105.318393] 
>which lock already depends on the new lock.
> 
> [  105.318594] 
>the existing dependency chain (in reverse order) is:
> [  105.318781] 
>-> #1
> [  105.318879]  (
> [  105.318939] (timer)
> [  105.319009] #5
> [  105.319068] ){+.-.}
> [  105.319137] :
> [  105.319195]del_timer_sync+0x3c/0xb0
> [  105.319313]inet_csk_reqsk_queue_drop+0x26c/0x4e0
> [  105.319459]inet_csk_complete_hashdance+0x1e/0x90
> [  105.319598]tcp_check_req+0x787/0x9a0
> [  105.319716]tcp_v6_rcv+0x914/0x1060
> [  105.319828]ip6_input_finish+0x291/0xba0
> [  105.319950]ip6_input+0xb2/0x380
> [  105.320059]ip6_rcv_finish+0x103/0x350
> [  105.320180]ipv6_rcv+0x93f/0xff0
> [  105.320291]__netif_receive_skb_core+0x13ef/0x1900
> [  105.320436]netif_receive_skb_internal+0xea/0x4c0
> [  105.320579]napi_gro_receive+0x28e/0x320
> [  105.320705]e1000_clean_rx_irq+0x3e9/0x6f0
> [  105.320838]e1000e_poll+0x14e/0x570
> [  105.320954]net_rx_action+0x4db/0xc80
> [  105.321075]__do_softirq+0x1ca/0x7bf
> [  105.321194]irq_exit+0x104/0x110
> [  105.321303]do_IRQ+0xb2/0x130
> [  105.321407]ret_from_intr+0x0/0x19
> [  105.321523]cpuidle_enter_state+0x223/0x5b0
> [  105.321655]do_idle+0x110/0x1b0
> [  105.321766]cpu_startup_entry+0xdb/0xe0
> [  105.321891]start_secondary+0x2e9/0x360
> [  105.322014]verify_cpu+0x0/0xf1
> [  105.322121] 
>-> #0
> [  105.322215]  (
> [  105.322276] slock-AF_INET6
> [  105.322359] ){+.-.}
> [  105.322428] :
> [  105.322487]lock_acquire+0x12e/0x350
> [  105.322602]_raw_spin_lock+0x30/0x70
> [  105.322722]tcp_delack_timer+0x26/0x130
> [  105.322846]call_timer_fn+0x188/0x5e0
> [  105.322966]__run_timers+0x54d/0x670
> [  105.323084]run_timer_softirq+0x2a/0x50
> [  105.323208]__do_softirq+0x1ca/0x7bf
> [  105.323325]irq_exit+0x104/0x110
> [  105.323435]smp_apic_timer_interrupt+0x14b/0x510
> [  105.323576]apic_timer_interrupt+0x9a/0xa0
> [  105.323705]cpuidle_enter_state+0x223/0x5b0
> [  105.323836]do_idle+0x110/0x1b0
> [  105.323944]cpu_startup_entry+0xdb/0xe0
> [  105.324067]start_secondary+0x2e9/0x360
> [  105.324189]verify_cpu+0x0/0xf1
> [  105.324295] 
>other info that might help us debug this:
> 
> [  105.324489]  Possible unsafe locking scenario:
> 
> [  105.324644]CPU0CPU1
> [  105.324767]
> [  105.324890]   lock(
> [  105.324963] (timer)
> [  105.325033] #5
> [  105.325093] );
> [  105.325152]lock(
> [  105.325278] slock-AF_INET6
> [  105.325360] );
> [  105.325419]lock(
> [  105.325544] (timer)
> [  105.325612] #5
> [  105.325670] );
> [  105.325729]   lock(
> [  105.325797] slock-AF_INET6
> [  105.325879] );
> [  105.325938] 
> *** DEADLOCK ***
> 
> [  105.326086] 1 lock held by swapper/2/0:
> [  105.326193]  #0: 
> [  105.326257]  (
> [  105.331697] (timer)
> [  105.337038] #5
> [  105.342339] ){+.-.}
> [  105.347620] , at: [] call_timer_fn+0x5/0x5e0
> [  105.353021] 
>stack backtrace:
> [  105.363515] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.0-rc7-think+ #1
> [  105.368886] Hardware name: LENOVO ThinkServer TS140/ThinkServer TS140, 
> BIOS FBKTB3AUS 06/16/2015
> [  105.374330] Call Trace:
> [  105.379697]  
> [  105.384997]  dump_stack+0xbc/0x145
> [  105.390339]  ? dma_virt_map_sg+0xfb/0xfb
> [  105.395733]  ? call_timer_fn+0x5/0x5e0
> [  105.401076]  ? print_lock+0x54/0x68
> [  105.406344]  print_circular_bug.isra.42+0x283/0x2bc
> [  105.411695]  ? print_circular_bug_header+0xda/0xda
> [  105.417054]  ? graph_lock+0x8d/0x100
> [  105.422419]  ? check_noncircular+0x20/0x20
> [  105.427857]  ? sched_clock_cpu+0x14/0xf0
> [  105.433309]  __lock_acquire+0x1f4a/0x2050
> [  105.438725]  ? debug_check_no_locks_freed+0x1a0/0x1a0
> [  105.444160]  ? __lock_acquire+0x6b3/0x2050
> [  105.449580]  ? debug_check_no_locks_freed+0x1a0/0x1a0
> [  105.455015]  ? 

Re: [PATCH] ravb: Use common error handling code in ravb_probe()

2017-10-30 Thread Sergei Shtylyov

Hello!

On 10/28/2017 08:19 PM, SF Markus Elfring wrote:


From: Markus Elfring 
Date: Sat, 28 Oct 2017 19:10:08 +0200

Add a jump target so that a bit of exception handling can be better reused
at the end of this function.


   Have you actually tried to see if there's any change in the resulting 
object code? I've looked at ARM gcc 4.8.5's output and it didn't really change 
-- gcc was able to optimize these repetitive assignments pretty well...



This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
  drivers/net/ethernet/renesas/ravb_main.c | 32 
  1 file changed, 16 insertions(+), 16 deletions(-)


   Diffstat also speaks about the futility of this patch.


diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index a8822a756e08..62dbdf7de6cd 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c



@@ -2226,6 +,10 @@ static int ravb_probe(struct platform_device *pdev)
pm_runtime_put(>dev);
pm_runtime_disable(>dev);
return error;
+
+failure_indication:


   Pretty poor label name -- I would've preferred 'no_irq' or smth...


+   error = irq;
+   goto out_release;


   The only good thing is that looking at the assembly, I was able to figure 
out that 'ndev' check at 'out_release' is futile... :-)


[...]

MBR, Sergei


Re: [PATCH net-next] net: display hw address of source machine during ipv6 DAD failure

2017-10-30 Thread Vishwanath Pai
On 10/30/2017 04:43 PM, David Ahern wrote:
> On 10/30/17 2:29 PM, Vishwanath Pai wrote:
>> This patch updates the error messages displayed in kernel log to include
>> hwaddress of the source machine that caused ipv6 duplicate address
>> detection failures.
>>
>> Examples:
>>
>> a) When we receive a NA packet from another machine advertising our
>> address:
>>
>> ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2601::2bb4 on eth0!
> 
> your example above does not agree with the format below. You have the
> compressed IPv6 address, yet the format ...
> 
>>
>> b) When we detect DAD failure during address assignment to an interface:
>>
>> IPv6: eth0: IPv6 duplicate address 2601::2b78 used by 34:ab:cd:56:11:e8
>> detected!
>>
>> Suggested-by: Igor Lubashev 
>> Signed-off-by: Vishwanath Pai 
>> ---
> 
> 
>> @@ -989,8 +990,8 @@ static void ndisc_recv_na(struct sk_buff *skb)
>>   */
>>  if (skb->pkt_type != PACKET_LOOPBACK)
>>  ND_PRINTK(1, warn,
>> -  "NA: someone advertises our address %pI6 on 
>> %s!\n",
>> -  >addr, ifp->idev->dev->name);
>> +  "NA: %pM advertised our address %pI6 on 
>> %s!\n",
> 
> ... is uncompressed. Please make that '%pI6c' instead of pI6 in the line
> above
> 
> 
>> +  eth_hdr(skb)->h_source, >addr, 
>> ifp->idev->dev->name);
>>  in6_ifa_put(ifp);
>>  return;
>>  }
> 
> 
> 

It does print out the full uncompressed IPv6 address. I modified the
message manually by hand while copying it out to the commit message in
order to hide the real IP address of my machines, but did not realize
that it was different from what the kernel would actually print out.

-Vishwanath


[PATCH net-next] bpf: avoid rcu_dereference inside bpf_event_mutex lock region

2017-10-30 Thread Yonghong Song
During perf event attaching/detaching bpf programs,
the tp_event->prog_array change is protected by the
bpf_event_mutex lock in both attaching and deteching
functions. Although tp_event->prog_array is a rcu
pointer, rcu_derefrence is not needed to access it
since mutex lock will guarantee ordering.

Verified through "make C=2" that sparse
locking check still happy with the new change.

Also change the label name in perf_event_{attach,detach}_bpf_prog
from "out" to "unlock" to reflect the code action after the label.

Signed-off-by: Yonghong Song 
Acked-by: Alexei Starovoitov 
Acked-by: Martin KaFai Lau 
---
 kernel/trace/bpf_trace.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

Peter,

Could you check whether the below change to remove rcu_dereference_protected
is what you wanted or not?

Thanks!

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b65011d..e7685c5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -767,20 +767,19 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
mutex_lock(_event_mutex);
 
if (event->prog)
-   goto out;
+   goto unlock;
 
-   old_array = rcu_dereference_protected(event->tp_event->prog_array,
- 
lockdep_is_held(_event_mutex));
+   old_array = event->tp_event->prog_array;
ret = bpf_prog_array_copy(old_array, NULL, prog, _array);
if (ret < 0)
-   goto out;
+   goto unlock;
 
/* set the new array to event->tp_event and set event->prog */
event->prog = prog;
rcu_assign_pointer(event->tp_event->prog_array, new_array);
bpf_prog_array_free(old_array);
 
-out:
+unlock:
mutex_unlock(_event_mutex);
return ret;
 }
@@ -794,11 +793,9 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
mutex_lock(_event_mutex);
 
if (!event->prog)
-   goto out;
-
-   old_array = rcu_dereference_protected(event->tp_event->prog_array,
- 
lockdep_is_held(_event_mutex));
+   goto unlock;
 
+   old_array = event->tp_event->prog_array;
ret = bpf_prog_array_copy(old_array, event->prog, NULL, _array);
if (ret < 0) {
bpf_prog_array_delete_safe(old_array, event->prog);
@@ -810,6 +807,6 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
bpf_prog_put(event->prog);
event->prog = NULL;
 
-out:
+unlock:
mutex_unlock(_event_mutex);
 }
-- 
2.9.5



[PATCH net-next] net: filter: remove unused variable and fix warning

2017-10-30 Thread Jakub Kicinski
bpf_getsockopt bpf call sets the ret variable to zero and
never changes it.  What's worse in case CONFIG_INET is
not selected the variable is completely unused generating
a warning.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 net/core/filter.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 721c30889033..a0112168d6f9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3288,7 +3288,6 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
   int, level, int, optname, char *, optval, int, optlen)
 {
struct sock *sk = bpf_sock->sk;
-   int ret = 0;
 
if (!sk_fullsock(sk))
goto err_clear;
@@ -3308,7 +3307,7 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, 
bpf_sock,
} else {
goto err_clear;
}
-   return ret;
+   return 0;
 #endif
 err_clear:
memset(optval, 0, optlen);
-- 
2.14.1



Re: [PATCH net-next] net: display hw address of source machine during ipv6 DAD failure

2017-10-30 Thread David Ahern
On 10/30/17 2:29 PM, Vishwanath Pai wrote:
> This patch updates the error messages displayed in kernel log to include
> hwaddress of the source machine that caused ipv6 duplicate address
> detection failures.
> 
> Examples:
> 
> a) When we receive a NA packet from another machine advertising our
> address:
> 
> ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2601::2bb4 on eth0!

your example above does not agree with the format below. You have the
compressed IPv6 address, yet the format ...

> 
> b) When we detect DAD failure during address assignment to an interface:
> 
> IPv6: eth0: IPv6 duplicate address 2601::2b78 used by 34:ab:cd:56:11:e8
> detected!
> 
> Suggested-by: Igor Lubashev 
> Signed-off-by: Vishwanath Pai 
> ---


> @@ -989,8 +990,8 @@ static void ndisc_recv_na(struct sk_buff *skb)
>*/
>   if (skb->pkt_type != PACKET_LOOPBACK)
>   ND_PRINTK(1, warn,
> -   "NA: someone advertises our address %pI6 on 
> %s!\n",
> -   >addr, ifp->idev->dev->name);
> +   "NA: %pM advertised our address %pI6 on 
> %s!\n",

... is uncompressed. Please make that '%pI6c' instead of pI6 in the line
above


> +   eth_hdr(skb)->h_source, >addr, 
> ifp->idev->dev->name);
>   in6_ifa_put(ifp);
>   return;
>   }





Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-10-30 Thread Fengguang Wu

On Mon, Oct 30, 2017 at 12:29:47PM -0700, Linus Torvalds wrote:

On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu  wrote:


Here are 3 dmesgs related to wireless and 1 from ethernet.


Fengguang, these would be lovelier still _if_ you have DEBUG_INFO
enabled on the kernel, and your script were to find things like
"symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them.

So


[  235.425464] BUG: unable to handle kernel paging request at 00010007
[  235.425470] IP: run_timer_softirq+0x13a/0x470


would also then have

  run_timer_softirq at timer.c:XYZ

which would make it easier to see exactly _what_ it is that faults. As
it is, I think there's a fair number of inlining that makes it hard to
see the cause, but that faddrtoline would make very obvious.


Good idea and tips! It'll definitely help debug the issues where
bisect cannot help.


Finding that "symbol+xyz/abc" pattern should be fairly easy to
automate, and should fit the 0day model fairly well. No?


Sure. We'll add DEBUG_INFO and automate faddr2line.

Regards,
Fengguang


[PATCH net-next] net: display hw address of source machine during ipv6 DAD failure

2017-10-30 Thread Vishwanath Pai
This patch updates the error messages displayed in kernel log to include
hwaddress of the source machine that caused ipv6 duplicate address
detection failures.

Examples:

a) When we receive a NA packet from another machine advertising our
address:

ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2601::2bb4 on eth0!

b) When we detect DAD failure during address assignment to an interface:

IPv6: eth0: IPv6 duplicate address 2601::2b78 used by 34:ab:cd:56:11:e8
detected!

Suggested-by: Igor Lubashev 
Signed-off-by: Vishwanath Pai 
---
 include/net/addrconf.h | 2 +-
 net/ipv6/addrconf.c| 6 +++---
 net/ipv6/ndisc.c   | 9 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 15b5ffd..2a616ea 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -208,7 +208,7 @@ bool inet6_mc_check(struct sock *sk, const struct in6_addr 
*mc_addr,
 void ipv6_mc_init_dev(struct inet6_dev *idev);
 void ipv6_mc_destroy_dev(struct inet6_dev *idev);
 int ipv6_mc_check_mld(struct sk_buff *skb, struct sk_buff **skb_trimmed);
-void addrconf_dad_failure(struct inet6_ifaddr *ifp);
+void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp);
 
 bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group,
 const struct in6_addr *src_addr);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5a8a102..cfa374c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1987,7 +1987,7 @@ static int addrconf_dad_end(struct inet6_ifaddr *ifp)
return err;
 }
 
-void addrconf_dad_failure(struct inet6_ifaddr *ifp)
+void addrconf_dad_failure(struct sk_buff *skb, struct inet6_ifaddr *ifp)
 {
struct inet6_dev *idev = ifp->idev;
struct net *net = dev_net(ifp->idev->dev);
@@ -1997,8 +1997,8 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
return;
}
 
-   net_info_ratelimited("%s: IPv6 duplicate address %pI6c detected!\n",
-ifp->idev->dev->name, >addr);
+   net_info_ratelimited("%s: IPv6 duplicate address %pI6c used by %pM 
detected!\n",
+ifp->idev->dev->name, >addr, 
eth_hdr(skb)->h_source);
 
spin_lock_bh(>lock);
 
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 266a530..9d6ea9d 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -46,6 +46,7 @@
 #endif
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -822,7 +823,7 @@ static void ndisc_recv_ns(struct sk_buff *skb)
 * who is doing DAD
 * so fail our DAD process
 */
-   addrconf_dad_failure(ifp);
+   addrconf_dad_failure(skb, ifp);
return;
} else {
/*
@@ -975,7 +976,7 @@ static void ndisc_recv_na(struct sk_buff *skb)
if (ifp) {
if (skb->pkt_type != PACKET_LOOPBACK
&& (ifp->flags & IFA_F_TENTATIVE)) {
-   addrconf_dad_failure(ifp);
+   addrconf_dad_failure(skb, ifp);
return;
}
/* What should we make now? The advertisement
@@ -989,8 +990,8 @@ static void ndisc_recv_na(struct sk_buff *skb)
 */
if (skb->pkt_type != PACKET_LOOPBACK)
ND_PRINTK(1, warn,
- "NA: someone advertises our address %pI6 on 
%s!\n",
- >addr, ifp->idev->dev->name);
+ "NA: %pM advertised our address %pI6 on 
%s!\n",
+ eth_hdr(skb)->h_source, >addr, 
ifp->idev->dev->name);
in6_ifa_put(ifp);
return;
}
-- 
1.9.1



Re: [PATCH net-next v5 06/10] bnxt: Add devlink support for config get/set

2017-10-30 Thread Steve Lin
On Mon, Oct 30, 2017 at 1:35 PM, Jiri Pirko  wrote:
> Mon, Oct 30, 2017 at 03:46:12PM CET, steven.l...@broadcom.com wrote:
>>Implements get and set of configuration parameters using new devlink
>>config get/set API. Parameters themselves defined in later patches.
>>
>>Signed-off-by: Steve Lin 
>>Acked-by: Andy Gospodarek 
>>---
>> drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 258 
>> +-
>> drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  17 ++
>> 2 files changed, 263 insertions(+), 12 deletions(-)
>>
>>diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c 
>>b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
>>index 402fa32..deb24e0 100644
>
> [...]
>
>>+static int bnxt_dl_perm_config_set(struct devlink *devlink,
>>+ enum devlink_perm_config_param param,
>>+ enum devlink_perm_config_type type,
>>+ union devlink_perm_config_value *value,
>>+ bool *restart_reqd)
>>+{
>>+  struct bnxt *bp = bnxt_get_bp_from_dl(devlink);
>>+  struct bnxt_drv_cfgparam *entry = NULL;
>>+  u32 value_int;
>>+  u32 bytesize;
>>+  int idx = 0;
>>+  int ret = 0;
>>+  u16 val16;
>>+  u8 val8;
>>+  int i;
>>+
>>+  *restart_reqd = 0;
>>+
>>+  /* Find parameter in table */
>>+  for (i = 0; i < BNXT_NUM_DRV_CFGPARAM; i++) {
>>+  if (param == bnxt_drv_cfgparam_list[i].param) {
>>+  entry = _drv_cfgparam_list[i];
>>+  break;
>>+  }
>>+  }
>>+
>>+  /* Not found */
>>+  if (!entry)
>>+  return -EINVAL;
>>+
>>+  /* Check to see if this func type can access variable */
>>+  if (BNXT_PF(bp) && !(entry->func & BNXT_DRV_PF))
>>+  return -EOPNOTSUPP;
>>+  if (BNXT_VF(bp) && !(entry->func & BNXT_DRV_VF))
>>+  return -EOPNOTSUPP;
>>+
>>+  /* If parameter is per port or function, compute index */
>>+  if (entry->appl == BNXT_DRV_APPL_PORT) {
>>+  idx = bp->pf.port_id;
>>+  } else if (entry->appl == BNXT_DRV_APPL_FUNCTION) {
>>+  if (BNXT_PF(bp))
>>+  idx = bp->pf.fw_fid - BNXT_FIRST_PF_FID;
>>   }
>>
>>+  bytesize = roundup(entry->bitlength, BITS_PER_BYTE) / BITS_PER_BYTE;
>>+
>>+  /* Type expected by device may or may not be the same as type passed
>>+   * in from devlink API.  So first convert to an interval u32 value,
>>+   * then to type needed by device.
>>+   */
>>+  if (type == DEVLINK_PERM_CONFIG_TYPE_U8) {
>>+  value_int = value->value8;
>>+  } else if (type == DEVLINK_PERM_CONFIG_TYPE_U16) {
>>+  value_int = value->value16;
>>+  } else if (type == DEVLINK_PERM_CONFIG_TYPE_U32) {
>>+  value_int = value->value32;
>>+  } else {
>>+  /* Unsupported type */
>>+  return -EINVAL;
>>+  }
>>+
>>+  if (bytesize == 1) {
>>+  val8 = value_int;
>
>>+  ret = bnxt_nvm_write(bp, entry->nvm_param, idx, ,
>>+   entry->bitlength);
>>+  } else if (bytesize == 2) {
>>+  val16 = value_int;
>>+  ret = bnxt_nvm_write(bp, entry->nvm_param, idx, ,
>>+   entry->bitlength);
>>+  } else {
>>+  ret = bnxt_nvm_write(bp, entry->nvm_param, idx, _int,
>>+   entry->bitlength);
>>+  }
>>+
>>+  /* Restart required for all nvm parameter writes */
>>+  *restart_reqd = 1;
>>+
>>+  return ret;
>>+}
>
> I don't like this swissknife approach for setters and getters. It's
> messy, and hard to read. There should be clean get/set pair function
> per parameter defined in array.

The advantage of the swiss kinfe approach is that you don't have a
large number of almost-identical functions (i.e. any "set" that
operates on a u32 will be doing essentially the exact same thing) in
the driver, which could lead to code bloat and also make it harder to
catch errors (since any bug that may creep in during the copy/paste to
make so many functions will only be caught when that specific
parameter is set, rather than when any parameter is set).

In general, I very much appreciate the thorough review.  I do feel,
though, like some of this basic architectural stuff could have been
addressed more efficiently in v1 or v2, rather than v5.  I admit that
when the comments to v4 included suggestions like removing extra
spaces in the commit message, I was thinking that we were close to
reaching consensus. :)

I just want to make sure that you feel that the overall goal of this
effort is worthwhile - I will make the changes you've suggested here
and in the other replies to v5, but I am hopeful that v6 does not
result in another set of big architectural changes.

> Something 

Re: [PATCH net-next v5 02/10] devlink: Adding SR-IOV enablement perm config param

2017-10-30 Thread Steve Lin
On Mon, Oct 30, 2017 at 1:20 PM, Jiri Pirko  wrote:
> Mon, Oct 30, 2017 at 03:46:08PM CET, steven.l...@broadcom.com wrote:
>>Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config
>>parameter; this parameter controls whether the device
>>advertises the SR-IOV PCI capability. Value is permanent, so
>>becomes the new default value for this device.
>>
>>  DEVLINK_PERM_CONFIG_DISABLE = Disable SR-IOV
>>  DEVLINK_PERM_CONFIG_ENABLE  = Enable SR-IOV
>>
>>Signed-off-by: Steve Lin 
>>Acked-by: Andy Gospodarek 
>>---
>> include/uapi/linux/devlink.h | 18 +-
>> net/core/devlink.c   |  1 +
>> 2 files changed, 18 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>>index 55e35f2..1c5d4ae 100644
>>--- a/include/uapi/linux/devlink.h
>>+++ b/include/uapi/linux/devlink.h
>>@@ -256,8 +256,24 @@ enum devlink_dpipe_header_id {
>>   DEVLINK_DPIPE_HEADER_IPV6,
>> };
>>
>>-/* Permanent config parameters */
>>+/* Permanent config parameter enable/disable
>>+ * DEVLINK_PERM_CONFIG_DISABLE = disable a permanent config parameter
>>+ * DEVLINK_PERM_CONFIG_ENBALE  = enable a permanent config parameter
>>+ */
>>+enum devlink_perm_config_enabled {
>>+  DEVLINK_PERM_CONFIG_DISABLE,
>>+  DEVLINK_PERM_CONFIG_ENABLE,
>>+};
>
>
> Basically, this is "bool"
>
> Why don't you use some own enum instead of NLA_U*. Like team does for
> example:
>
> enum team_option_type {
> TEAM_OPTION_TYPE_U32,
> TEAM_OPTION_TYPE_STRING,
> TEAM_OPTION_TYPE_BINARY,
> TEAM_OPTION_TYPE_BOOL,
> TEAM_OPTION_TYPE_S32,
> };
>
>

I had added enum devlink_perm_config_type in v5 based on your comments
in v4 - I will add BOOL if it helps clarity.

>
>>+
>>+/* Permanent config parameters:
>>+ * DEVLINK_PERM_CONFIG_SRIOV_ENABLED: Configures whether SR-IOV PCI 
>>capability
>>+ * is advertised by the device.
>>+ *   DEVLINK_PERM_CONFIG_DISABLE = disable SR-IOV
>>+ *   DEVLINK_PERM_CONFIG_ENABLE  = enable SR-IOV
>>+ */
>> enum devlink_perm_config_param {
>>+  DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
>
>
> Please align "ENABLED" vs "ENABLE".
> The rest of devlink code uses "ENABLED"
>

Ok.

>
>>+
>>   __DEVLINK_PERM_CONFIG_MAX,
>>   DEVLINK_PERM_CONFIG_MAX = __DEVLINK_PERM_CONFIG_MAX - 1
>> };
>>diff --git a/net/core/devlink.c b/net/core/devlink.c
>>index 5e77408..b4d43c29 100644
>>--- a/net/core/devlink.c
>>+++ b/net/core/devlink.c
>>@@ -1569,6 +1569,7 @@ static int devlink_nl_cmd_eswitch_set_doit(struct 
>>sk_buff *skb,
>> static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1];
>>
>> static const u8 devlink_perm_cfg_param_types[DEVLINK_PERM_CONFIG_MAX + 1] = {
>>+  [DEVLINK_PERM_CONFIG_SRIOV_ENABLED] = NLA_U8,
>> };
>>
>> static int devlink_nl_single_param_get(struct sk_buff *msg,
>>--
>>2.7.4
>>


Re: [PATCH net-next v5 01/10] devlink: Add permanent config parameter get/set operations

2017-10-30 Thread Steve Lin
On Mon, Oct 30, 2017 at 1:03 PM, Jiri Pirko  wrote:
> Mon, Oct 30, 2017 at 03:46:07PM CET, steven.l...@broadcom.com wrote:
>>Add support for permanent config parameter get/set commands. Used
>>for persistent device configuration parameters.
>>
>>Signed-off-by: Steve Lin 
>>Acked-by: Andy Gospodarek 
>
> In general, I don't like how you use netlink. Please see the rest of
> this code. We should do things in the common way. More info inlined.
>
>
> [...]
>
>>
>>+/* Permanent config parameters */
>>+enum devlink_perm_config_param {
>>+  __DEVLINK_PERM_CONFIG_MAX,
>>+  DEVLINK_PERM_CONFIG_MAX = __DEVLINK_PERM_CONFIG_MAX - 1
>>+};
>>+
>>+/* Permanent config types */
>>+enum devlink_perm_config_type {
>>+  DEVLINK_PERM_CONFIG_TYPE_UNSPEC,
>
> You should not need this. Type should be always specified.

This was an attempt to align enum devlink_perm_config_type with the
NLA_* enums in netlink.h, which include the unspecified type.  I will
remove the unspecified enum, but it seems like having multiple type
enums that are not aligned with each other could be problematic.

>
>
>>+  DEVLINK_PERM_CONFIG_TYPE_U8,
>>+  DEVLINK_PERM_CONFIG_TYPE_U16,
>>+  DEVLINK_PERM_CONFIG_TYPE_U32,
>>+};
>
> This definitelly should not be in uapi header.
>
>
>>+
>>+/* Permanent config value */
>>+union devlink_perm_config_value {
>>+  __u8value8;
>>+  __u16   value16;
>>+  __u32   value32;
>>+};
>
> This definitelly should not be in uapi header.

Ok, will move these.

>
>
>
>>+
>> #endif /* _UAPI_LINUX_DEVLINK_H_ */
>
> [...]
>
>
>>+static int devlink_nl_config_get_fill(struct sk_buff *msg,
>>+struct devlink *devlink,
>>+enum devlink_command cmd,
>>+struct genl_info *info)
>>+{
>>+  struct nlattr *tb[DEVLINK_ATTR_MAX + 1];
>>+  enum devlink_perm_config_param param;
>>+  enum devlink_perm_config_type type;
>>+  struct nlattr *cfgparam_attr;
>>+  struct nlattr *attr;
>>+  void *hdr;
>>+  int err;
>>+  int rem;
>>+
>>+  hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
>>+_nl_family, 0, cmd);
>>+  if (!hdr)
>>+  return -EMSGSIZE;
>>+
>>+  err = devlink_nl_put_handle(msg, devlink);
>>+  if (err)
>>+  goto nla_put_failure;
>>+
>>+  if (!info->attrs[DEVLINK_ATTR_PERM_CONFIGS]) {
>>+  err = -EINVAL;
>>+  goto nla_put_failure;
>>+  }
>>+
>>+  cfgparam_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIGS);
>>+
>>+  nla_for_each_nested(attr, info->attrs[DEVLINK_ATTR_PERM_CONFIGS],
>
>
> Okay. So I have to know what is in kernel in order to get the value.
>
> We need a dump operation. In fact, why can't we just have dump operation?

Ok, I can add a dump operation.

>
>
>
>>+  rem) {
>>+  err = nla_parse_nested(tb, DEVLINK_ATTR_MAX, attr,
>>+ devlink_nl_policy, NULL);
>>+  if (err)
>>+  goto nla_nest_failure;
>>+  if (!tb[DEVLINK_ATTR_PERM_CONFIG_PARAMETER] ||
>>+  !tb[DEVLINK_ATTR_PERM_CONFIG_TYPE])
>>+  continue;
>>+
>>+  param = nla_get_u32(tb[DEVLINK_ATTR_PERM_CONFIG_PARAMETER]);
>>+  type = nla_get_u8(tb[DEVLINK_ATTR_PERM_CONFIG_TYPE]);
>
> So to get parameter, I have to specify a type? What if I specify wrong
> type? That is wrong. The type should be read attr for get.

For set, seems like the user would need to know and set the type.  So,
I was just doing the same for get().  But I can make the type a read
attribute for get.

>
> You should have a param -> type table and always use that.
>
>
>>+  if (param > DEVLINK_PERM_CONFIG_MAX ||
>>+  type > NLA_TYPE_MAX) {
>>+  continue;
>>+  }
>>+  /* Note if single_param_get fails, that param won't be in
>>+   * response msg, so caller will know which param(s) failed to
>>+   * get.
>>+   */
>>+  devlink_nl_single_param_get(msg, devlink, param, type);
>>+  }
>>+
>>+  nla_nest_end(msg, cfgparam_attr);
>>+
>>+  genlmsg_end(msg, hdr);
>>+  return 0;
>>+
>>+nla_nest_failure:
>>+  nla_nest_cancel(msg, cfgparam_attr);
>>+nla_put_failure:
>
>
> You should make this multipart aware. If config parameters won't fit
> into a single skb, you have to allocate another one. See 
> devlink_dpipe_tables_fill
> as an example how to do it. It is important to have this from the
> beginning as the userspace knows to expect multipart message.

In a response to a similar comment from v2, I tried to describe why I
felt multipart messages wouldn't be required.  There was no response
to my reply, so I thought that my argument was accepted.  Apparently
not, so I will 

Re: [PATCH net-next] net: bcmgenet: Avoid calling platform_device_put() twice in bcmgenet_mii_exit()

2017-10-30 Thread Florian Fainelli
On 10/27/2017 10:05 PM, Wei Yongjun wrote:
> Remove platform_device_put() call after platform_device_unregister()
> from function bcmgenet_mii_exit(), otherwise, we will call
> platform_device_put() twice.
> 
> Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC
> MDIO controller driver")

Please don't force a wrap to 80 columns for commit subjects.

> Signed-off-by: Wei Yongjun 

Acked-by: Florian Fainelli 

> ---
>  drivers/net/ethernet/broadcom/genet/bcmmii.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
> b/drivers/net/ethernet/broadcom/genet/bcmmii.c
> index ba3fcfd..5333274 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
> @@ -571,5 +571,4 @@ void bcmgenet_mii_exit(struct net_device *dev)
>   of_phy_deregister_fixed_link(dn);
>   of_node_put(priv->phy_dn);
>   platform_device_unregister(priv->mii_pdev);
> - platform_device_put(priv->mii_pdev);
>  }
> 


-- 
Florian


Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-10-30 Thread Linus Torvalds
On Sun, Oct 29, 2017 at 4:48 PM, Fengguang Wu  wrote:
>
> Here are 3 dmesgs related to wireless and 1 from ethernet.

Fengguang, these would be lovelier still _if_ you have DEBUG_INFO
enabled on the kernel, and your script were to find things like
"symbol+0xhex/0xhex", and run "./scripts/faddr2line" on them.

So

> [  235.425464] BUG: unable to handle kernel paging request at 00010007
> [  235.425470] IP: run_timer_softirq+0x13a/0x470

would also then have

   run_timer_softirq at timer.c:XYZ

which would make it easier to see exactly _what_ it is that faults. As
it is, I think there's a fair number of inlining that makes it hard to
see the cause, but that faddrtoline would make very obvious.

Finding that "symbol+xyz/abc" pattern should be fairly easy to
automate, and should fit the 0day model fairly well. No?

In this case, the trapping instruction ends up decoding to

   0: 4c 8d 6c c5 90lea-0x70(%rbp,%rax,8),%r13
   5: 49 8b 45 00  mov0x0(%r13),%rax
   9: 48 85 c0  test   %rax,%rax
   c: 74 deje 0xffec
   e: 4d 8b 7d 00  mov0x0(%r13),%r15
  12: 4d 89 7c 24 08mov%r15,0x8(%r12)
  17: 0f 1f 44 00 00nopl   0x0(%rax,%rax,1)
  1c: 49 8b 07  mov(%r15),%rax
  1f: 49 8b 57 08  mov0x8(%r15),%rdx
  23: 48 85 c0  test   %rax,%rax
  26: 48 89 02  mov%rax,(%rdx)
  29: 74 04je 0x2f
  2b:* 48 89 50 08  mov%rdx,0x8(%rax) <-- trapping instruction
  2f: 41 f6 47 2a 20testb  $0x20,0x2a(%r15)
  34: 49 c7 47 08 00 00 00 movq   $0x0,0x8(%r15)

and %rax has the value 0x, so yes, it will trap at 0x10007.

It's not trivial to see just *wjhat* access it is.

I *think* that "testb $32" is checking for TIMER_IRQSAFE in
expire_timers(), and that the oops is due to the list operations in
detach_timer() (inlined).

Which doesn't really help: it looks like the timer lists are corrupt.
With some luck, some register state could have the timer function
pointer in it, and we'd get a hint of *which* timer this is, but that
doesn't look to be the case here either.

I'm not seeing anything to really help debug this here.

   Linus


Re: [patch net-next RFC 1/9] net_sch: red: Add offload ability to RED qdisc

2017-10-30 Thread David Ahern
On 10/30/17 7:54 AM, Nogah Frankel wrote:
>>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>>> index 0e88cc2..743c42a 100644
>>> --- a/include/uapi/linux/pkt_sched.h
>>> +++ b/include/uapi/linux/pkt_sched.h
>>> @@ -255,6 +255,7 @@ struct tc_red_qopt {
>>>  #define TC_RED_ECN 1
>>>  #define TC_RED_HARDDROP2
>>>  #define TC_RED_ADAPTATIVE  4
>>> +#define TC_RED_OFFLOADED   8
>>>  };
>>>
>>>  struct tc_red_xstats {
>>
>> What keeps a user from setting this flag in the tc_red_qopt it
>> passes into the a change operation?
> 
> Nothing keeps the user from doing it, but it has no effect.
> The decision to offload is the driver's only.
> It is basically a read-only flag.
> 

If it is read-only, then attempts to set it from userspace should cause
the command to fail.


Re: [PATCH net-next 1/1] net sched qdisc: pass netlink message flags in event notification

2017-10-30 Thread Cong Wang
On Mon, Oct 30, 2017 at 11:07 AM, Roman Mashak  wrote:
> Cong Wang  writes:
>
>> On Sat, Oct 28, 2017 at 8:36 PM, Roman Mashak  wrote:
>>> Cong Wang  writes:
>>
>> Hmm, I thought you use RTM_NEWQDISC+RTM_DELQDISC to
>> determine it is replacement, no?
>
> Create is RTM_NEWQDISC and NLM_F_EXCL|NLM_F_CREATE, replacement is
> RTM_NEWQDISC and NLM_F_REPLACE in netlink flags.

Is there any reason we can't use RTM_NEWQDISC+RTM_DELQDISC
rather than NLM_F_REPLACE to determine it is replacement?

Note, RTM_NEWQDISC+RTM_DELQDISC are put in a same
message not two.


[net-next] tcp_delack_timer circular locking dependancy

2017-10-30 Thread Dave Jones
[  105.316650] ==
[  105.316818] WARNING: possible circular locking dependency detected
[  105.316986] 4.14.0-rc7-think+ #1 Not tainted
[  105.317108] --
[  105.317273] swapper/2/0 is trying to acquire lock:
[  105.317407]  (
[  105.317476] slock-AF_INET6
[  105.317564] ){+.-.}
[  105.317642] , at: [] tcp_delack_timer+0x26/0x130
[  105.317807] 
   but task is already holding lock:
[  105.317961]  (
[  105.318024] (timer)
[  105.318097] #5
[  105.318168] ){+.-.}
[  105.318241] , at: [] call_timer_fn+0x5/0x5e0
[  105.318393] 
   which lock already depends on the new lock.

[  105.318594] 
   the existing dependency chain (in reverse order) is:
[  105.318781] 
   -> #1
[  105.318879]  (
[  105.318939] (timer)
[  105.319009] #5
[  105.319068] ){+.-.}
[  105.319137] :
[  105.319195]del_timer_sync+0x3c/0xb0
[  105.319313]inet_csk_reqsk_queue_drop+0x26c/0x4e0
[  105.319459]inet_csk_complete_hashdance+0x1e/0x90
[  105.319598]tcp_check_req+0x787/0x9a0
[  105.319716]tcp_v6_rcv+0x914/0x1060
[  105.319828]ip6_input_finish+0x291/0xba0
[  105.319950]ip6_input+0xb2/0x380
[  105.320059]ip6_rcv_finish+0x103/0x350
[  105.320180]ipv6_rcv+0x93f/0xff0
[  105.320291]__netif_receive_skb_core+0x13ef/0x1900
[  105.320436]netif_receive_skb_internal+0xea/0x4c0
[  105.320579]napi_gro_receive+0x28e/0x320
[  105.320705]e1000_clean_rx_irq+0x3e9/0x6f0
[  105.320838]e1000e_poll+0x14e/0x570
[  105.320954]net_rx_action+0x4db/0xc80
[  105.321075]__do_softirq+0x1ca/0x7bf
[  105.321194]irq_exit+0x104/0x110
[  105.321303]do_IRQ+0xb2/0x130
[  105.321407]ret_from_intr+0x0/0x19
[  105.321523]cpuidle_enter_state+0x223/0x5b0
[  105.321655]do_idle+0x110/0x1b0
[  105.321766]cpu_startup_entry+0xdb/0xe0
[  105.321891]start_secondary+0x2e9/0x360
[  105.322014]verify_cpu+0x0/0xf1
[  105.322121] 
   -> #0
[  105.322215]  (
[  105.322276] slock-AF_INET6
[  105.322359] ){+.-.}
[  105.322428] :
[  105.322487]lock_acquire+0x12e/0x350
[  105.322602]_raw_spin_lock+0x30/0x70
[  105.322722]tcp_delack_timer+0x26/0x130
[  105.322846]call_timer_fn+0x188/0x5e0
[  105.322966]__run_timers+0x54d/0x670
[  105.323084]run_timer_softirq+0x2a/0x50
[  105.323208]__do_softirq+0x1ca/0x7bf
[  105.323325]irq_exit+0x104/0x110
[  105.323435]smp_apic_timer_interrupt+0x14b/0x510
[  105.323576]apic_timer_interrupt+0x9a/0xa0
[  105.323705]cpuidle_enter_state+0x223/0x5b0
[  105.323836]do_idle+0x110/0x1b0
[  105.323944]cpu_startup_entry+0xdb/0xe0
[  105.324067]start_secondary+0x2e9/0x360
[  105.324189]verify_cpu+0x0/0xf1
[  105.324295] 
   other info that might help us debug this:

[  105.324489]  Possible unsafe locking scenario:

[  105.324644]CPU0CPU1
[  105.324767]
[  105.324890]   lock(
[  105.324963] (timer)
[  105.325033] #5
[  105.325093] );
[  105.325152]lock(
[  105.325278] slock-AF_INET6
[  105.325360] );
[  105.325419]lock(
[  105.325544] (timer)
[  105.325612] #5
[  105.325670] );
[  105.325729]   lock(
[  105.325797] slock-AF_INET6
[  105.325879] );
[  105.325938] 
*** DEADLOCK ***

[  105.326086] 1 lock held by swapper/2/0:
[  105.326193]  #0: 
[  105.326257]  (
[  105.331697] (timer)
[  105.337038] #5
[  105.342339] ){+.-.}
[  105.347620] , at: [] call_timer_fn+0x5/0x5e0
[  105.353021] 
   stack backtrace:
[  105.363515] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.0-rc7-think+ #1
[  105.368886] Hardware name: LENOVO ThinkServer TS140/ThinkServer TS140, BIOS 
FBKTB3AUS 06/16/2015
[  105.374330] Call Trace:
[  105.379697]  
[  105.384997]  dump_stack+0xbc/0x145
[  105.390339]  ? dma_virt_map_sg+0xfb/0xfb
[  105.395733]  ? call_timer_fn+0x5/0x5e0
[  105.401076]  ? print_lock+0x54/0x68
[  105.406344]  print_circular_bug.isra.42+0x283/0x2bc
[  105.411695]  ? print_circular_bug_header+0xda/0xda
[  105.417054]  ? graph_lock+0x8d/0x100
[  105.422419]  ? check_noncircular+0x20/0x20
[  105.427857]  ? sched_clock_cpu+0x14/0xf0
[  105.433309]  __lock_acquire+0x1f4a/0x2050
[  105.438725]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[  105.444160]  ? __lock_acquire+0x6b3/0x2050
[  105.449580]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[  105.455015]  ? sched_clock_cpu+0x14/0xf0
[  105.460514]  ? __lock_acquire+0x6b3/0x2050
[  105.465984]  ? cyc2ns_read_end+0x10/0x10
[  105.471395]  ? debug_check_no_locks_freed+0x1a0/0x1a0
[  105.476934]  ? mark_lock+0x16f/0x9b0
[  105.482507]  ? print_irqtrace_events+0x110/0x110
[  105.488150]  ? 

Re: [PATCH net-next] net: bcmgenet: Avoid calling platform_device_put() twice in bcmgenet_mii_exit()

2017-10-30 Thread Doug Berger
On 10/27/2017 10:05 PM, Wei Yongjun wrote:
> Remove platform_device_put() call after platform_device_unregister()
> from function bcmgenet_mii_exit(), otherwise, we will call
> platform_device_put() twice.
> 
> Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC
> MDIO controller driver")
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/net/ethernet/broadcom/genet/bcmmii.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
> b/drivers/net/ethernet/broadcom/genet/bcmmii.c
> index ba3fcfd..5333274 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
> @@ -571,5 +571,4 @@ void bcmgenet_mii_exit(struct net_device *dev)
>   of_phy_deregister_fixed_link(dn);
>   of_node_put(priv->phy_dn);
>   platform_device_unregister(priv->mii_pdev);
> - platform_device_put(priv->mii_pdev);
>  }
> 
Acked-by: Doug Berger 


Re: [PATCH] net: phy: leds: Add support for "link" trigger

2017-10-30 Thread Florian Fainelli
On 10/30/2017 11:46 AM, Maciej S. Szmigiero wrote:
> On 30.10.2017 19:36, Florian Fainelli wrote:
>> On 10/28/2017 08:27 AM, Maciej S. Szmigiero wrote:
>>> Currently, we create a LED trigger for any link speed known to a PHY.
>>> These triggers only fire when their exact link speed had been negotiated
>>> (they aren't cumulative, that is, they don't fire for "their or any higher"
>>> link speed).
>>>
>>> What we are missing, however, is a trigger which will fire on any link
>>> speed known to the PHY. Such trigger can then be used for implementing a
>>> poor man's substitute of the "link" LED on boards that lack it.
>>> Let's add it.
>>
>> The use case definitively makes sense, but the implementation a little less.
>>
>>>
>>> Signed-off-by: Maciej S. Szmigiero 
>>> ---
>>>  drivers/net/phy/Kconfig|  7 +--
>>>  drivers/net/phy/phy_led_triggers.c | 19 ---
>>>  2 files changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
>>> index cd931cf9dcc2..3bcc2107ad77 100644
>>> --- a/drivers/net/phy/Kconfig
>>> +++ b/drivers/net/phy/Kconfig
>>> @@ -191,11 +191,14 @@ config LED_TRIGGER_PHY
>>>   Adds support for a set of LED trigger events per-PHY.  Link
>>>   state change will trigger the events, for consumption by an
>>>   LED class driver.  There are triggers for each link speed currently
>>> - supported by the phy, and are of the form:
>>> + supported by the PHY and also a one common "link" trigger as a
>>> + logical-or of all the link speed ones.
>>> + All these triggers are named according to the following pattern:
>>>   ::
>>>  
>>>   Where speed is in the form:
>>> -   Mbps or Gbps
>>> +   Mbps OR Gbps OR link
>>> +   for any speed known to the PHY.
>>>  
>>>  
>>>  comment "MII PHY device drivers"
>>> diff --git a/drivers/net/phy/phy_led_triggers.c 
>>> b/drivers/net/phy/phy_led_triggers.c
>>> index 94ca42e630bb..5b6f1876f514 100644
>>> --- a/drivers/net/phy/phy_led_triggers.c
>>> +++ b/drivers/net/phy/phy_led_triggers.c
>>> @@ -20,7 +20,8 @@ static struct phy_led_trigger 
>>> *phy_speed_to_led_trigger(struct phy_device *phy,
>>>  {
>>> unsigned int i;
>>>  
>>> -   for (i = 0; i < phy->phy_num_led_triggers; i++) {
>>> +   /* the first (i = 0) trigger is for "any" speed */
>>> +   for (i = 1; i < phy->phy_num_led_triggers; i++) {
>>> if (phy->phy_led_triggers[i].speed == speed)
>>> return >phy_led_triggers[i];
>>
>> This looks unnecessary, because existing LED triggers are all relative
>> to a particular speed, you need to coerce the existing code into
>> accepting SPEED_UNKNOWN as a hint to mean "link". Just implement "link"
>> as a different trigger that does not need the speed argument/lookup to
>> be happening and don't change how the indexing into the array of speed
>> triggers is done. Just have a separate "link" trigger that is stored
>> separately from that existing array.
> 
> This way of implementation (overloading SPEED_UNKNOWN) needed least
> modification to the driver, but if a purer way is preferred then ok.

Sure, that makes it simpler to implement, but it seems to me this would
be cleaner if the "link" trigger was handled separately, the next thing
that comes to mind is actually implementing an "activity" trigger, and
this one is likely also reasonably speed independent.

> 
>> Also, what happens if I set both "link" and a given "speed" and my RJ-45
>> connector only has two LEDs, and one of them is already used for "activity"?
> 
> One LED can have only one trigger attached AFAIK.

True, so we should be fine, thanks!
-- 
Florian


Re: [PATCH v9 06/10] arm64: dts: allwinner: Restore EMAC changes

2017-10-30 Thread Corentin Labbe
On Fri, Oct 27, 2017 at 05:11:14PM +0200, Maxime Ripard wrote:
> On Tue, Oct 24, 2017 at 07:57:10PM +0200, Corentin Labbe wrote:
> > The original dwmac-sun8i DT bindings have some issue on how to handle
> > integrated PHY and was reverted in last RC of 4.13.
> > But now we have a solution so we need to get back that was reverted.
> > 
> > This patch restore arm64 DT about dwmac-sun8i
> > This reverts commit 87e1f5e8bb4b ("arm64: dts: allwinner: Revert EMAC 
> > changes")
> > 
> > Signed-off-by: Corentin Labbe 
> > ---
> >  .../boot/dts/allwinner/sun50i-a64-bananapi-m64.dts   | 16 
> >  .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts| 15 +++
> >  arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts  | 17 +
> >  .../dts/allwinner/sun50i-a64-sopine-baseboard.dts| 16 
> >  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi| 20 
> > 
> >  .../boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts | 17 +
> >  .../boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts| 17 +
> >  .../boot/dts/allwinner/sun50i-h5-orangepi-prime.dts  | 17 +
> >  8 files changed, 135 insertions(+)
> 
> Can you split the changes between the A64 and the H5? It's going to be
> difficult to merge otherwise.
> 
> (You also forgot to add Florian's Acked-by on your whole serie).
> 

Will do

Thanks
Regards


Re: [PATCH] net: phy: leds: Add support for "link" trigger

2017-10-30 Thread Maciej S. Szmigiero
On 30.10.2017 19:36, Florian Fainelli wrote:
> On 10/28/2017 08:27 AM, Maciej S. Szmigiero wrote:
>> Currently, we create a LED trigger for any link speed known to a PHY.
>> These triggers only fire when their exact link speed had been negotiated
>> (they aren't cumulative, that is, they don't fire for "their or any higher"
>> link speed).
>>
>> What we are missing, however, is a trigger which will fire on any link
>> speed known to the PHY. Such trigger can then be used for implementing a
>> poor man's substitute of the "link" LED on boards that lack it.
>> Let's add it.
> 
> The use case definitively makes sense, but the implementation a little less.
> 
>>
>> Signed-off-by: Maciej S. Szmigiero 
>> ---
>>  drivers/net/phy/Kconfig|  7 +--
>>  drivers/net/phy/phy_led_triggers.c | 19 ---
>>  2 files changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
>> index cd931cf9dcc2..3bcc2107ad77 100644
>> --- a/drivers/net/phy/Kconfig
>> +++ b/drivers/net/phy/Kconfig
>> @@ -191,11 +191,14 @@ config LED_TRIGGER_PHY
>>Adds support for a set of LED trigger events per-PHY.  Link
>>state change will trigger the events, for consumption by an
>>LED class driver.  There are triggers for each link speed currently
>> -  supported by the phy, and are of the form:
>> +  supported by the PHY and also a one common "link" trigger as a
>> +  logical-or of all the link speed ones.
>> +  All these triggers are named according to the following pattern:
>>::
>>  
>>Where speed is in the form:
>> -Mbps or Gbps
>> +Mbps OR Gbps OR link
>> +for any speed known to the PHY.
>>  
>>  
>>  comment "MII PHY device drivers"
>> diff --git a/drivers/net/phy/phy_led_triggers.c 
>> b/drivers/net/phy/phy_led_triggers.c
>> index 94ca42e630bb..5b6f1876f514 100644
>> --- a/drivers/net/phy/phy_led_triggers.c
>> +++ b/drivers/net/phy/phy_led_triggers.c
>> @@ -20,7 +20,8 @@ static struct phy_led_trigger 
>> *phy_speed_to_led_trigger(struct phy_device *phy,
>>  {
>>  unsigned int i;
>>  
>> -for (i = 0; i < phy->phy_num_led_triggers; i++) {
>> +/* the first (i = 0) trigger is for "any" speed */
>> +for (i = 1; i < phy->phy_num_led_triggers; i++) {
>>  if (phy->phy_led_triggers[i].speed == speed)
>>  return >phy_led_triggers[i];
> 
> This looks unnecessary, because existing LED triggers are all relative
> to a particular speed, you need to coerce the existing code into
> accepting SPEED_UNKNOWN as a hint to mean "link". Just implement "link"
> as a different trigger that does not need the speed argument/lookup to
> be happening and don't change how the indexing into the array of speed
> triggers is done. Just have a separate "link" trigger that is stored
> separately from that existing array.

This way of implementation (overloading SPEED_UNKNOWN) needed least
modification to the driver, but if a purer way is preferred then ok.

> Also, what happens if I set both "link" and a given "speed" and my RJ-45
> connector only has two LEDs, and one of them is already used for "activity"?

One LED can have only one trigger attached AFAIK.

Maciej


Re: [PATCH net-next 3/7] net: dsa: get port type at parse time

2017-10-30 Thread Florian Fainelli
On 10/30/2017 11:46 AM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> Reviewed-by: Florian Fainelli 
>>
>> One nit below:
>>
>>>  static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
>>>  {
>>> +   struct device_node *ethernet = of_parse_phandle(dn, "ethernet", 0);
>>> +   struct device_node *link = of_parse_phandle(dn, "link", 0);
>>> +
>>> +   if (ethernet) {
>>> +   dp->type = DSA_PORT_TYPE_CPU;
>>> +   } else if (link) {
>>> +   dp->type = DSA_PORT_TYPE_DSA;
>>> +   } else {
>>> +   dp->type = DSA_PORT_TYPE_USER;
>>> +   }
>>> +
>>
>> The curly braces are probably not necessary since all of these are
>> single line statements.
> 
> I didn't mention it in the commit message because it was obvious that
> the next patches will extend these condition arms.

Fair enough then ;)
-- 
Florian


Re: [PATCH net-next 3/7] net: dsa: get port type at parse time

2017-10-30 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> Reviewed-by: Florian Fainelli 
>
> One nit below:
>
>>  static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
>>  {
>> +struct device_node *ethernet = of_parse_phandle(dn, "ethernet", 0);
>> +struct device_node *link = of_parse_phandle(dn, "link", 0);
>> +
>> +if (ethernet) {
>> +dp->type = DSA_PORT_TYPE_CPU;
>> +} else if (link) {
>> +dp->type = DSA_PORT_TYPE_DSA;
>> +} else {
>> +dp->type = DSA_PORT_TYPE_USER;
>> +}
>> +
>
> The curly braces are probably not necessary since all of these are
> single line statements.

I didn't mention it in the commit message because it was obvious that
the next patches will extend these condition arms.


Thanks,

Vivien


Re: [PATCH net-next 7/7] net: dsa: remove name arg from slave create

2017-10-30 Thread Florian Fainelli
On 10/27/2017 12:55 PM, Vivien Didelot wrote:
> Now that slave dsa_port always have their name set, there is no need to
> pass it to dsa_slave_create() anymore. Remove this argument.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next 6/7] net: dsa: get port name at parse time

2017-10-30 Thread Florian Fainelli
On 10/27/2017 12:55 PM, Vivien Didelot wrote:
> Get the optional "label" property and assign a default one directly at
> parse time instead of doing it when creating the slave.
> 
> For legacy, simply assign the port name stored in cd->port_names.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: KASAN: use-after-free Read in sit_tunnel_xmit

2017-10-30 Thread Cong Wang
On Mon, Oct 30, 2017 at 8:34 AM, syzbot

wrote:
> Hello,
>
> syzkaller hit the following crash on
> 4dc12ffeaeac939097a3f55c881d3dc3523dff0c
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
>
>
>
> skbuff: bad partial csum: csum=53081/14726 len=2273
> ==
> BUG: KASAN: use-after-free in ipv6_get_dsfield include/net/dsfield.h:23
> [inline]
> BUG: KASAN: use-after-free in ipip6_tunnel_xmit net/ipv6/sit.c:968 [inline]
> BUG: KASAN: use-after-free in sit_tunnel_xmit+0x2a41/0x3130
> net/ipv6/sit.c:1016
> Read of size 2 at addr 8801c64afd00 by task syz-executor3/16942
>
> CPU: 0 PID: 16942 Comm: syz-executor3 Not tainted 4.14.0-rc5+ #97
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
>  ipv6_get_dsfield include/net/dsfield.h:23 [inline]
>  ipip6_tunnel_xmit net/ipv6/sit.c:968 [inline]
>  sit_tunnel_xmit+0x2a41/0x3130 net/ipv6/sit.c:1016
>  __netdev_start_xmit include/linux/netdevice.h:4022 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4031 [inline]
>  xmit_one net/core/dev.c:3008 [inline]
>  dev_hard_start_xmit+0x248/0xac0 net/core/dev.c:3024
>  __dev_queue_xmit+0x17d2/0x2070 net/core/dev.c:3505
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
>  neigh_direct_output+0x15/0x20 net/core/neighbour.c:1390
>  neigh_output include/net/neighbour.h:481 [inline]
>  ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
>  ip6_fragment+0x25ae/0x3420 net/ipv6/ip6_output.c:723
>  ip6_finish_output+0x319/0x920 net/ipv6/ip6_output.c:144
>  NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>  ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
>  dst_output include/net/dst.h:459 [inline]
>  ip6_local_out+0x95/0x160 net/ipv6/output_core.c:176
>  ip6_send_skb+0xa1/0x330 net/ipv6/ip6_output.c:1658
>  ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1678
>  rawv6_push_pending_frames net/ipv6/raw.c:616 [inline]
>  rawv6_sendmsg+0x2eb9/0x3e40 net/ipv6/raw.c:935
>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:643
>  SYSC_sendto+0x352/0x5a0 net/socket.c:1750
>  SyS_sendto+0x40/0x50 net/socket.c:1718
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x452869
> RSP: 002b:7fe3c12e5be8 EFLAGS: 0212 ORIG_RAX: 002c
> RAX: ffda RBX: 007580d8 RCX: 00452869
> RDX: 07f1 RSI: 2013b7ff RDI: 0014
> RBP: 0161 R08: 204e8fe4 R09: 001c
> R10: 0100 R11: 0212 R12: 006f01b8
> R13:  R14: 7fe3c12e66d4 R15: 0017
>
> Allocated by task 16924:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  __do_kmalloc_node mm/slab.c:3689 [inline]
>  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3703
>  __kmalloc_reserve.isra.40+0x41/0xd0 net/core/skbuff.c:138
>  __alloc_skb+0x13b/0x780 net/core/skbuff.c:206
>  alloc_skb include/linux/skbuff.h:985 [inline]
>  sock_wmalloc+0x140/0x1d0 net/core/sock.c:1932
>  __ip6_append_data.isra.43+0x2681/0x3340 net/ipv6/ip6_output.c:1397
>  ip6_append_data+0x189/0x290 net/ipv6/ip6_output.c:1552
>  rawv6_sendmsg+0x1dd9/0x3e40 net/ipv6/raw.c:928
>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:643
>  SYSC_sendto+0x352/0x5a0 net/socket.c:1750
>  SyS_sendto+0x40/0x50 net/socket.c:1718
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Freed by task 16942:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3503 [inline]
>  kfree+0xca/0x250 mm/slab.c:3820
>  skb_free_head+0x74/0xb0 net/core/skbuff.c:554
>  pskb_expand_head+0x36b/0x1210 net/core/skbuff.c:1494
>  __pskb_pull_tail+0x14a/0x17c0 net/core/skbuff.c:1877
>  pskb_may_pull include/linux/skbuff.h:2102 [inline]
>  _decode_session6+0x8a4/0x1100 net/ipv6/xfrm6_policy.c:148

Looks like we should indicate error by returning some int
for afinfo->decode_session().



>  __xfrm_decode_session+0x63/0x100 net/xfrm/xfrm_policy.c:2343
>  xfrm_decode_session_reverse include/net/xfrm.h:1181 

Re: [PATCH net-next 5/7] net: dsa: get master device at port parsing time

2017-10-30 Thread Florian Fainelli
On 10/27/2017 12:55 PM, Vivien Didelot wrote:
> Fetching the master device can be done directly when a port is parsed
> from device tree or pdata, instead of waiting until dsa_dst_parse.
> 
> Now that -EPROBE_DEFER is returned before we add the switch to the tree,
> there is no need to check for this error after dsa_dst_parse.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next 3/7] net: dsa: get port type at parse time

2017-10-30 Thread Florian Fainelli
On 10/27/2017 12:55 PM, Vivien Didelot wrote:
> Assign a port's type at parsed time instead of waiting for the tree to
> be completed.
> 
> Because this is now done earlier, we can use the port's type in
> dsa_port_is_* helpers instead of digging again in topology description.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 

One nit below:


>  static int dsa_port_parse_of(struct dsa_port *dp, struct device_node *dn)
>  {
> + struct device_node *ethernet = of_parse_phandle(dn, "ethernet", 0);
> + struct device_node *link = of_parse_phandle(dn, "link", 0);
> +
> + if (ethernet) {
> + dp->type = DSA_PORT_TYPE_CPU;
> + } else if (link) {
> + dp->type = DSA_PORT_TYPE_DSA;
> + } else {
> + dp->type = DSA_PORT_TYPE_USER;
> + }
> +

The curly braces are probably not necessary since all of these are
single line statements.

>   dp->dn = dn;
>  
>   return 0;
> @@ -630,6 +629,14 @@ static int dsa_parse_ports_of(struct device_node *dn, 
> struct dsa_switch *ds)
>  static int dsa_port_parse(struct dsa_port *dp, const char *name,
> struct device *dev)
>  {
> + if (!strcmp(name, "cpu")) {
> + dp->type = DSA_PORT_TYPE_CPU;
> + } else if (!strcmp(name, "dsa")) {
> + dp->type = DSA_PORT_TYPE_DSA;
> + } else {
> + dp->type = DSA_PORT_TYPE_USER;
> + }

Likewise.
-- 
Florian


Re: [PATCH net-next 2/7] net: dsa: add port parse functions

2017-10-30 Thread Florian Fainelli
On 10/27/2017 12:55 PM, Vivien Didelot wrote:
> Add symmetrical DSA port parsing functions for pdata and device tree,
> used to parse and validate a given port node or platform data.
> 
> They don't do much for the moment but will be extended later on to
> assign a port type and get device references.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next 1/7] net: dsa: get ports within parsing code

2017-10-30 Thread Florian Fainelli
On 10/27/2017 12:55 PM, Vivien Didelot wrote:
> There is no point into hiding the -EINVAL error code in ERR_PTR from a
> dsa_get_ports function, simply get the "ports" node directly from within
> the dsa_parse_ports_dn function.
> 
> This also has the effect to make the pdata and device tree handling code
> symmetrical inside _dsa_register_switch.
> 
> At the same time, rename dsa_parse_ports_dn to dsa_parse_ports_of
> because _of is a more common suffix for device tree parsing functions.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH] net: phy: leds: Add support for "link" trigger

2017-10-30 Thread Florian Fainelli
On 10/28/2017 08:27 AM, Maciej S. Szmigiero wrote:
> Currently, we create a LED trigger for any link speed known to a PHY.
> These triggers only fire when their exact link speed had been negotiated
> (they aren't cumulative, that is, they don't fire for "their or any higher"
> link speed).
> 
> What we are missing, however, is a trigger which will fire on any link
> speed known to the PHY. Such trigger can then be used for implementing a
> poor man's substitute of the "link" LED on boards that lack it.
> Let's add it.

The use case definitively makes sense, but the implementation a little less.

> 
> Signed-off-by: Maciej S. Szmigiero 
> ---
>  drivers/net/phy/Kconfig|  7 +--
>  drivers/net/phy/phy_led_triggers.c | 19 ---
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index cd931cf9dcc2..3bcc2107ad77 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -191,11 +191,14 @@ config LED_TRIGGER_PHY
> Adds support for a set of LED trigger events per-PHY.  Link
> state change will trigger the events, for consumption by an
> LED class driver.  There are triggers for each link speed currently
> -   supported by the phy, and are of the form:
> +   supported by the PHY and also a one common "link" trigger as a
> +   logical-or of all the link speed ones.
> +   All these triggers are named according to the following pattern:
> ::
>  
> Where speed is in the form:
> - Mbps or Gbps
> + Mbps OR Gbps OR link
> + for any speed known to the PHY.
>  
>  
>  comment "MII PHY device drivers"
> diff --git a/drivers/net/phy/phy_led_triggers.c 
> b/drivers/net/phy/phy_led_triggers.c
> index 94ca42e630bb..5b6f1876f514 100644
> --- a/drivers/net/phy/phy_led_triggers.c
> +++ b/drivers/net/phy/phy_led_triggers.c
> @@ -20,7 +20,8 @@ static struct phy_led_trigger 
> *phy_speed_to_led_trigger(struct phy_device *phy,
>  {
>   unsigned int i;
>  
> - for (i = 0; i < phy->phy_num_led_triggers; i++) {
> + /* the first (i = 0) trigger is for "any" speed */
> + for (i = 1; i < phy->phy_num_led_triggers; i++) {
>   if (phy->phy_led_triggers[i].speed == speed)
>   return >phy_led_triggers[i];

This looks unnecessary, because existing LED triggers are all relative
to a particular speed, you need to coerce the existing code into
accepting SPEED_UNKNOWN as a hint to mean "link". Just implement "link"
as a different trigger that does not need the speed argument/lookup to
be happening and don't change how the indexing into the array of speed
triggers is done. Just have a separate "link" trigger that is stored
separately from that existing array.

Also, what happens if I set both "link" and a given "speed" and my RJ-45
connector only has two LEDs, and one of them is already used for "activity"?
-- 
Florian


Re: [PATCH v9 02/10] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY

2017-10-30 Thread Corentin Labbe
On Fri, Oct 27, 2017 at 09:37:10AM -0500, Rob Herring wrote:
> On Tue, Oct 24, 2017 at 07:57:06PM +0200, Corentin Labbe wrote:
> > This patch add documentation about the MDIO switch used on sun8i-h3-emac
> > for integrated PHY.
> > 
> > Signed-off-by: Corentin Labbe 
> > ---
> >  .../devicetree/bindings/net/dwmac-sun8i.txt| 145 
> > +++--
> >  1 file changed, 133 insertions(+), 12 deletions(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt 
> > b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
> > index 725f3b187886..2600ce9ad3cc 100644
> > --- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
> > +++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
> > @@ -4,18 +4,18 @@ This device is a platform glue layer for stmmac.
> >  Please see stmmac.txt for the other unchanged properties.
> >  
> >  Required properties:
> > -- compatible: should be one of the following string:
> > +- compatible: must be one of the following string:
> > "allwinner,sun8i-a83t-emac"
> > "allwinner,sun8i-h3-emac"
> > "allwinner,sun8i-v3s-emac"
> > "allwinner,sun50i-a64-emac"
> >  - reg: address and length of the register for the device.
> >  - interrupts: interrupt for the device
> > -- interrupt-names: should be "macirq"
> > +- interrupt-names: must be "macirq"
> >  - clocks: A phandle to the reference clock for this device
> > -- clock-names: should be "stmmaceth"
> > +- clock-names: must be "stmmaceth"
> >  - resets: A phandle to the reset control for this device
> > -- reset-names: should be "stmmaceth"
> > +- reset-names: must be "stmmaceth"
> >  - phy-mode: See ethernet.txt
> >  - phy-handle: See ethernet.txt
> >  - #address-cells: shall be 1
> > @@ -39,23 +39,42 @@ Optional properties for the following compatibles:
> >  - allwinner,leds-active-low: EPHY LEDs are active low
> >  
> >  Required child node of emac:
> > -- mdio bus node: should be named mdio
> > +- mdio bus node: with compatible "snps,dwmac-mdio"
> 
> It should still be named mdio.
> 
> >  
> >  Required properties of the mdio node:
> >  - #address-cells: shall be 1
> >  - #size-cells: shall be 0
> >  
> > -The device node referenced by "phy" or "phy-handle" should be a child node
> > +The device node referenced by "phy" or "phy-handle" must be a child node
> >  of the mdio node. See phy.txt for the generic PHY bindings.
> >  
> > -Required properties of the phy node with the following compatibles:
> > +The following compatibles require that the emac node have a mdio-mux child
> > +node called "mdio-mux":
> > +  - "allwinner,sun8i-h3-emac"
> > +  - "allwinner,sun8i-v3s-emac":
> > +Required properties for the mdio-mux node:
> > +  - compatible = "allwinner,sun8i-h3-mdio-mux"
> > +  - mdio-parent-bus: a phandle to EMAC mdio
> > +  - one child mdio for the integrated mdio with the compatible
> > +"allwinner,sun8i-h3-mdio-internal"
> > +  - one child mdio for the external mdio if present (V3s have none)
> > +Required properties for the mdio-mux children node:
> > +  - reg: 1 for internal MDIO bus, 2 for external MDIO bus
> > +
> > +The following compatibles require a PHY node representing the integrated
> > +PHY, under the integrated MDIO bus node if an mdio-mux node is used:
> >- "allwinner,sun8i-h3-emac",
> >- "allwinner,sun8i-v3s-emac":
> > +
> > +Additional information regarding generic multiplexer properties can be 
> > found
> > +at Documentation/devicetree/bindings/net/mdio-mux.txt
> > +
> > +Required properties of the integrated phy node:
> >  - clocks: a phandle to the reference clock for the EPHY
> >  - resets: a phandle to the reset control for the EPHY
> > +- Must be a child of the integrated mdio
> >  
> > -Example:
> > -
> > +Example with integrated PHY:
> >  emac: ethernet@1c0b000 {
> > compatible = "allwinner,sun8i-h3-emac";
> > syscon = <>;
> > @@ -72,13 +91,115 @@ emac: ethernet@1c0b000 {
> > phy-handle = <_mii_phy>;
> > phy-mode = "mii";
> > allwinner,leds-active-low;
> > +
> > +   mdio0: mdio {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +   compatible = "snps,dwmac-mdio";
> > +   };
> > +
> > +   mdio-mux {
> > +   compatible = "mdio-mux", "allwinner,sun8i-h3-mdio-mux";
> 
> Drop mdio-mux.
> 
> With those fixed,
> 
> Acked-by: Rob Herring 

Will change what you ask.

Thanks
Regards


Re: KASAN: use-after-free Write in detach_if_pending

2017-10-30 Thread Eric Dumazet
On Mon, 2017-10-30 at 20:52 +0300, Dmitry Vyukov wrote:

> syz fix:   patch title
> 
> then that's doable. If we agree on this format, then I am ready to
> implement this.

Yes please.

David Miller and Linus never rebase their trees, for good reasons.




Re: [PATCH] net: ethernet: slicoss: remove redundant initialization of idx

2017-10-30 Thread Lino Sanfilippo
On 30.10.2017 19:04, Jakub Kicinski wrote:
> On Sun, 29 Oct 2017 13:38:09 +, Colin King wrote:
>> From: Colin Ian King 
>>
>> Variable idx is being initialized and later on over-written by
>> a new value in a do-loop without the initial value ever being
>> read. Hence the initializion is redundant and can be removed.
>> Cleans up clang warning:
>>
>> drivers/net/ethernet/alacritech/slicoss.c:358:15: warning: Value
>> stored to 'idx' during its initialization is never read
>>
>> Signed-off-by: Colin Ian King 
>> ---
>>  drivers/net/ethernet/alacritech/slicoss.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/alacritech/slicoss.c 
>> b/drivers/net/ethernet/alacritech/slicoss.c
>> index 15a8096c60df..ac8004186fa8 100644
>> --- a/drivers/net/ethernet/alacritech/slicoss.c
>> +++ b/drivers/net/ethernet/alacritech/slicoss.c
>> @@ -355,7 +355,7 @@ static void slic_xmit_complete(struct slic_device *sdev)
>>  {
>>  struct slic_tx_queue *txq = >txq;
>>  struct net_device *dev = sdev->netdev;
>> -unsigned int idx = txq->done_idx;
>> +unsigned int idx;
>>  struct slic_tx_buffer *buff;
>>  unsigned int frames = 0;
>>  unsigned int bytes = 0;
> 
> You should probably reorder the variables now so they stay longest to
> shortest.
> 

I agree. While the change itself is ok, it would be nice if
we could keep the order of the variables by length. 

Regards,
Lino


[PATCH iproute2 3/3] xfrm_{state,policy}: Allow to deleteall polices/states with marks

2017-10-30 Thread Thomas Egerer
Using 'ip deleteall' with policies that have marks, fails unless you
eplicitely specify the mark values. This is very uncomfortable when
bulk-deleting policies and states. With this patch all relevant states
and policies are wiped by 'ip deleteall' regardless of their mark
values.

Signed-off-by: Thomas Egerer 
---
 ip/xfrm_policy.c |  9 +
 ip/xfrm_state.c  | 12 
 2 files changed, 21 insertions(+)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index e2fa771..d544026 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -753,6 +753,15 @@ static int xfrm_policy_keep(const struct sockaddr_nl *who,
xpid->dir = xpinfo->dir;
xpid->index = xpinfo->index;
 
+   if (tb[XFRMA_MARK]) {
+   int r = addattr_l(new_n, xb->size, XFRMA_MARK,
+   (void *)RTA_DATA(tb[XFRMA_MARK]), 
tb[XFRMA_MARK]->rta_len);
+   if (r < 0) {
+   fprintf(stderr, "%s: XFRMA_MARK failed\n", __func__);
+   exit(1);
+   }
+   }
+
xb->offset += new_n->nlmsg_len;
xb->nlmsg_count++;
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 3e20d6e..85d959c 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -1081,6 +1081,7 @@ static int xfrm_state_keep(const struct sockaddr_nl *who,
int len = n->nlmsg_len;
struct nlmsghdr *new_n;
struct xfrm_usersa_id *xsid;
+   struct rtattr *tb[XFRMA_MAX+1];
 
if (n->nlmsg_type != XFRM_MSG_NEWSA) {
fprintf(stderr, "Not a state: %08x %08x %08x\n",
@@ -1117,6 +1118,17 @@ static int xfrm_state_keep(const struct sockaddr_nl *who,
addattr_l(new_n, xb->size, XFRMA_SRCADDR, >saddr,
  sizeof(xsid->daddr));
 
+   parse_rtattr(tb, XFRMA_MAX, XFRMS_RTA(xsinfo), len);
+
+   if (tb[XFRMA_MARK]) {
+   int r = addattr_l(new_n, xb->size, XFRMA_MARK,
+   (void *)RTA_DATA(tb[XFRMA_MARK]), 
tb[XFRMA_MARK]->rta_len);
+   if (r < 0) {
+   fprintf(stderr, "%s: XFRMA_MARK failed\n", __func__);
+   exit(1);
+   }
+   }
+
xb->offset += new_n->nlmsg_len;
xb->nlmsg_count++;
 
-- 
2.6.4



[PATCH iproute2 2/3] xfrm_policy: Do not attempt to deleteall a socket policy

2017-10-30 Thread Thomas Egerer
Socket polices are added to a socket using setsockopt(2). They cannot be
deleted by iproute2. The attempt to delete them causes an error
(EINVAL).
To avoid this unnecessary error message all socket policies are skipped
in xfrm_policy_keep.

Signed-off-by: Thomas Egerer 
---
 ip/xfrm_policy.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 7d2139e..e2fa771 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -735,6 +735,10 @@ static int xfrm_policy_keep(const struct sockaddr_nl *who,
if (!xfrm_policy_filter_match(xpinfo, ptype))
return 0;
 
+   /* can't delete socket policies */
+   if (xpinfo->dir >= XFRM_POLICY_MAX)
+   return 0;
+
if (xb->offset + NLMSG_LENGTH(sizeof(*xpid)) > xb->size)
return 0;
 
-- 
2.6.4




[PATCH iproute2 1/3] xfrm_policy: Add filter option for socket policies

2017-10-30 Thread Thomas Egerer
Listing policies on systems with a lot of socket policies can be
confusing due to the number of returned polices. Even if socket polices
are not of interest, they cannot be filtered. This patch adds an option
to filter all socket policies from the output.

Signed-off-by: Thomas Egerer 
---
 ip/xfrm.h| 1 +
 ip/xfrm_policy.c | 8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/ip/xfrm.h b/ip/xfrm.h
index 54d80ce..8566d63 100644
--- a/ip/xfrm.h
+++ b/ip/xfrm.h
@@ -90,6 +90,7 @@ struct xfrm_filter {
__u8 action_mask;
__u32 priority_mask;
__u8 policy_flags_mask;
+   __u8 filter_socket;
 
__u8 ptype;
__u8 ptype_mask;
diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index 98460a0..7d2139e 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -58,7 +58,7 @@ static void usage(void)
fprintf(stderr, "[ LIMIT-LIST ] [ TMPL-LIST ]\n");
fprintf(stderr, "Usage: ip xfrm policy { delete | get } { SELECTOR | 
index INDEX } dir DIR\n");
fprintf(stderr, "[ ctx CTX ] [ mark MARK [ mask MASK ] ] [ 
ptype PTYPE ]\n");
-   fprintf(stderr, "Usage: ip xfrm policy { deleteall | list } [ SELECTOR 
] [ dir DIR ]\n");
+   fprintf(stderr, "Usage: ip xfrm policy { deleteall | list } [ nosock ] 
[ SELECTOR ] [ dir DIR ]\n");
fprintf(stderr, "[ index INDEX ] [ ptype PTYPE ] [ action 
ACTION ] [ priority PRIORITY ]\n");
fprintf(stderr, "[ flag FLAG-LIST ]\n");
fprintf(stderr, "Usage: ip xfrm policy flush [ ptype PTYPE ]\n");
@@ -403,6 +403,9 @@ static int xfrm_policy_filter_match(struct 
xfrm_userpolicy_info *xpinfo,
if ((xpinfo->dir^filter.xpinfo.dir)_mask)
return 0;
 
+   if (filter.filter_socket && (xpinfo->dir >= XFRM_POLICY_MAX))
+   return 0;
+
if ((ptype^filter.ptype)_mask)
return 0;
 
@@ -806,6 +809,9 @@ static int xfrm_policy_list_or_deleteall(int argc, char 
**argv, int deleteall)
 
filter.policy_flags_mask = XFRM_FILTER_MASK_FULL;
 
+   } else if (strcmp(*argv, "nosock") == 0) {
+   /* filter all socket-based policies */
+   filter.filter_socket = 1;
} else {
if (selp)
invarg("unknown", *argv);
-- 
2.6.4




  1   2   3   >