[PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread Johannes Berg
From: Johannes Berg johannes.b...@intel.com

The conversion of mac80211's station table to rhashtable had a bug
that I found by accident in code review, that hadn't been found as
rhashtable apparently managed to have a maximum hash chain length
of one (!) in all our testing.

In order to test the bug and verify the fix I set my rhashtable's
max_size very low (4) in order to force getting hash collisions.

At that point, rhashtable WARNed in rhashtable_insert_rehash() but
didn't actually reject the hash table insertion. This caused it to
lose insertions - my master list of stations would have 9 entries,
but the rhashtable only had 5. This may warrant a deeper look, but
that WARN_ON() just shouldn't happen.

Fix this by not returning true from rht_grow_above_100() when the
rhashtable's max_size has been reached - in this case the user is
explicitly configuring it to be at most that big, so even if it's
now above 100% it shouldn't attempt to resize.

This fixes the lost insertion issue and consequently allows my
code to display its error (and verify my fix for it.)

Signed-off-by: Johannes Berg johannes.b...@intel.com
---
 include/linux/rhashtable.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242d1230..dbcbcc59aa92 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -282,7 +282,8 @@ static inline bool rht_shrink_below_30(const struct 
rhashtable *ht,
 static inline bool rht_grow_above_100(const struct rhashtable *ht,
  const struct bucket_table *tbl)
 {
-   return atomic_read(ht-nelems)  tbl-size;
+   return atomic_read(ht-nelems)  tbl-size 
+   (!ht-p.max_size || tbl-size  ht-p.max_size);
 }
 
 /* The bucket lock is selected based on the hash and protects mutations
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread David Miller
From: Johannes Berg johan...@sipsolutions.net
Date: Thu, 23 Apr 2015 16:38:43 +0200

 From: Johannes Berg johannes.b...@intel.com
 
 The conversion of mac80211's station table to rhashtable had a bug
 that I found by accident in code review, that hadn't been found as
 rhashtable apparently managed to have a maximum hash chain length
 of one (!) in all our testing.
 
 In order to test the bug and verify the fix I set my rhashtable's
 max_size very low (4) in order to force getting hash collisions.
 
 At that point, rhashtable WARNed in rhashtable_insert_rehash() but
 didn't actually reject the hash table insertion. This caused it to
 lose insertions - my master list of stations would have 9 entries,
 but the rhashtable only had 5. This may warrant a deeper look, but
 that WARN_ON() just shouldn't happen.
 
 Fix this by not returning true from rht_grow_above_100() when the
 rhashtable's max_size has been reached - in this case the user is
 explicitly configuring it to be at most that big, so even if it's
 now above 100% it shouldn't attempt to resize.
 
 This fixes the lost insertion issue and consequently allows my
 code to display its error (and verify my fix for it.)
 
 Signed-off-by: Johannes Berg johannes.b...@intel.com

It looks fine to me, but I'll let Herbert and Thomas review this.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Paul Wouters
On 04/23/2015 07:45 AM, Steffen Klassert wrote:
 On Thu, Apr 23, 2015 at 11:26:20AM +0800, Herbert Xu wrote:
 Hi:

 It looks like our IPsec implementations of CCM and GCM are buggy
 in that they don't include the IV in the authentication calculation.
 
 Seems like crypto_rfc4106_crypt() passes the associated data it
 got from ESP directly to gcm, without chaining with the IV.
 

 This definitely breaks interoperability with anyone who implements
 them correctly.  The fact that there have been no reports on this
 probably means that nobody has run into this in the field yet.

 On the security side, this is probably not a big deal for CCM
 because it always verifies the authentication tag after decryption.
 But for GCM this may be a DoS issue as an attacker could modify
 the IV without triggering the authentication check and thus cause
 an unnecessary decryption.  For both CCM and GCM this will result
 in random data injected as a packet into the network stack which
 hopefully will be dropped.

 In order to fix this without breaking backwards compatibility,
 my plan is to introduce new templates such as rfc4106v2 which
 implement the RFC correctly.  The existing templates will be
 retained so that current users aren't broken by the fix.
 
 Adding a second template for the correct implementation is
 probaply the only thing that we can do if we don't want to
 break backwards compatibility. But maybe we can add a warning
 to the old implementation, such that users notice that they
 use a broken version.

Unless we have a cryptographer indicate to us how that this mistake does not 
significantly reduce or break the confidentiality and authentication, I do not
think we should keep a known broken implementation around. Lets say this brings 
GCM security down to ROT13, then it just needs to die without keeping
interoperability.

Additionally, looking at how long we suffered and still suffer from defaulting 
to something broken (sha2_256 truncation) I'm really not in favour of keeping
broken crypto implementations around.

Paul

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] ibmveth: Fix off-by-one error in ibmveth_change_mtu()

2015-04-23 Thread David Miller
From: David Gibson da...@gibson.dropbear.id.au
Date: Thu, 23 Apr 2015 14:43:05 +1000

 AFAIK the PAPR document which defines the virtual device interface used by
 the ibmveth driver doesn't specify a specific maximum MTU.  So, in the
 ibmveth driver, the maximum allowed MTU is determined by the maximum
 allocated buffer size of 64k (corresponding to one page in the common case)
 minus the per-buffer overhead IBMVETH_BUFF_OH (which has value 22 for 14
 bytes of ethernet header, plus 8 bytes for an opaque handle).
 
 This suggests a maximum allowable MTU of 65514 bytes, but in fact the
 driver only permits a maximum MTU of 65513.  This is because there is a 
 instead of an = in ibmveth_change_mtu(), which only permits an MTU which
 is strictly smaller than the buffer size, rather than allowing the buffer
 to be completely filled.
 
 This patch fixes the buglet.
 
 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 Acked-by: Thomas Falcon tlfal...@linux.vnet.ibm.com

Applied, thank you.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread Daniel Borkmann

On 04/23/2015 06:09 PM, Johannes Berg wrote:

On Thu, 2015-04-23 at 11:59 -0400, David Miller wrote:


This fixes the lost insertion issue and consequently allows my
code to display its error (and verify my fix for it.)



It looks fine to me, but I'll let Herbert and Thomas review this.


Oh, sorry, didn't know Herbert was also involved with this.


Would be good to add Fixes tags, commit in question seems to be:
ccd57b1bd324 (rhashtable: Add immediate rehash during insertion)


Anyway, that's a good idea. I really went for this looks obvious and
patched it, and it started working for me thereafter. I don't know
anything about side effects here.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix tcp fin memory accounting

2015-04-23 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Thu, 23 Apr 2015 07:02:47 -0700

 + if (!tcp_send_head(sk)) {
 + tp-snd_nxt++;
 + return;
 + }

I'm not so sure about this.  Why is this needed?

Otherwise patch looks fine to me.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread Johannes Berg
On Thu, 2015-04-23 at 11:59 -0400, David Miller wrote:

  This fixes the lost insertion issue and consequently allows my
  code to display its error (and verify my fix for it.)

 It looks fine to me, but I'll let Herbert and Thomas review this.

Oh, sorry, didn't know Herbert was also involved with this.

Anyway, that's a good idea. I really went for this looks obvious and
patched it, and it started working for me thereafter. I don't know
anything about side effects here.

johannes


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] openvswitch: add support for netpoll

2015-04-23 Thread Konstantin Khlebnikov
This patch simply forwards unicast netpoll packets via one of physical
interface in datapath depending on source mac address from the skb.

It seems possible to use common net flow classification for netpoll but
there is no way to guarantee presence of right flow in kernel cache.

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 net/openvswitch/vport-internal_dev.c |   74 ++
 net/openvswitch/vport-netdev.c   |   63 -
 net/openvswitch/vport-netdev.h   |   15 +++
 3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c
index 6a55f7105505..d1eb09ac09e8 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -23,6 +23,7 @@
 #include linux/etherdevice.h
 #include linux/ethtool.h
 #include linux/skbuff.h
+#include linux/netpoll.h
 
 #include net/dst.h
 #include net/xfrm.h
@@ -66,11 +67,77 @@ static struct rtnl_link_stats64 
*internal_dev_get_stats(struct net_device *netde
return stats;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+
+static void internal_dev_poll_controller(struct net_device *dev)
+{
+}
+
+static struct netdev_vport *get_local_netdev_vport(struct net_device *dev)
+{
+   struct datapath *dp = internal_dev_priv(dev)-vport-dp;
+
+   return netdev_vport_priv(ovs_lookup_vport(dp, OVSP_LOCAL));
+}
+
+static int internal_dev_netpoll_setup(struct net_device *internal_dev,
+ struct netpoll_info *info)
+{
+   struct netdev_vport *local = get_local_netdev_vport(internal_dev);
+   struct netdev_vport *lower;
+   struct list_head *iter;
+   int ret = -EOPNOTSUPP;
+
+   ASSERT_RTNL();
+   /* succeed if at least one lower device can handle netpoll */
+   netdev_for_each_lower_private(local-dev, lower, iter)
+   if (!ovs_netdev_netpoll_enable(lower))
+   ret = 0;
+   /* enable netpoll on local device as mark for new devices */
+   if (!ret  local-dev != internal_dev)
+   ovs_netdev_netpoll_enable(local);
+   return ret;
+}
+
+static void internal_dev_netpoll_cleanup(struct net_device *internal_dev)
+{
+   struct netdev_vport *local = get_local_netdev_vport(internal_dev);
+   struct netdev_vport *lower;
+   struct list_head *iter;
+
+   /* FIXME needs reference counting for more than one netpoll in dp */
+
+   ASSERT_RTNL();
+   netdev_for_each_lower_private(local-dev, lower, iter)
+   ovs_netdev_netpoll_disable(lower);
+   ovs_netdev_netpoll_disable(local);
+}
+
+static void internal_dev_netpoll_xmit(struct sk_buff *skb,
+ struct net_device *internal_dev)
+{
+   struct netdev_vport *local = get_local_netdev_vport(internal_dev);
+   struct netdev_vport *lower;
+   struct list_head *iter;
+
+   netdev_for_each_lower_private_rcu(local-dev, lower, iter)
+   if (!ovs_netdev_netpoll_send(lower, skb))
+   return; /* Unicast only, first gets all. */
+   dev_kfree_skb_irq(skb);
+}
+
+#endif /* CONFIG_NET_POLL_CONTROLLER */
+
 /* Called with rcu_read_lock_bh. */
 static int internal_dev_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
rcu_read_lock();
-   ovs_vport_receive(internal_dev_priv(netdev)-vport, skb, NULL);
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   if (unlikely(netpoll_tx_running(netdev)))
+   internal_dev_netpoll_xmit(skb, netdev);
+   else
+#endif
+   ovs_vport_receive(internal_dev_priv(netdev)-vport, skb, NULL);
rcu_read_unlock();
return 0;
 }
@@ -122,6 +189,11 @@ static const struct net_device_ops internal_dev_netdev_ops 
= {
.ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = internal_dev_change_mtu,
.ndo_get_stats64 = internal_dev_get_stats,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   .ndo_poll_controller = internal_dev_poll_controller,
+   .ndo_netpoll_setup = internal_dev_netpoll_setup,
+   .ndo_netpoll_cleanup = internal_dev_netpoll_cleanup,
+#endif
 };
 
 static struct rtnl_link_ops internal_dev_link_ops __read_mostly = {
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4776282c6417..324fb078d32a 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -26,6 +26,7 @@
 #include linux/rtnetlink.h
 #include linux/skbuff.h
 #include linux/openvswitch.h
+#include linux/netpoll.h
 
 #include net/llc.h
 
@@ -86,10 +87,59 @@ static struct net_device *get_dpdev(const struct datapath 
*dp)
return netdev_vport_priv(local)-dev;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+int ovs_netdev_netpoll_enable(struct netdev_vport *netdev_vport)
+{
+   struct netpoll *np;
+   int err;
+
+   if (netdev_vport-np)
+   return 0;
+
+   np = kzalloc(sizeof(*np), GFP_KERNEL);

Re: [PATCH] net: phy: micrel: don't do clock-mode-select if we got NULL clock

2015-04-23 Thread David Miller
From: Niklas Cassel niklas.cas...@axis.com
Date: Thu, 23 Apr 2015 15:37:11 +0200

 Signed-off-by: Niklas Cassel nikl...@axis.com
 ---
  drivers/net/phy/micrel.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
 index 1190fd8..a422036 100644
 --- a/drivers/net/phy/micrel.c
 +++ b/drivers/net/phy/micrel.c
 @@ -548,7 +548,7 @@ static int kszphy_probe(struct phy_device *phydev)
   }
  
   clk = devm_clk_get(phydev-dev, rmii-ref);
 - if (!IS_ERR(clk)) {
 + if (!IS_ERR_OR_NULL(clk)) {
   unsigned long rate = clk_get_rate(clk);
   bool rmii_ref_clk_sel_25_mhz;
  

I do not see anyone in any other networking driver checking for a NULL
return from devm_clk_get().

So either everyone else is wrong, or your change is.  I want to find
out which before applying anything.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 0/3] tipc: three bug fixes

2015-04-23 Thread David Miller
From: Jon Maloy jon.ma...@ericsson.com
Date: Thu, 23 Apr 2015 09:37:37 -0400

 A set of unrelated corrections; one for the tipc netns implementation,
 one regarding problems with random link resets, and one removing a
 an erroneous refcount decrement when reading link statistsics via
 netlink.

Series applied, thanks Jon.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix tcp fin memory accounting

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 11:54 -0400, David Miller wrote:
 From: Eric Dumazet eric.duma...@gmail.com
 Date: Thu, 23 Apr 2015 07:02:47 -0700
 
  +   if (!tcp_send_head(sk)) {
  +   tp-snd_nxt++;
  +   return;
  +   }
 
 I'm not so sure about this.  Why is this needed?
 
 Otherwise patch looks fine to me.
 

I guess I need to add a comment then ;)

If we want to pretend FIN was sent, we also need to tweak tp-snd_nxt to
match new tskb-end_seq (or tp-write_seq).

I tested following packetdrill script and confirmed that if I do not
tweak snd_nxt, last packet sent is incorrect :

 . 5001:5001(0) ack 2

This might be because our stack relies that we never coalesce something
on one already sent skb (we do this check in tcp_sendmsg() for example)

Also note the funny thing : The FIN that is finally sent also has a Push
flag. I believe its fine to leave this, as it is not incorrect at
protocol perspective, and removing the push would add some logic when we
split the packet to remove the ACKed part.


// Initialize a server socket.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 S 0:0(0) win 32792 mss 1000,sackOK,nop,nop,nop,wscale 7
+0 S. 0:0(0) ack 1 mss 1460,nop,nop,sackOK,nop,wscale 6
+.010  . 1:1(0) ack 1 win 257

+0 accept(3, ..., ...) = 4

+0.010 write(4, ..., 5000) = 5000
+0 P. 1:5001(5000) ack 1

// this assumes kernel is not able to allocate fresh skb to 
// hold FIN
+.010 shutdown(4, SHUT_RDWR) = 0
+0 . 1:1(0) ack 5001 win 257

// Yeah ! note the FIN that we 'retransmit' also gets a Push
+0.209 FP. 5001:5001(0) ack 1

+0 read(4, ..., 1000) = 0
+0 write(4, ..., 1000) = -1 EPIPE (Broken pipe)

+.010 close(4) = 0

// TLP at 2*RTT after the original send of FIN.
+.201  FP. 5001:5001(0) ack 1

// Receive an ACK for the remaining outstanding data.
+.010  . 1:1(0) ack 5002 win 257

// Receive a FIN.
+.010  F. 1:1(0) ack 5002 win 257
// Make sure final packet is correct
+0 . 5002:5002(0) ack 2


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI: Polling function requesting extra calls - difference between 3.18.11 and 4.0.0

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 20:18 +0200, Rafał Miłecki wrote:

 
 Can you help us with this, please? Does bgmac use NAPI incorrectly?
 Were there some important changes in 3.19 or 4.0?
 

Right they were important changes in NAPI indeed :

3b47d30396ba net: gro: add a per device gro flush timer
d75b1ade567f net: less interrupt masking in NAPI
bc9ad166e38a net: introduce napi_schedule_irqoff()

This requested some fixes in various drivers :

commit f31ec95fa19e07a8beebcc0297284f23aa57967e
commit 24e579c8898aa641ede3149234906982290934e5
commit 6088beef3f7517717bd21d90b379714dd0837079
commit f104fedc0da126abe93dd0f4a9fa13e5133bf9df
commit 7a05dc64e2e4c611d89007b125b20c0d2a4d31a5
commit 001ce546bb537bb5b7821f05633556a0c9787e32
commit 3079c652141f9d6377417a7e8fd650c9948df65e
commit 8acdf999accfd95093db17f33a58429a38782060
commit 6a6dc08ff6395f58be3ee568cb970ea956f16819


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] ethernet: myri10ge: use arch_phys_wc_add()

2015-04-23 Thread David Miller
From: Luis R. Rodriguez mcg...@do-not-panic.com
Date: Tue, 21 Apr 2015 13:09:45 -0700

 From: Luis R. Rodriguez mcg...@suse.com
 
 This driver already uses ioremap_wc() on the same range
 so when write-combining is available that will be used
 instead.
 
 ...
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com

I can't wait forever for the driver maintainers to review this, so
I'm applying it.

Thanks Luis.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix tcp fin memory accounting

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 09:16 -0700, Eric Dumazet wrote:
 On Thu, 2015-04-23 at 11:54 -0400, David Miller wrote:
  From: Eric Dumazet eric.duma...@gmail.com
  Date: Thu, 23 Apr 2015 07:02:47 -0700
  
   + if (!tcp_send_head(sk)) {
   + tp-snd_nxt++;
   + return;
   + }
  
  I'm not so sure about this.  Why is this needed?
  
  Otherwise patch looks fine to me.
  
 
 I guess I need to add a comment then ;)
 
 If we want to pretend FIN was sent, we also need to tweak tp-snd_nxt to
 match new tskb-end_seq (or tp-write_seq).
 
 I tested following packetdrill script and confirmed that if I do not
 tweak snd_nxt, last packet sent is incorrect :
 
. 5001:5001(0) ack 2
 
 This might be because our stack relies that we never coalesce something
 on one already sent skb (we do this check in tcp_sendmsg() for example)

Well, real reason is that tp-snd_nxt is not touched in retransmit
paths, but when new data is sent (tcp_event_new_data_sent())




--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net: unix: garbage: fixed several comment and whitespace style issues

2015-04-23 Thread David Miller
From: Jason Eastman eastman.jason.li...@gmail.com
Date: Wed, 22 Apr 2015 00:56:42 -0600

 fixed several comment and whitespace style issues
 
 Signed-off-by: Jason Eastman eastman.jason.li...@gmail.com

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4

2015-04-23 Thread Sudeep Holla
On Thu, Apr 23, 2015 at 6:07 PM, Josh Boyer jwbo...@fedoraproject.org wrote:
 Hi All,

 We've had a user report the following backtrace from the bridge module
 with a recent Linus' tree.  Has anything like this been reported yet?
 If you have any questions on setup, the user is CC'd.


I too observed similar backtrace once(not able to reproduce it again)
while I was trying to check inconsistent lock state[1] with lockdep enabled.

Regards,
Sudeep

[1] https://lkml.org/lkml/2015/4/23/329


---8

===
[ INFO: suspicious RCU usage. ]
4.0.0 #269 Not tainted
---
include/trace/events/ipi.h:68 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.0.0 #269
Hardware name: ARM-Versatile Express
[c00151f1] (unwind_backtrace) from [c0011971] (show_stack+0x11/0x14)
[c0011971] (show_stack) from [c05627c7] (dump_stack+0x73/0x8c)
[c05627c7] (dump_stack) from [c0013e63] (handle_IPI+0x257/0x410)
[c0013e63] (handle_IPI) from [c000932b] (gic_handle_irq+0x4f/0x50)
[c000932b] (gic_handle_irq) from [c00121ff] (__irq_svc+0x3f/0x64)
Exception stack(0xc083bf30 to 0xc083bf78)
bf20: 0001 0001  c0841938
bf40: c083a000 c083d5fc c08ba5c8 c083d598   c08b938f c056afc8
bf60: 0008 c083bf78 c005d157 c000f288 4133 
[c00121ff] (__irq_svc) from [c000f288] (arch_cpu_idle+0x30/0x34)
[c000f288] (arch_cpu_idle) from [c00555d5] (cpu_startup_entry+0x339/0x404)
[c00555d5] (cpu_startup_entry) from [c07d1a4f] (start_kernel+0x32f/0x338)
[c07d1a4f] (start_kernel) from [8000807f] (0x8000807f)

===
[ INFO: suspicious RCU usage. ]
4.0.0 #269 Not tainted
---
include/trace/events/ipi.h:84 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.0.0 #269
Hardware name: ARM-Versatile Express
[c00151f1] (unwind_backtrace) from [c0011971] (show_stack+0x11/0x14)
[c0011971] (show_stack) from [c05627c7] (dump_stack+0x73/0x8c)
[c05627c7] (dump_stack) from [c0013f85] (handle_IPI+0x379/0x410)
[c0013f85] (handle_IPI) from [c000932b] (gic_handle_irq+0x4f/0x50)
[c000932b] (gic_handle_irq) from [c00121ff] (__irq_svc+0x3f/0x64)
Exception stack(0xc083bf30 to 0xc083bf78)
bf20: 0001 0001  c0841938
bf40: c083a000 c083d5fc c08ba5c8 c083d598   c08b938f c056afc8
bf60: 0008 c083bf78 c005d157 c000f288 4133 
[c00121ff] (__irq_svc) from [c000f288] (arch_cpu_idle+0x30/0x34)
[c000f288] (arch_cpu_idle) from [c00555d5] (cpu_startup_entry+0x339/0x404)
[c00555d5] (cpu_startup_entry) from [c07d1a4f] (start_kernel+0x32f/0x338)
[c07d1a4f] (start_kernel) from [8000807f] (0x8000807f)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] ethernet: amd: AMD_XGBE should depend on HAS_DMA

2015-04-23 Thread David Miller
From: Geert Uytterhoeven ge...@linux-m68k.org
Date: Thu, 23 Apr 2015 19:59:33 +0200

 If NO_DMA=y:
 ...
 Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI: Polling function requesting extra calls - difference between 3.18.11 and 4.0.0

2015-04-23 Thread Rafał Miłecki
On 23 April 2015 at 20:33, Eric Dumazet eric.duma...@gmail.com wrote:
 On Thu, 2015-04-23 at 11:28 -0700, Eric Dumazet wrote:
 On Thu, 2015-04-23 at 20:18 +0200, Rafał Miłecki wrote:

 
  Can you help us with this, please? Does bgmac use NAPI incorrectly?
  Were there some important changes in 3.19 or 4.0?
 

 Right they were important changes in NAPI indeed :

 3b47d30396ba net: gro: add a per device gro flush timer
 d75b1ade567f net: less interrupt masking in NAPI
 bc9ad166e38a net: introduce napi_schedule_irqoff()

 This requested some fixes in various drivers :

 commit f31ec95fa19e07a8beebcc0297284f23aa57967e
 commit 24e579c8898aa641ede3149234906982290934e5
 commit 6088beef3f7517717bd21d90b379714dd0837079
 commit f104fedc0da126abe93dd0f4a9fa13e5133bf9df
 commit 7a05dc64e2e4c611d89007b125b20c0d2a4d31a5
 commit 001ce546bb537bb5b7821f05633556a0c9787e32
 commit 3079c652141f9d6377417a7e8fd650c9948df65e
 commit 8acdf999accfd95093db17f33a58429a38782060
 commit 6a6dc08ff6395f58be3ee568cb970ea956f16819


 It looks like following fix is needed :

 diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
 b/drivers/net/ethernet/broadcom/bgmac.c
 index de77d3a74abc..21e3c38c7c75 100644
 --- a/drivers/net/ethernet/broadcom/bgmac.c
 +++ b/drivers/net/ethernet/broadcom/bgmac.c
 @@ -1260,7 +1260,7 @@ static int bgmac_poll(struct napi_struct *napi, int 
 weight)

 /* Poll again if more events arrived in the meantime */
 if (bgmac_read(bgmac, BGMAC_INT_STATUS)  (BGMAC_IS_TX0 | 
 BGMAC_IS_RX))
 -   return handled;
 +   return weight;

 if (handled  weight) {
 napi_complete(napi);



Yeah, I'm flashing my device with the same patch applied right now :)


-- 
Rafał
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] ethernet: arc: ARC_EMAC and EMAC_ROCKCHIP should depend on HAS_DMA

2015-04-23 Thread Geert Uytterhoeven
If NO_DMA=y:

drivers/built-in.o: In function `arc_emac_tx_clean':
emac_main.c:(.text+0x2decde): undefined reference to `dma_unmap_single'
drivers/built-in.o: In function `arc_emac_rx':
emac_main.c:(.text+0x2dee1c): undefined reference to `dma_unmap_single'
emac_main.c:(.text+0x2dee72): undefined reference to `dma_map_single'
emac_main.c:(.text+0x2dee7e): undefined reference to `dma_mapping_error'
drivers/built-in.o: In function `arc_emac_probe':
(.text+0x2df2ee): undefined reference to `dmam_alloc_coherent'
drivers/built-in.o: In function `arc_emac_open':
emac_main.c:(.text+0x2df6d8): undefined reference to `dma_map_single'
emac_main.c:(.text+0x2df6e4): undefined reference to `dma_mapping_error'
drivers/built-in.o: In function `arc_emac_tx':
emac_main.c:(.text+0x2df9e4): undefined reference to `dma_map_single'
emac_main.c:(.text+0x2df9f0): undefined reference to `dma_mapping_error'

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
---
 drivers/net/ethernet/arc/Kconfig | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/arc/Kconfig b/drivers/net/ethernet/arc/Kconfig
index 8e262e2b39b63fc5..dea29ee24da4a28c 100644
--- a/drivers/net/ethernet/arc/Kconfig
+++ b/drivers/net/ethernet/arc/Kconfig
@@ -25,8 +25,7 @@ config ARC_EMAC_CORE
 config ARC_EMAC
tristate ARC EMAC support
select ARC_EMAC_CORE
-   depends on OF_IRQ
-   depends on OF_NET
+   depends on OF_IRQ  OF_NET  HAS_DMA
---help---
  On some legacy ARC (Synopsys) FPGA boards such as ARCAngel4/ML50x
  non-standard on-chip ethernet device ARC EMAC 10/100 is used.
@@ -35,7 +34,7 @@ config ARC_EMAC
 config EMAC_ROCKCHIP
tristate Rockchip EMAC support
select ARC_EMAC_CORE
-   depends on OF_IRQ  OF_NET  REGULATOR
+   depends on OF_IRQ  OF_NET  REGULATOR  HAS_DMA
---help---
  Support for Rockchip RK3066/RK3188 EMAC ethernet controllers.
  This selects Rockchip SoC glue layer support for the
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] ethernet: amd: AMD_XGBE should depend on HAS_DMA

2015-04-23 Thread Geert Uytterhoeven
If NO_DMA=y:

drivers/built-in.o: In function `xgbe_probe':
xgbe-main.c:(.text+0x2def0a): undefined reference to `dma_set_mask'
xgbe-main.c:(.text+0x2def20): undefined reference to `dma_supported'
drivers/built-in.o: In function `xgbe_rx_poll':
xgbe-drv.c:(.text+0x2e0320): undefined reference to 
`dma_sync_single_for_cpu'
xgbe-drv.c:(.text+0x2e035e): undefined reference to 
`dma_sync_single_for_cpu'
drivers/built-in.o: In function `xgbe_unmap_rdata':
xgbe-desc.c:(.text+0x2e5fe4): undefined reference to `dma_unmap_page'
xgbe-desc.c:(.text+0x2e5ffa): undefined reference to `dma_unmap_single'
xgbe-desc.c:(.text+0x2e604a): undefined reference to `dma_unmap_page'
xgbe-desc.c:(.text+0x2e6084): undefined reference to `dma_unmap_page'
drivers/built-in.o: In function `xgbe_alloc_pages':
xgbe-desc.c:(.text+0x2e6156): undefined reference to `dma_map_page'
xgbe-desc.c:(.text+0x2e6164): undefined reference to `dma_mapping_error'
drivers/built-in.o: In function `xgbe_free_ring':
xgbe-desc.c:(.text+0x2e63d4): undefined reference to `dma_unmap_page'
xgbe-desc.c:(.text+0x2e640e): undefined reference to `dma_unmap_page'
xgbe-desc.c:(.text+0x2e644a): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `xgbe_init_ring':
xgbe-desc.c:(.text+0x2e64d4): undefined reference to `dma_alloc_coherent'
drivers/built-in.o: In function `xgbe_map_tx_skb':
xgbe-desc.c:(.text+0x2e6628): undefined reference to `dma_map_single'
xgbe-desc.c:(.text+0x2e6638): undefined reference to `dma_mapping_error'
xgbe-desc.c:(.text+0x2e66b2): undefined reference to `dma_map_single'
xgbe-desc.c:(.text+0x2e66c2): undefined reference to `dma_mapping_error'
xgbe-desc.c:(.text+0x2e6762): undefined reference to `dma_map_page'
xgbe-desc.c:(.text+0x2e6772): undefined reference to `dma_mapping_error'

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
---
 drivers/net/ethernet/amd/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index c638c85f3954bc68..089c269637b725da 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -179,7 +179,7 @@ config SUNLANCE
 
 config AMD_XGBE
tristate AMD 10GbE Ethernet driver
-   depends on (OF_NET || ACPI)  HAS_IOMEM
+   depends on (OF_NET || ACPI)  HAS_IOMEM  HAS_DMA
select PHYLIB
select AMD_XGBE_PHY
select BITREVERSE
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ethernet: arc: ARC_EMAC and EMAC_ROCKCHIP should depend on HAS_DMA

2015-04-23 Thread David Miller
From: Geert Uytterhoeven ge...@linux-m68k.org
Date: Thu, 23 Apr 2015 19:59:34 +0200

 If NO_DMA=y:
 ...
 Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] can: CAN_GRCAN should depend on HAS_DMA

2015-04-23 Thread David Miller
From: Geert Uytterhoeven ge...@linux-m68k.org
Date: Thu, 23 Apr 2015 20:04:51 +0200

 If NO_DMA=y:
 ...
 Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Daniel Borkmann

On 04/23/2015 08:12 PM, Cong Wang wrote:

On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov a...@plumgrid.com wrote:

...

TC_ACT_QUEUED cannot be removed.
Only deprecated with backwards compatibility the way this patch did it.
That should have been obvious.


It is at least the third time I have to repeat that: we really don't care
about out-of-tree modules.

Everyone MUST read Documentation/stable_api_nonsense.txt.


Seriously, what are you talking about ???

$ git grep -n TC_ACT_QUEUED include/uapi/
include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED  5

This is *UAPI*, no-one is even remotely talking about out-of-tree
kernel modules.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Cong Wang
On Thu, Apr 23, 2015 at 11:21 AM, Daniel Borkmann dan...@iogearbox.net wrote:
 On 04/23/2015 08:12 PM, Cong Wang wrote:

 On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov a...@plumgrid.com
 wrote:

 ...

 TC_ACT_QUEUED cannot be removed.
 Only deprecated with backwards compatibility the way this patch did it.
 That should have been obvious.


 It is at least the third time I have to repeat that: we really don't care
 about out-of-tree modules.

 Everyone MUST read Documentation/stable_api_nonsense.txt.


 Seriously, what are you talking about ???

 $ git grep -n TC_ACT_QUEUED include/uapi/
 include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED  5

 This is *UAPI*, no-one is even remotely talking about out-of-tree
 kernel modules.

I am not stupid. You should figure out we can just leave its definition
by removing it, leaving the default as stolen.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI: Polling function requesting extra calls - difference between 3.18.11 and 4.0.0

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 11:28 -0700, Eric Dumazet wrote:
 On Thu, 2015-04-23 at 20:18 +0200, Rafał Miłecki wrote:
 
  
  Can you help us with this, please? Does bgmac use NAPI incorrectly?
  Were there some important changes in 3.19 or 4.0?
  
 
 Right they were important changes in NAPI indeed :
 
 3b47d30396ba net: gro: add a per device gro flush timer
 d75b1ade567f net: less interrupt masking in NAPI
 bc9ad166e38a net: introduce napi_schedule_irqoff()
 
 This requested some fixes in various drivers :
 
 commit f31ec95fa19e07a8beebcc0297284f23aa57967e
 commit 24e579c8898aa641ede3149234906982290934e5
 commit 6088beef3f7517717bd21d90b379714dd0837079
 commit f104fedc0da126abe93dd0f4a9fa13e5133bf9df
 commit 7a05dc64e2e4c611d89007b125b20c0d2a4d31a5
 commit 001ce546bb537bb5b7821f05633556a0c9787e32
 commit 3079c652141f9d6377417a7e8fd650c9948df65e
 commit 8acdf999accfd95093db17f33a58429a38782060
 commit 6a6dc08ff6395f58be3ee568cb970ea956f16819
 

It looks like following fix is needed :

diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index de77d3a74abc..21e3c38c7c75 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1260,7 +1260,7 @@ static int bgmac_poll(struct napi_struct *napi, int 
weight)
 
/* Poll again if more events arrived in the meantime */
if (bgmac_read(bgmac, BGMAC_INT_STATUS)  (BGMAC_IS_TX0 | BGMAC_IS_RX))
-   return handled;
+   return weight;
 
if (handled  weight) {
napi_complete(napi);


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] igb: pass the correct maxlen for eth_get_headlen()

2015-04-23 Thread Alexander Duyck
On 04/23/2015 11:06 AM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 8:40 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 04:23 PM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 02:56 PM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 01:33 PM, Cong Wang wrote:
 First, make sure you don't miss the TSIP case right above:

 The frag starting pointer and its size are advanced by:

 skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
 ...
 va += IGB_TS_HDR_LEN;

 even though we unlikely pull header longer than
 IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
 So I believe this is a possible bug, one heck of a corner case to get
 into though.  It requires timestamp in packet, size 240 - 256, and a
 malformed header.

 The proper fix would probably be to pull the timestamp out of the packet
 before we add it to the frame.  I'll submit a patch to address this.

 Huh? Doesn't my patch already fix this? skb_frag_size() is always
 up to date. Or you mean another different problem?
 Your patch has other issues.  I am still NAKing your patch, but there is
 an issue with igb that you have pointed out.  The proper fix would be to
 deal with the timestamp before we add the page fragment to the skb.

 If the first frag is always 2K, then this is not a problem either.
 IGB_TS_HDR_LEN + IGB_RX_HDR_LEN  2K.
 The problem is skb-tail will get screwed up.

 Why? We don't copy timestamp into skb header, instead it is saved
 in shared_info, why skb-tail matters here in TSIP case?

TSIP only matters if you have a network header with bad data in it that
indicates the size is actually larger than it actually is.

 The first frag will be 2K in size only if there are multiple frags.  The

 Or it doesn't have frags at all since we just copy it to skb header, right?

That is moot since this code only gets called if the skb is nonlinear

 problem in the TSIP case is that we will put the size reported by the
 descriptor into the fragment, and then try to pull the size we

 That size is saved when adding the frag, in TSIP case we just sub it
 by IGB_TS_HDR_LEN, which seems correct.


Yes, the size is saved.  But with your solution we could pull the whole
fragment but not free it which isn't correct as we shouldn't be left
with any 0 sized fragments.

 determined via eth_get_headlen.  So there is a window where that value
 can be wrong and result in data_len going negative and tail being moved
 beyond the end of the actual data.
 This sounds like a different problem if you are saying we should sub
 the size in the descriptor too. Therefore I don't see why your patch
 could replace mine.

Your patches sort-of fixed the problem, but they introduced other issues
in the process.

The thing I don't think you are getting is that the code was meant to be
mutually exclusive w/ the copy-break code.  Either the frag is less than
IGB_RX_HDR_SIZE in which case we copy-break and don't attached the
fragment, or we should be pulling the header and leaving at least 1 byte
in the fragment.  The problem with your solution is that you potentially
pull the entire fragment in, but you don't free it if the size drops to
0.  That is why in the other mail I told you the better solution for igb
would likely be to just pass IGB_RX_HDR_SIZE - IGB_TS_HDR_SIZE as that
way you end up with at least 1 byte left in the fragment after you pull
the header.

- Alex
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix tcp fin memory accounting

2015-04-23 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Thu, 23 Apr 2015 09:48:12 -0700

 On Thu, 2015-04-23 at 09:16 -0700, Eric Dumazet wrote:
 On Thu, 2015-04-23 at 11:54 -0400, David Miller wrote:
  From: Eric Dumazet eric.duma...@gmail.com
  Date: Thu, 23 Apr 2015 07:02:47 -0700
  
   +if (!tcp_send_head(sk)) {
   +tp-snd_nxt++;
   +return;
   +}
  
  I'm not so sure about this.  Why is this needed?
  
  Otherwise patch looks fine to me.
  
 
 I guess I need to add a comment then ;)
 
 If we want to pretend FIN was sent, we also need to tweak tp-snd_nxt to
 match new tskb-end_seq (or tp-write_seq).
 
 I tested following packetdrill script and confirmed that if I do not
 tweak snd_nxt, last packet sent is incorrect :
 
   . 5001:5001(0) ack 2
 
 This might be because our stack relies that we never coalesce something
 on one already sent skb (we do this check in tcp_sendmsg() for example)
 
 Well, real reason is that tp-snd_nxt is not touched in retransmit
 paths, but when new data is sent (tcp_event_new_data_sent())

That makes sense, thanks for explaining.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] tcp: avoid looping in tcp_send_fin()

2015-04-23 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/ipv4/tcp_output.c |   50 +++-
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2ade67b7cdb0..a369e8a70b2c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2814,7 +2814,8 @@ begin_fwd:
 
 /* We allow to exceed memory limits for FIN packets to expedite
  * connection tear down and (memory) recovery.
- * Otherwise tcp_send_fin() could loop forever.
+ * Otherwise tcp_send_fin() could be tempted to either delay FIN
+ * or even be forced to close flow without any FIN.
  */
 static void sk_forced_wmem_schedule(struct sock *sk, int size)
 {
@@ -2827,33 +2828,40 @@ static void sk_forced_wmem_schedule(struct sock *sk, 
int size)
sk_memory_allocated_add(sk, amt, status);
 }
 
-/* Send a fin.  The caller locks the socket for us.  This cannot be
- * allowed to fail queueing a FIN frame under any circumstances.
+/* Send a FIN. The caller locks the socket for us.
+ * We should try to send a FIN packet really hard, but eventually give up.
  */
 void tcp_send_fin(struct sock *sk)
 {
+   struct sk_buff *skb, *tskb = tcp_write_queue_tail(sk);
struct tcp_sock *tp = tcp_sk(sk);
-   struct sk_buff *skb = tcp_write_queue_tail(sk);
-   int mss_now;
 
-   /* Optimization, tack on the FIN if we have a queue of
-* unsent frames.  But be careful about outgoing SACKS
-* and IP options.
+   /* Optimization, tack on the FIN if we have one skb in write queue and
+* this skb was not yet sent, or we are under memory pressure.
+* Note: in the latter case, FIN packet will be sent after a timeout,
+* as TCP stack thinks it has already been transmitted.
 */
-   mss_now = tcp_current_mss(sk);
-
-   if (tcp_send_head(sk)) {
-   TCP_SKB_CB(skb)-tcp_flags |= TCPHDR_FIN;
-   TCP_SKB_CB(skb)-end_seq++;
+   if (tskb  (tcp_send_head(sk) || sk_under_memory_pressure(sk))) {
+coalesce:
+   TCP_SKB_CB(tskb)-tcp_flags |= TCPHDR_FIN;
+   TCP_SKB_CB(tskb)-end_seq++;
tp-write_seq++;
+   if (!tcp_send_head(sk)) {
+   /* This means tskb was already sent.
+* Pretend we included the FIN on previous transmit.
+* We need to set tp-snd_nxt to the value it would have
+* if FIN had been sent. This is because retransmit path
+* does not change tp-snd_nxt.
+*/
+   tp-snd_nxt++;
+   return;
+   }
} else {
-   /* Socket is locked, keep trying until memory is available. */
-   for (;;) {
-   skb = alloc_skb_fclone(MAX_TCP_HEADER,
-  sk-sk_allocation);
-   if (skb)
-   break;
-   yield();
+   skb = alloc_skb_fclone(MAX_TCP_HEADER, sk-sk_allocation);
+   if (unlikely(!skb)) {
+   if (tskb)
+   goto coalesce;
+   return;
}
skb_reserve(skb, MAX_TCP_HEADER);
sk_forced_wmem_schedule(sk, skb-truesize);
@@ -2862,7 +2870,7 @@ void tcp_send_fin(struct sock *sk)
 TCPHDR_ACK | TCPHDR_FIN);
tcp_queue_skb(sk, skb);
}
-   __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_OFF);
+   __tcp_push_pending_frames(sk, tcp_current_mss(sk), TCP_NAGLE_OFF);
 }
 
 /* We get here when a process closes a file descriptor (either due to


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netdev_alloc_pcpu_stats: use less common iterator variable

2015-04-23 Thread David Miller
From: Johannes Berg johan...@sipsolutions.net
Date: Thu, 23 Apr 2015 12:06:30 +0200

 From: Johannes Berg johannes.b...@intel.com
 
 With the CPU iteration variable called 'i', it's relatively easy
 to have variable shadowing which sparse will warn about. Avoid
 that by renaming the variable to __cpu which is less likely to
 be used in the surrounding context.
 
 Signed-off-by: Johannes Berg johannes.b...@intel.com

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4

2015-04-23 Thread Josh Boyer
Hi All,

We've had a user report the following backtrace from the bridge module
with a recent Linus' tree.  Has anything like this been reported yet?
If you have any questions on setup, the user is CC'd.

josh

[   29.382235] br0: port 1(tap0) entered forwarding state

[   29.382286] ===
[   29.382315] [ INFO: suspicious RCU usage. ]
[   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
[   29.382380] ---
[   29.382409] net/bridge/br_private.h:626 suspicious
rcu_dereference_check() usage!
[   29.382455]
   other info that might help us debug this:

[   29.382507]
   rcu_scheduler_active = 1, debug_locks = 0
[   29.382549] 2 locks held by swapper/0/0:
[   29.382576]  #0:  (((p-forward_delay_timer))){+.-...}, at:
[81139f75] call_timer_fn+0x5/0x4f0
[   29.382660]  #1:  ((br-lock)-rlock){+.-...}, at:
[a0450dc1] br_forward_delay_timer_expired+0x31/0x140
[bridge]
[   29.382754]
   stack backtrace:
[   29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.1.0-0.rc0.git11.1.fc23.x86_64 #1
[   29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
[   29.382882]   3ebfc20364115825 88003c48
81892d4b
[   29.382943]   81e124e0 88003c78
8110bcd7
[   29.383004]  8800785c9d00 88065485ac58 880c62002800
880c5fc88ac0
[   29.383065] Call Trace:
[   29.383084]  IRQ  [81892d4b] dump_stack+0x4c/0x65
[   29.383130]  [8110bcd7] lockdep_rcu_suspicious+0xe7/0x120
[   29.383178]  [a04520f9] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
[   29.383225]  [a045266b] br_ifinfo_notify+0x11b/0x4b0 [bridge]
[   29.383271]  [a0450d90] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383320]  [a0450de8]
br_forward_delay_timer_expired+0x58/0x140 [bridge]
[   29.383371]  [a0450d90] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383416]  [8113a033] call_timer_fn+0xc3/0x4f0
[   29.383454]  [81139f75] ? call_timer_fn+0x5/0x4f0
[   29.383493]  [8110a90f] ? lock_release_holdtime.part.29+0xf/0x200
[   29.383541]  [a0450d90] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383587]  [8113a6a4] run_timer_softirq+0x244/0x490
[   29.383629]  [810b68cc] __do_softirq+0xec/0x670
[   29.383666]  [810b70d5] irq_exit+0x145/0x150
[   29.383703]  [8189f506] smp_apic_timer_interrupt+0x46/0x60
[   29.383744]  [8189d523] apic_timer_interrupt+0x73/0x80
[   29.383782]  EOI  [816f131f] ? cpuidle_enter_state+0x5f/0x2f0
[   29.383832]  [816f131b] ? cpuidle_enter_state+0x5b/0x2f0
[   29.383873]  [816f15e7] cpuidle_enter+0x17/0x20
[   29.383908]  [81103c6f] cpu_startup_entry+0x36f/0x5f0
[   29.383949]  [81887cbd] rest_init+0x13d/0x150
[   29.383986]  [821c905b] start_kernel+0x4d2/0x4f3
[   29.384023]  [821c8120] ? early_idt_handlers+0x120/0x120
[   29.384064]  [821c8339] x86_64_start_reservations+0x2a/0x2c
[   29.384105]  [821c8485] x86_64_start_kernel+0x14a/0x16d
[   36.100347] IN=br0 OUT=
MAC=33:33:00:00:00:02:52:54:00:12:01:00:86:dd
SRC=fe80::::5054:00ff:fe12:0100
DST=ff02:::::::0002 LEN=56 TC=0 HOPLIMIT=255
FLOWLBL=0 PROTO=ICMPv6 TYPE=133 CODE=0
[   40.113744] IN=br0 OUT=
MAC=33:33:00:00:00:02:52:54:00:12:01:00:86:dd
SRC=fe80::::5054:00ff:fe12:0100
DST=ff02:::::::0002 LEN=56 TC=0 HOPLIMIT=255
FLOWLBL=0 PROTO=ICMPv6 TYPE=133 CODE=0
[   44.119727] IN=br0 OUT=
MAC=33:33:00:00:00:02:52:54:00:12:01:00:86:dd
SRC=fe80::::5054:00ff:fe12:0100
DST=ff02:::::::0002 LEN=56 TC=0 HOPLIMIT=255
FLOWLBL=0 PROTO=ICMPv6 TYPE=133 CODE=0
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] can: CAN_GRCAN should depend on HAS_DMA

2015-04-23 Thread Geert Uytterhoeven
If NO_DMA=y:

drivers/built-in.o: In function `grcan_free_dma_buffers':
grcan.c:(.text+0x2d7716): undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `grcan_allocate_dma_buffers':
grcan.c:(.text+0x2d779c): undefined reference to `dma_alloc_coherent'

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
---
 drivers/net/can/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
index 58808f6514520c86..e8c96b8e86f48e66 100644
--- a/drivers/net/can/Kconfig
+++ b/drivers/net/can/Kconfig
@@ -112,7 +112,7 @@ config PCH_CAN
 
 config CAN_GRCAN
tristate Aeroflex Gaisler GRCAN and GRHCAN CAN devices
-   depends on OF
+   depends on OF  HAS_DMA
---help---
  Say Y here if you want to use Aeroflex Gaisler GRCAN or GRHCAN.
  Note that the driver supports little endian, even though little
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] igb: pass the correct maxlen for eth_get_headlen()

2015-04-23 Thread Cong Wang
On Wed, Apr 22, 2015 at 8:40 PM, Alexander Duyck
alexander.du...@gmail.com wrote:
 On 04/22/2015 04:23 PM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 02:56 PM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 01:33 PM, Cong Wang wrote:
 First, make sure you don't miss the TSIP case right above:

 The frag starting pointer and its size are advanced by:

 skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
 ...
 va += IGB_TS_HDR_LEN;

 even though we unlikely pull header longer than
 IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
 So I believe this is a possible bug, one heck of a corner case to get
 into though.  It requires timestamp in packet, size 240 - 256, and a
 malformed header.

 The proper fix would probably be to pull the timestamp out of the packet
 before we add it to the frame.  I'll submit a patch to address this.

 Huh? Doesn't my patch already fix this? skb_frag_size() is always
 up to date. Or you mean another different problem?
 Your patch has other issues.  I am still NAKing your patch, but there is
 an issue with igb that you have pointed out.  The proper fix would be to
 deal with the timestamp before we add the page fragment to the skb.

 If the first frag is always 2K, then this is not a problem either.
 IGB_TS_HDR_LEN + IGB_RX_HDR_LEN  2K.

 The problem is skb-tail will get screwed up.


Why? We don't copy timestamp into skb header, instead it is saved
in shared_info, why skb-tail matters here in TSIP case?


[...]


 The first frag will be 2K in size only if there are multiple frags.  The


Or it doesn't have frags at all since we just copy it to skb header, right?


 problem in the TSIP case is that we will put the size reported by the
 descriptor into the fragment, and then try to pull the size we


That size is saved when adding the frag, in TSIP case we just sub it
by IGB_TS_HDR_LEN, which seems correct.


 determined via eth_get_headlen.  So there is a window where that value
 can be wrong and result in data_len going negative and tail being moved
 beyond the end of the actual data.

This sounds like a different problem if you are saying we should sub
the size in the descriptor too. Therefore I don't see why your patch
could replace mine.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NAPI: Polling function requesting extra calls - difference between 3.18.11 and 4.0.0

2015-04-23 Thread Rafał Miłecki
Hi,

Recently Felix improved bgmac driver to be smarter in situation where
new packets were Tx/Rx-ed during bgmac_poll execution. It was handled
in:
eb64e29 bgmac: leave interrupts disabled as long as there is work to do
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=eb64e2923a886441c7b322f138b36029f3fa6a36
With above change, after handling all skb-s in bgmac_poll, bgmac
checks if there are new sbk-s to be handled. If so, it doesn't call
napi_complete  it doesn't enable interrupts.

Above commit was merged for 4.1 kernel, but I work with OpenWrt based
on older releases, so I have all bgmac changes backported to 3.8.11
and 4.0.


With 3.8.11 Felix's change seems to be working fine, I did some debugging:

[   44.97] [bgmac_interrupt] int_status  int_mask = 0x0100

[   44.98] [bgmac_poll] START weight:64
[   44.99] [bgmac_poll] Read handled:0 skb-s
[   44.99] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:0 BGMAC_IS_RX:0
[   45.00] [bgmac_poll] Read skb-s  weight, calling napi_complete, en IRQs
[   45.01] [bgmac_poll] END returning handled:0

[   48.04] [bgmac_interrupt] int_status  int_mask = 0x0100

[   48.05] [bgmac_poll] START weight:64
[   48.06] [bgmac_poll] Read handled:0 skb-s
[   48.06] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:0 BGMAC_IS_RX:0
[   48.07] [bgmac_poll] Read skb-s  weight, calling napi_complete, en IRQs
[   48.08] [bgmac_poll] END returning handled:0

[   48.10] [bgmac_poll] START weight:64
[   48.11] [bgmac_poll] Read handled:1 skb-s
[   48.11] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:1 BGMAC_IS_RX:1
[   48.12] [bgmac_poll] END more skb-s to handle, returning handled:1

[   48.13] [bgmac_poll] START weight:64
[   48.13] [bgmac_poll] Read handled:1 skb-s
[   48.14] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:0 BGMAC_IS_RX:0
[   48.15] [bgmac_poll] Read skb-s  weight, calling napi_complete, en IRQs
[   48.16] [bgmac_poll] END returning handled:1

Please note that at 48.12 bgmac decided bgmac_poll should be
called again and it didn't call napi_complete  didn't enable IRQs. It
simply returned 1.
NAPI behaved just like we expected, bgmac_poll was called again a
moment later. It handled skb-s that appeared meanwhile and ended
calling napi_complete and returning 1.


Now, it fails badly for me at all when using kernel 4.0.0:

[   52.80] [bgmac_interrupt] int_status  int_mask = 0x0100

[   52.80] [bgmac_poll] START weight:64
[   52.81] [bgmac_poll] Read handled:0 skb-s
[   52.81] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:0 BGMAC_IS_RX:0
[   52.82] [bgmac_poll] Read skb-s  weight, calling napi_complete, en IRQs
[   52.83] [bgmac_poll] END returning handled:0

[   54.61] [bgmac_interrupt] int_status  int_mask = 0x0100

[   54.62] [bgmac_poll] START weight:64
[   54.63] [bgmac_poll] Read handled:0 skb-s
[   54.63] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:0 BGMAC_IS_RX:0
[   54.64] [bgmac_poll] Read skb-s  weight, calling napi_complete, en IRQs
[   54.65] [bgmac_poll] END returning handled:0

[   55.90] [bgmac_interrupt] int_status  int_mask = 0x0100

[   55.91] [bgmac_poll] START weight:64
[   55.92] [bgmac_poll] Read handled:1 skb-s
[   55.92] [bgmac_poll] Read IRQ status again BGMAC_IS_TX0:1 BGMAC_IS_RX:1
[   55.93] [bgmac_poll] END more skb-s to handle, returning handled:1

As you can see, at 55.93 bgmac also decided bgmac_poll should be
called again. It didn't call napi_complete, didn't enable IRQs. It
simply returned 1. Just like with kernel 3.8.11.
It this case however, NAPI never called bgmac_poll again. It resulted
in bgmac hanging (IRQs are off, bgmac expects bgmac_poll to be
called).


Can you help us with this, please? Does bgmac use NAPI incorrectly?
Were there some important changes in 3.19 or 4.0?

-- 
Rafał
diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index 4929932..40ce34d 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1219,6 +1219,7 @@ static irqreturn_t bgmac_interrupt(int irq, void *dev_id)
struct bgmac *bgmac = netdev_priv(dev_id);
 
u32 int_status = bgmac_read(bgmac, BGMAC_INT_STATUS);
+   pr_info([%s] int_status:0x%08x  int_mask:0x%08x = 0x%08x\n, 
__FUNCTION__, int_status, bgmac-int_mask, int_status  bgmac-int_mask);
int_status = bgmac-int_mask;
 
if (!int_status)
@@ -1240,22 +1241,31 @@ static int bgmac_poll(struct napi_struct *napi, int 
weight)
 {
struct bgmac *bgmac = container_of(napi, struct bgmac, napi);
int handled = 0;
+   u32 int_status;
 
+   pr_info([%s] START weight:%d\n, __FUNCTION__, weight);
/* Ack */
bgmac_write(bgmac, BGMAC_INT_STATUS, ~0);
 
bgmac_dma_tx_free(bgmac, bgmac-tx_ring[0]);
handled += bgmac_dma_rx_read(bgmac, 

Re: [PATCH][FIX 4.1] bgmac: fix requests for extra polling calls from NAPI

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 20:56 +0200, Rafał Miłecki wrote:
 After d75b1ade567f (net: less interrupt masking in NAPI) polling
 function has to return whole budget when it wants NAPI to call it again.
 
 Signed-off-by: Rafał Miłecki zaj...@gmail.com
 Cc: Felix Fietkau n...@openwrt.org
 Fixes: eb64e2923a886 (bgmac: leave interrupts disabled as long as there is 
 work to do)
 ---
  drivers/net/ethernet/broadcom/bgmac.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Signed-off-by: Eric Dumazet eduma...@google.com

Thanks !


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][FIX 4.1] bgmac: fix requests for extra polling calls from NAPI

2015-04-23 Thread Rafał Miłecki
After d75b1ade567f (net: less interrupt masking in NAPI) polling
function has to return whole budget when it wants NAPI to call it again.

Signed-off-by: Rafał Miłecki zaj...@gmail.com
Cc: Felix Fietkau n...@openwrt.org
Fixes: eb64e2923a886 (bgmac: leave interrupts disabled as long as there is 
work to do)
---
 drivers/net/ethernet/broadcom/bgmac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index de77d3a..21e3c38 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1260,7 +1260,7 @@ static int bgmac_poll(struct napi_struct *napi, int 
weight)
 
/* Poll again if more events arrived in the meantime */
if (bgmac_read(bgmac, BGMAC_INT_STATUS)  (BGMAC_IS_TX0 | BGMAC_IS_RX))
-   return handled;
+   return weight;
 
if (handled  weight) {
napi_complete(napi);
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] altera tse: add support for fixed-links.

2015-04-23 Thread David Miller
From: Andreas Oetken ennoerlan...@googlemail.com
Date: Wed, 22 Apr 2015 15:22:22 +0200

 + /* check if a fixed-link is defined in device-tree */
 + if (of_phy_is_fixed_link(priv-device-of_node)) {
 + if (of_phy_register_fixed_link(priv-device-of_node)
 +  0) {

You should store the return value of of_phy_register_fixed_link() and return
that as the error code when it fails.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [Patch net] igb: pass the correct maxlen for eth_get_headlen()

2015-04-23 Thread Alexander Duyck

On 04/23/2015 12:30 PM, Cong Wang wrote:

(Off-topic...)

On Wed, Apr 22, 2015 at 4:23 PM, Cong Wang cw...@twopensource.com wrote:

The code looks correct to me now, except it is suspicious skb-len
is not updated after skb_copy_to_linear_data() while skb-tail is
advanced already. I need to think more before submitting a patch.

I feel like we need the following patch, maybe skb-len is updated somewhere
else by skb-tail - skb-head, otherwise we are screwed?


Maybe in skb_add_rx_frag?  You might take a look at it.



diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
b/drivers/net/ethernet/intel/igb/igb_main.c
index a0a9b1f..66e6fb6 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6843,7 +6843,6 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
 skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
 frag-page_offset += IGB_TS_HDR_LEN;
 skb-data_len -= IGB_TS_HDR_LEN;
-   skb-len -= IGB_TS_HDR_LEN;

 /* move va to start of packet data */
 va += IGB_TS_HDR_LEN;
@@ -6856,12 +6855,12 @@ static void igb_pull_tail(struct igb_ring *rx_ring,

 /* align pull length to size of long to optimize memcpy performance */
 skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
+   __skb_put(skb, pull_len);

 /* update all of the pointers */
 skb_frag_size_sub(frag, pull_len);
 frag-page_offset += pull_len;
 skb-data_len -= pull_len;
-   skb-tail += pull_len;
  }

  /**


Seriously?  Are you even reading the code?  The fragment is already a 
part of the skb, as such it is already included in skb-len. By 
re-adding the header you are adding bytes that aren't there.  All we 
were doing is moving data from a fragment to the linear portion.


No offense but your starting to waste my time with these silly patch 
ideas.  The patches I submitted to intel-wired-lan fix the issue that 
you found, and there aren't any new issues that it creates so the issue 
is resolved.  And like I said if you need to fix this in stable just 
subtract IGB_TS_HDR_LEN from the header length scanned in 
eth_get_headlen and it will resolve the issue you reported and that way 
we can fix the issue and avoid pulling a fragment down to size 0.


- Alex
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-23 Thread Mateusz Kulikowski
On 22.04.2015 00:27, Joe Perches wrote:
 On Tue, 2015-04-21 at 23:44 +0200, Mateusz Kulikowski wrote:
 On 21.04.2015 23:22, Joe Perches wrote:
 On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote:
 (...)
(...)
 True, True; If you prefer $line and ability to --fix - I'll use that in v3
 
 I suppose you could do both $line and $stat
 and the fix would only work when it's on a
 single line.
 
 Perhaps something like this would work:
 
   if ($line =~ /whatever/ ||
   (defined($stat)  $stat =~ /whatever/)) {
   if (WARN(...) 
   $fix) {
   fixed[$fixlinenr] =~ s/whatever/appropriate/;
   }
   }

Isn't it enough to just match $stat and do fix for line (that in 
some cases will just not match)?


One more thing
I noticed funny behavior about $stat matches - 
it reports the same error several times (including as scope whole file)
Is it feature or feature or I missed something?

Ex. file:
-- cut
int foo(void)
{
baz();
memset(a, b, 0);
bar();
}
-- cut

Output of (@master) 

-- cut
$ scripts/checkpatch.pl -f test42.c --types MEMSET
ERROR: memset to 0's uses 0 as the 2nd argument, not the 3rd
#1: FILE: test42.c:1:
+int foo(void)
 {
 baz();
 memset(a, b, 0);
 bar();
 }

ERROR: memset to 0's uses 0 as the 2nd argument, not the 3rd
#2: FILE: test42.c:2:
+{
 baz();
 memset(a, b, 0);
 bar();
 }

ERROR: memset to 0's uses 0 as the 2nd argument, not the 3rd
#4: FILE: test42.c:4:
+memset(a, b, 0);

total: 3 errors, 0 warnings, 6 lines checked

NOTE: Used message types: MEMSET

test42.c has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
--cut

Regards,
Mateusz
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread Johannes Berg
On Thu, 2015-04-23 at 21:46 +0100, Thomas Graf wrote:
 On 04/23/15 at 04:38pm, Johannes Berg wrote:
  From: Johannes Berg johannes.b...@intel.com
  
  The conversion of mac80211's station table to rhashtable had a bug
  that I found by accident in code review, that hadn't been found as
  rhashtable apparently managed to have a maximum hash chain length
  of one (!) in all our testing.
 
 This is the desired chain length ;-)

Sure. But I had a bug in my handling of collisions, so I explicitly
wanted to test them. After all, they are in some ways expected in a hash
table :)

  At that point, rhashtable WARNed in rhashtable_insert_rehash() but
  didn't actually reject the hash table insertion. This caused it to
  lose insertions - my master list of stations would have 9 entries,
  but the rhashtable only had 5. This may warrant a deeper look, but
  that WARN_ON() just shouldn't happen.
 
 The warning got fixed recently (51bb8e331b) and
 rhashtable_insert_rehash() now only allows a single rehash if at
 max_size already. It will now return -EBUSY.
 
 Insertions may still fail while the table is above 100% utilization
 so this fix is absolutely needed though.

Yeah just failing would be a bit strange.

  Fix this by not returning true from rht_grow_above_100() when the
  rhashtable's max_size has been reached - in this case the user is
  explicitly configuring it to be at most that big, so even if it's
  now above 100% it shouldn't attempt to resize.
 
 Good catch. I wonder whether we want to trigger a periodic rehash
 in an interval in this situation or just leave this up to the user
 to setup a timer himself.

You could just document it that it's probably useful if max_size is set?
I'm just going to be setting max_size for debug purposes, so don't
really care all that much.

johannes

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI: Polling function requesting extra calls - difference between 3.18.11 and 4.0.0

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 21:08 +0200, Rafał Miłecki wrote:

 As you obviously noticed, I fixed bgmac now and submitted the fix.
 
 Just wanted to say a thanks for helping with that!
 

Sure, it was a pleasure ;)


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread Thomas Graf
On 04/23/15 at 04:38pm, Johannes Berg wrote:
 From: Johannes Berg johannes.b...@intel.com
 
 The conversion of mac80211's station table to rhashtable had a bug
 that I found by accident in code review, that hadn't been found as
 rhashtable apparently managed to have a maximum hash chain length
 of one (!) in all our testing.

This is the desired chain length ;-)

 In order to test the bug and verify the fix I set my rhashtable's
 max_size very low (4) in order to force getting hash collisions.
 
 At that point, rhashtable WARNed in rhashtable_insert_rehash() but
 didn't actually reject the hash table insertion. This caused it to
 lose insertions - my master list of stations would have 9 entries,
 but the rhashtable only had 5. This may warrant a deeper look, but
 that WARN_ON() just shouldn't happen.

The warning got fixed recently (51bb8e331b) and
rhashtable_insert_rehash() now only allows a single rehash if at
max_size already. It will now return -EBUSY.

Insertions may still fail while the table is above 100% utilization
so this fix is absolutely needed though.

 Fix this by not returning true from rht_grow_above_100() when the
 rhashtable's max_size has been reached - in this case the user is
 explicitly configuring it to be at most that big, so even if it's
 now above 100% it shouldn't attempt to resize.

Good catch. I wonder whether we want to trigger a periodic rehash
in an interval in this situation or just leave this up to the user
to setup a timer himself.

 This fixes the lost insertion issue and consequently allows my
 code to display its error (and verify my fix for it.)
 
 Signed-off-by: Johannes Berg johannes.b...@intel.com

Acked-by: Thomas Graf tg...@suug.ch
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI: Polling function requesting extra calls - difference between 3.18.11 and 4.0.0

2015-04-23 Thread Rafał Miłecki
On 23 April 2015 at 20:28, Eric Dumazet eric.duma...@gmail.com wrote:
 On Thu, 2015-04-23 at 20:18 +0200, Rafał Miłecki wrote:


 Can you help us with this, please? Does bgmac use NAPI incorrectly?
 Were there some important changes in 3.19 or 4.0?


 Right they were important changes in NAPI indeed :

 3b47d30396ba net: gro: add a per device gro flush timer
 d75b1ade567f net: less interrupt masking in NAPI
 bc9ad166e38a net: introduce napi_schedule_irqoff()

As you obviously noticed, I fixed bgmac now and submitted the fix.

Just wanted to say a thanks for helping with that!

-- 
Rafał
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] igb: pass the correct maxlen for eth_get_headlen()

2015-04-23 Thread Cong Wang
(Off-topic...)

On Wed, Apr 22, 2015 at 4:23 PM, Cong Wang cw...@twopensource.com wrote:

 The code looks correct to me now, except it is suspicious skb-len
 is not updated after skb_copy_to_linear_data() while skb-tail is
 advanced already. I need to think more before submitting a patch.

I feel like we need the following patch, maybe skb-len is updated somewhere
else by skb-tail - skb-head, otherwise we are screwed?

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
b/drivers/net/ethernet/intel/igb/igb_main.c
index a0a9b1f..66e6fb6 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6843,7 +6843,6 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
frag-page_offset += IGB_TS_HDR_LEN;
skb-data_len -= IGB_TS_HDR_LEN;
-   skb-len -= IGB_TS_HDR_LEN;

/* move va to start of packet data */
va += IGB_TS_HDR_LEN;
@@ -6856,12 +6855,12 @@ static void igb_pull_tail(struct igb_ring *rx_ring,

/* align pull length to size of long to optimize memcpy performance */
skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
+   __skb_put(skb, pull_len);

/* update all of the pointers */
skb_frag_size_sub(frag, pull_len);
frag-page_offset += pull_len;
skb-data_len -= pull_len;
-   skb-tail += pull_len;
 }

 /**
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: netcp: remove call to netif_carrier_(on/off) for MAC to Phy interface

2015-04-23 Thread Murali Karicheri
Currently when interface type is MAC to Phy, netif_carrier_(on/off)
is called which is not needed as Phy lib already updates the carrier
status to net stack. This is needed only for other interface types

Signed-off-by: Murali Karicheri m-kariche...@ti.com
---
 drivers/net/ethernet/ti/netcp_ethss.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c 
b/drivers/net/ethernet/ti/netcp_ethss.c
index 2bef655..0a1ef2e 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -1765,7 +1765,9 @@ static void netcp_ethss_link_state_action(struct gbe_priv 
*gbe_dev,
 ALE_PORT_STATE,
 ALE_PORT_STATE_FORWARD);
 
-   if (ndev  slave-open)
+   if (ndev  slave-open 
+   ((slave-link_interface != SGMII_LINK_MAC_PHY) 
+   (slave-link_interface != XGMII_LINK_MAC_PHY)))
netif_carrier_on(ndev);
} else {
writel(mac_control, GBE_REG_ADDR(slave, emac_regs,
@@ -1773,7 +1775,9 @@ static void netcp_ethss_link_state_action(struct gbe_priv 
*gbe_dev,
cpsw_ale_control_set(gbe_dev-ale, slave-port_num,
 ALE_PORT_STATE,
 ALE_PORT_STATE_DISABLE);
-   if (ndev)
+   if (ndev 
+   ((slave-link_interface != SGMII_LINK_MAC_PHY) 
+   (slave-link_interface != XGMII_LINK_MAC_PHY)))
netif_carrier_off(ndev);
}
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Suspicious RCU usage in bridge with Linux v4.0-9362-g1fc149933fd4

2015-04-23 Thread Cong Wang
On Thu, Apr 23, 2015 at 10:07 AM, Josh Boyer jwbo...@fedoraproject.org wrote:
 [   29.382235] br0: port 1(tap0) entered forwarding state

 [   29.382286] ===
 [   29.382315] [ INFO: suspicious RCU usage. ]
 [   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
 [   29.382380] ---
 [   29.382409] net/bridge/br_private.h:626 suspicious
 rcu_dereference_check() usage!
 [   29.382455]
other info that might help us debug this:

 [   29.382507]
rcu_scheduler_active = 1, debug_locks = 0
 [   29.382549] 2 locks held by swapper/0/0:
 [   29.382576]  #0:  (((p-forward_delay_timer))){+.-...}, at:
 [81139f75] call_timer_fn+0x5/0x4f0
 [   29.382660]  #1:  ((br-lock)-rlock){+.-...}, at:
 [a0450dc1] br_forward_delay_timer_expired+0x31/0x140
 [bridge]
 [   29.382754]
stack backtrace:
 [   29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
 4.1.0-0.rc0.git11.1.fc23.x86_64 #1
 [   29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
 [   29.382882]   3ebfc20364115825 88003c48
 81892d4b
 [   29.382943]   81e124e0 88003c78
 8110bcd7
 [   29.383004]  8800785c9d00 88065485ac58 880c62002800
 880c5fc88ac0
 [   29.383065] Call Trace:
 [   29.383084]  IRQ  [81892d4b] dump_stack+0x4c/0x65
 [   29.383130]  [8110bcd7] lockdep_rcu_suspicious+0xe7/0x120
 [   29.383178]  [a04520f9] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
 [   29.383225]  [a045266b] br_ifinfo_notify+0x11b/0x4b0 [bridge]
 [   29.383271]  [a0450d90] ? br_hold_timer_expired+0x70/0x70 
 [bridge]
 [   29.383320]  [a0450de8]
 br_forward_delay_timer_expired+0x58/0x140 [bridge]
 [   29.383371]  [a0450d90] ? br_hold_timer_expired+0x70/0x70 
 [bridge]
 [   29.383416]  [8113a033] call_timer_fn+0xc3/0x4f0
 [   29.383454]  [81139f75] ? call_timer_fn+0x5/0x4f0
 [   29.383493]  [8110a90f] ? lock_release_holdtime.part.29+0xf/0x200
 [   29.383541]  [a0450d90] ? br_hold_timer_expired+0x70/0x70 
 [bridge]
 [   29.383587]  [8113a6a4] run_timer_softirq+0x244/0x490
 [   29.383629]  [810b68cc] __do_softirq+0xec/0x670
 [   29.383666]  [810b70d5] irq_exit+0x145/0x150
 [   29.383703]  [8189f506] smp_apic_timer_interrupt+0x46/0x60
 [   29.383744]  [8189d523] apic_timer_interrupt+0x73/0x80
 [   29.383782]  EOI  [816f131f] ? cpuidle_enter_state+0x5f/0x2f0
 [   29.383832]  [816f131b] ? cpuidle_enter_state+0x5b/0x2f0
 [   29.383873]  [816f15e7] cpuidle_enter+0x17/0x20
 [   29.383908]  [81103c6f] cpu_startup_entry+0x36f/0x5f0
 [   29.383949]  [81887cbd] rest_init+0x13d/0x150
 [   29.383986]  [821c905b] start_kernel+0x4d2/0x4f3
 [   29.384023]  [821c8120] ? early_idt_handlers+0x120/0x120
 [   29.384064]  [821c8339] x86_64_start_reservations+0x2a/0x2c
 [   29.384105]  [821c8485] x86_64_start_kernel+0x14a/0x16d

We are dereferencing a RCU pointer with rtnl assert, but in the timer context
we only have br-lock. It looks like we need to take RCU read lock on that path
rather than asserting rtnl lock.

Thanks for the report.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Cong Wang
On Thu, Apr 23, 2015 at 3:13 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:


 1) the _XMIT semantics are useful on the egress side because in fact
 we do have queues and they can be attached to qdiscs etc.
 The TC_ACT_XXX codes were _intentional_ since ingress works as a
 classifier shell.


 then it is worse mess than I thought :(
 Why call it _qdisc_ then? and have special and convoluted handling for
 it in qdisc_create, qdisc_graft and other places?

It needs to glue into qdisc layer, unify the abstractions.


 Are you planning to queue things on ingress?

 I thought that was the whole purpose of ingress qdisc.

There is no queue for ingress qdisc.

 why then we have dev-ingress_queue?

Glue layer, qdisc ties too much with txq, which is what I want to solve.

We have struct netdev_rx_queue too, but look at it, there is nothing
but some sysfs stuffs needed by RPS.


 If queueing was never a goal, may be we should kill ingress qdisc
 and replace it with a simple shim that only does cls/act.
 The code overall will get much simpler and faster.


As I said in response to your patch for skb-data, we need to align
ingress with egress, rather than making more differences, this includes
qdisc's too. We do have per-cpu queues for ingress, at least in theory
we have a queue for ingress to use.

Of course, RX is naturally different from TX, which is the root cause
why we have so many differences here.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Jamal Hadi Salim

On 04/23/15 18:13, Alexei Starovoitov wrote:

On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:



then it is worse mess than I thought :(
Why call it _qdisc_ then? and have special and convoluted handling for
it in qdisc_create, qdisc_graft and other places?


Convinience for tooling and using existing abstractions is the
historical. You can attach qdiscs to netdevs.
You can then use all sorts of well understood tools.


  Are you planning to queue things on ingress?

I thought that was the whole purpose of ingress qdisc.
why then we have dev-ingress_queue?



So you are planning to add queues? If you are that is a different
discussion (and the use case needs some clarity).


If queueing was never a goal, may be we should kill ingress qdisc
and replace it with a simple shim that only does cls/act.
The code overall will get much simpler and faster.



Attaching to the device was considered simpler based on the tooling
thought process. On whether we should queue on ingress - it was not
considered useful if the packets were going to the host i.e we want to
proceed to completion likely queueing to some socket layer further
upstream.
For packets being forwarded we already had egress qdiscs which had
queues so it didnt seem to make sense to enqueue on ingress for such
use cases.
There was a use case later where multiple ingress ports had to be shared
and ifb was born - which is pseudo temporary enqueueing on ingress.



Feels like falling into rabbit hole.


The fact that qdiscs dealt with these codes directly
allows for specialized handling. Moving them to a generic function
seems to defeat that purpose. So I am siding with Cong on this.


that's not what patch 1 is doing. It is still doing specialized
handling... but in light of what you said above, it looks like much
bigger cleanup is needed. We'll continue arguing when I refactor
this set and resubmit when net-next reopens.



Sorry - i was refereing to the patch where you agregated things after
the qdisc invokes a classifier.

cheers,
jamal
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Florian Westphal
Jamal Hadi Salim j...@mojatatu.com wrote:
 2) the ACT_QUEUED vs STOLEN was supposed to have semantics of something
 that was stolen (eg redirection should definetely have been returning
 STOLEN not QUEUED); something that queues for later re-injection
 (with any/all metadata) was intended to use QUEUED. I believe netfilter
 may have  followed suit and introduced similar codes (so it would be
 interesting to see how they use them).

Hooks (Targets) don't queue themselves, i.e. NF_QUEUE tells the
netfilter core that the skb is to be handed off to nf_queue machinery,
while NF_STOLEN is the more obvious don't touch this skb ever again.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [Patch net] igb: pass the correct maxlen for eth_get_headlen()

2015-04-23 Thread Cong Wang
On Thu, Apr 23, 2015 at 12:45 PM, Alexander Duyck
alexander.h.du...@redhat.com wrote:
 On 04/23/2015 12:30 PM, Cong Wang wrote:

 (Off-topic...)

 On Wed, Apr 22, 2015 at 4:23 PM, Cong Wang cw...@twopensource.com wrote:

 The code looks correct to me now, except it is suspicious skb-len
 is not updated after skb_copy_to_linear_data() while skb-tail is
 advanced already. I need to think more before submitting a patch.

 I feel like we need the following patch, maybe skb-len is updated
 somewhere
 else by skb-tail - skb-head, otherwise we are screwed?


 Maybe in skb_add_rx_frag?  You might take a look at it.

I saw that.



 diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
 b/drivers/net/ethernet/intel/igb/igb_main.c
 index a0a9b1f..66e6fb6 100644
 --- a/drivers/net/ethernet/intel/igb/igb_main.c
 +++ b/drivers/net/ethernet/intel/igb/igb_main.c
 @@ -6843,7 +6843,6 @@ static void igb_pull_tail(struct igb_ring *rx_ring,
  skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
  frag-page_offset += IGB_TS_HDR_LEN;
  skb-data_len -= IGB_TS_HDR_LEN;
 -   skb-len -= IGB_TS_HDR_LEN;

  /* move va to start of packet data */
  va += IGB_TS_HDR_LEN;
 @@ -6856,12 +6855,12 @@ static void igb_pull_tail(struct igb_ring
 *rx_ring,

  /* align pull length to size of long to optimize memcpy
 performance */
  skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
 +   __skb_put(skb, pull_len);

  /* update all of the pointers */
  skb_frag_size_sub(frag, pull_len);
  frag-page_offset += pull_len;
  skb-data_len -= pull_len;
 -   skb-tail += pull_len;
   }

   /**


 Seriously?  Are you even reading the code?  The fragment is already a part
 of the skb, as such it is already included in skb-len. By re-adding the
 header you are adding bytes that aren't there.  All we were doing is moving
 data from a fragment to the linear portion.

Hmm, I thought frags are counted only in skb-data_len, I misunderstood
skb_headlen() and skb_pagelen().


 No offense but your starting to waste my time with these silly patch ideas.


You can always ignore any off-topic discussion, which I already warned you
in the beginning...

 The patches I submitted to intel-wired-lan fix the issue that you found, and
 there aren't any new issues that it creates so the issue is resolved.  And
 like I said if you need to fix this in stable just subtract IGB_TS_HDR_LEN
 from the header length scanned in eth_get_headlen and it will resolve the
 issue you reported and that way we can fix the issue and avoid pulling a
 fragment down to size 0.


Please keep non-off-topic discussion from off-topic ones. I will reply there.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Alexei Starovoitov

On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:


1) the _XMIT semantics are useful on the egress side because in fact
we do have queues and they can be attached to qdiscs etc.
The TC_ACT_XXX codes were _intentional_ since ingress works as a
classifier shell.


then it is worse mess than I thought :(
Why call it _qdisc_ then? and have special and convoluted handling for
it in qdisc_create, qdisc_graft and other places?

 Are you planning to queue things on ingress?

I thought that was the whole purpose of ingress qdisc.
why then we have dev-ingress_queue?

If queueing was never a goal, may be we should kill ingress qdisc
and replace it with a simple shim that only does cls/act.
The code overall will get much simpler and faster.

Feels like falling into rabbit hole.


The fact that qdiscs dealt with these codes directly
allows for specialized handling. Moving them to a generic function
seems to defeat that purpose. So I am siding with Cong on this.


that's not what patch 1 is doing. It is still doing specialized
handling... but in light of what you said above, it looks like much
bigger cleanup is needed. We'll continue arguing when I refactor
this set and resubmit when net-next reopens.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Cong Wang
On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov a...@plumgrid.com wrote:

 TC_ACT_QUEUED cannot be removed.
 Only deprecated with backwards compatibility the way this patch did it.
 That should have been obvious.


It is at least the third time I have to repeat that: we really don't care
about out-of-tree modules.

Everyone MUST read Documentation/stable_api_nonsense.txt.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Cong Wang
On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov a...@plumgrid.com wrote:
 On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:


 So you are planning to add queues? If you are that is a different
 discussion (and the use case needs some clarity).


 nope. I wasn't planning to do that.

 For packets being forwarded we already had egress qdiscs which had
 queues so it didnt seem to make sense to enqueue on ingress for such
 use cases.
 There was a use case later where multiple ingress ports had to be
 shared
 and ifb was born - which is pseudo temporary enqueueing on ingress.

 agree. imo ifb approach is more flexible, since it has full hierarchy
 of qdiscs. As you're saying above and from the old ifb logs I thought
 that ifb is _temporary_ and long term plan is to use ingress_queue,
 but looks like this is not the case. Also not too long ago we decided
 that we don't want another ingress qdisc. Therefore I think it makes
 sense to simplify the code and boost performance.

Which performance problem did you see? Did you hit the spinlock
bottleneck? That spinlock should be eliminated before you
try to do more, and it can be replaced by RCU read lock after
John's work (I am still waiting for his patch btw).


 I'm not proposing to change tooling, of course.
 From iproute2 point of view we'll still have ingress qdisc.
 Right now we're pointlessly allocating memory for it and for
 ingress_queue, whereas we only need to call cls/act.
 I'm proposing to kill them and used tcf_proto in net_device instead.
 Right now to reach cls in critical path on ingress we do:
 rxq = skb-dev-ingress_queue
 sch = rxq-qdisc
 sch-enqueue
 sch-filter_list
 with a bunch of 'if' conditions and useless memory accesses in-between.
 If we add 'ingress_filter_list' directly to net_device,
 it will be just:
 skb-dev-ingress_filter_list
 which is huge performance boost. Code size will shrink as well.
 iproute2 and all existing tools will work as-is. Looks as win-win to me.

Nope, it breaks Qdisc abstraction, filters never attach to netdevice
directly. You should stop, since you really don't understand the code.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread David Miller
From: Mahesh Bandewar mahe...@google.com
Date: Thu, 23 Apr 2015 19:54:29 -0700

 On Thu, Apr 23, 2015 at 5:32 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote:

 +static void ipvlan_multicast_enqueue(struct ipvl_port *port,
 +  struct sk_buff *skb)
 +{
 + if (skb-protocol == htons(ETH_P_PAUSE))
 + return;

 But what happens to this packet ? It seems leaked ?

 Hmm, will take care of it in v2.

Please also provide a proper [PATCH 0/3]  introduction posting
stating what the patch series is doing, and why.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][FIX 4.1] bgmac: fix requests for extra polling calls from NAPI

2015-04-23 Thread Felix Fietkau
On 2015-04-23 20:56, Rafał Miłecki wrote:
 After d75b1ade567f (net: less interrupt masking in NAPI) polling
 function has to return whole budget when it wants NAPI to call it again.
 
 Signed-off-by: Rafał Miłecki zaj...@gmail.com
 Cc: Felix Fietkau n...@openwrt.org
 Fixes: eb64e2923a886 (bgmac: leave interrupts disabled as long as there is 
 work to do)
Acked-by: Felix Fietkau n...@openwrt.org

- Felix
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] igb: pass the correct maxlen for eth_get_headlen()

2015-04-23 Thread Cong Wang
On Thu, Apr 23, 2015 at 11:40 AM, Alexander Duyck
alexander.du...@gmail.com wrote:
 On 04/23/2015 11:06 AM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 8:40 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 04:23 PM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 3:34 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 02:56 PM, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 2:42 PM, Alexander Duyck
 alexander.du...@gmail.com wrote:
 On 04/22/2015 01:33 PM, Cong Wang wrote:
 First, make sure you don't miss the TSIP case right above:

 The frag starting pointer and its size are advanced by:

 skb_frag_size_sub(frag, IGB_TS_HDR_LEN);
 ...
 va += IGB_TS_HDR_LEN;

 even though we unlikely pull header longer than
 IGB_RX_HDR_LEN - IGB_TS_HDR_LEN either.
 So I believe this is a possible bug, one heck of a corner case to get
 into though.  It requires timestamp in packet, size 240 - 256, and a
 malformed header.

 The proper fix would probably be to pull the timestamp out of the packet
 before we add it to the frame.  I'll submit a patch to address this.

 Huh? Doesn't my patch already fix this? skb_frag_size() is always
 up to date. Or you mean another different problem?
 Your patch has other issues.  I am still NAKing your patch, but there is
 an issue with igb that you have pointed out.  The proper fix would be to
 deal with the timestamp before we add the page fragment to the skb.

 If the first frag is always 2K, then this is not a problem either.
 IGB_TS_HDR_LEN + IGB_RX_HDR_LEN  2K.
 The problem is skb-tail will get screwed up.

 Why? We don't copy timestamp into skb header, instead it is saved
 in shared_info, why skb-tail matters here in TSIP case?

 TSIP only matters if you have a network header with bad data in it that
 indicates the size is actually larger than it actually is.


OK.


 The first frag will be 2K in size only if there are multiple frags.  The

 Or it doesn't have frags at all since we just copy it to skb header, right?

 That is moot since this code only gets called if the skb is nonlinear

 problem in the TSIP case is that we will put the size reported by the
 descriptor into the fragment, and then try to pull the size we

 That size is saved when adding the frag, in TSIP case we just sub it
 by IGB_TS_HDR_LEN, which seems correct.


 Yes, the size is saved.  But with your solution we could pull the whole
 fragment but not free it which isn't correct as we shouldn't be left
 with any 0 sized fragments.


Good point, then TSIP case should check frag_size == 0 case,
the rest handles 0-size case correctly, see below.


 determined via eth_get_headlen.  So there is a window where that value
 can be wrong and result in data_len going negative and tail being moved
 beyond the end of the actual data.
 This sounds like a different problem if you are saying we should sub
 the size in the descriptor too. Therefore I don't see why your patch
 could replace mine.

 Your patches sort-of fixed the problem, but they introduced other issues
 in the process.

 The thing I don't think you are getting is that the code was meant to be
 mutually exclusive w/ the copy-break code.  Either the frag is less than
 IGB_RX_HDR_SIZE in which case we copy-break and don't attached the
 fragment, or we should be pulling the header and leaving at least 1 byte
 in the fragment.  The problem with your solution is that you potentially
 pull the entire fragment in, but you don't free it if the size drops to


Only if it is a malformed packet, and it is still safe as long as we don't
run over the buffer size. Also I have a upper limit check for
IGB_RX_HDR_SIZE after eth_get_headlen().

 0.  That is why in the other mail I told you the better solution for igb

I am sure eth_get_headlen() handles 0 case correctly, it simply returns 0.

 would likely be to just pass IGB_RX_HDR_SIZE - IGB_TS_HDR_SIZE as that
 way you end up with at least 1 byte left in the fragment after you pull
 the header.


The code should already handle pull_len==0 correctly.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Herbert Xu
On Thu, Apr 23, 2015 at 03:24:59PM +0200, Martin Willi wrote:

 Do you have any pointer for me where this is defined? Why is it needed,
 given that GCM implicitly authenticates the IV by using it in Y0?

The IV if present must be covered by the ICV.  This is required
by RFC4303 (section 2).  But really it's quite obvious.  If you
don't authenticate the IV, then I can easily inject random crap
into your network by changing the IV.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resource usages in Linux drivers

2015-04-23 Thread Francois Romieu
Sergei Shtylyov sergei.shtyl...@cogentembedded.com :
 On 4/23/2015 1:08 PM, Jia-Ju Bai wrote:
[...]
 I also find many drivers do not use these managed APIs, especially in 
 ethernet
 card drivers (like e100, r8169). Is it possible to change them?
 
Patches welcome. :-)

I respectfully disagree.

If someone believes basic resouce management to be too hard or error-prone
to handle, he should imvho seriously rise the bar and reconsider the way
he wants to contribute to the kernel.

I may hope he who reads e100.c to think about DMA api, bql or rx ring
holes avoidance to quote a few ones. Managed API ? Mildly...

-- 
Ueimor
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-23 Thread Joe Perches
On Thu, 2015-04-23 at 16:54 -0700, Joe Perches wrote:
 On Thu, 2015-04-23 at 21:53 +0200, Mateusz Kulikowski wrote:
  I noticed funny behavior about $stat matches - 
  it reports the same error several times (including as scope whole file)
  Is it feature or feature or I missed something?
 
 You have to make sure the first character of $stat is a +
 
   if ($stat =~ /\+(?:.*)\bmemfoo

Make that

if ($stat =~ /^\+(?:.*)\bmemfoo


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread Mahesh Bandewar
Processing multicast / broadcast in fast path is performance draining
and having more links means more clonning and bringing performance
down further.

Broadcast; in particular, need to be given to all the virtual links.
Earlier tricks of enabling broadcast bit for IPv4 only interfaces are not
really working since it fails autoconf. Which means enabling braodcast
for all the links if protocol specific hacks do not have to be added into
the driver.

This patch defers all (incoming as well as outgoing) multicast traffic to
a work-queue leaving only the unicast traffic in the fast-path. Now if we
need to apply any additional tricks to further reduce the impact of this
(multicast / broadcast) type of traffic, it can be implemented while
processing this work without affecting the fast-path.

Signed-off-by: Mahesh Bandewar mahe...@google.com
---
 drivers/net/ipvlan/ipvlan.h  |   5 ++
 drivers/net/ipvlan/ipvlan_core.c | 134 +--
 drivers/net/ipvlan/ipvlan_main.c |   5 ++
 3 files changed, 96 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 54549a6223dd..953a97492fab 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -39,6 +39,8 @@
 #define IPVLAN_MAC_FILTER_SIZE (1  IPVLAN_MAC_FILTER_BITS)
 #define IPVLAN_MAC_FILTER_MASK (IPVLAN_MAC_FILTER_SIZE - 1)
 
+#define IPVLAN_QBACKLOG_LIMIT  1000
+
 typedef enum {
IPVL_IPV6 = 0,
IPVL_ICMPV6,
@@ -93,6 +95,8 @@ struct ipvl_port {
struct hlist_head   hlhead[IPVLAN_HASH_SIZE];
struct list_headipvlans;
struct rcu_head rcu;
+   struct work_struct  wq;
+   struct sk_buff_head backlog;
int count;
u16 mode;
 };
@@ -112,6 +116,7 @@ void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval);
 void ipvlan_init_secret(void);
 unsigned int ipvlan_mac_hash(const unsigned char *addr);
 rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb);
+void ipvlan_process_multicast(struct work_struct *work);
 int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev);
 void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr);
 struct ipvl_addr *ipvlan_find_addr(const struct ipvl_dev *ipvlan,
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c30b5c300c05..58891666088c 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -189,64 +189,85 @@ unsigned int ipvlan_mac_hash(const unsigned char *addr)
return hash  IPVLAN_MAC_FILTER_MASK;
 }
 
-static void ipvlan_multicast_frame(struct ipvl_port *port, struct sk_buff *skb,
-  const struct ipvl_dev *in_dev, bool local)
+void ipvlan_process_multicast(struct work_struct *work)
 {
-   struct ethhdr *eth = eth_hdr(skb);
+   struct ipvl_port *port = container_of(work, struct ipvl_port, wq);
+   struct ethhdr *ethh;
struct ipvl_dev *ipvlan;
-   struct sk_buff *nskb;
+   struct sk_buff *skb, *nskb;
+   struct sk_buff_head list;
unsigned int len;
unsigned int mac_hash;
int ret;
+   u8 pkt_type;
+   bool hlocal, dlocal;
 
-   if (skb-protocol == htons(ETH_P_PAUSE))
-   return;
-
-   rcu_read_lock();
-   list_for_each_entry_rcu(ipvlan, port-ipvlans, pnode) {
-   if (local  (ipvlan == in_dev))
-   continue;
+   __skb_queue_head_init(list);
 
-   mac_hash = ipvlan_mac_hash(eth-h_dest);
-   if (!test_bit(mac_hash, ipvlan-mac_filters))
-   continue;
+   spin_lock_bh(port-backlog.lock);
+   skb_queue_splice_tail_init(port-backlog, list);
+   spin_unlock_bh(port-backlog.lock);
 
-   ret = NET_RX_DROP;
-   len = skb-len + ETH_HLEN;
-   nskb = skb_clone(skb, GFP_ATOMIC);
-   if (!nskb)
-   goto mcast_acct;
+   while ((skb = __skb_dequeue(list)) != NULL) {
+   ethh = eth_hdr(skb);
+   hlocal = ether_addr_equal(ethh-h_source, port-dev-dev_addr);
+   mac_hash = ipvlan_mac_hash(ethh-h_dest);
 
-   if (ether_addr_equal(eth-h_dest, ipvlan-phy_dev-broadcast))
-   nskb-pkt_type = PACKET_BROADCAST;
+   if (ether_addr_equal(ethh-h_dest, port-dev-broadcast))
+   pkt_type = PACKET_BROADCAST;
else
-   nskb-pkt_type = PACKET_MULTICAST;
+   pkt_type = PACKET_MULTICAST;
+
+   dlocal = false;
+   rcu_read_lock();
+   list_for_each_entry_rcu(ipvlan, port-ipvlans, pnode) {
+   if (hlocal  (ipvlan-dev == skb-dev)) {
+   dlocal = true;
+   continue;
+   }
+

[PATCH next 3/3] ipvlan: Always set broadcast bit in multicast filter

2015-04-23 Thread Mahesh Bandewar
Earlier tricks of setting broadcast bit only when IPv4 address is added
onto interface are not good enough especially when autoconf comes in play.
Setting them on always is performance drag but now that multicast /
broadcast is not processed in fast-path; enabling broadcast will let
autoconf work correctly without affecting performance characteristics of
the device.

Signed-off-by: Mahesh Bandewar mahe...@google.com
---
 drivers/net/ipvlan/ipvlan_main.c | 20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index a16d3017fdc3..8fe9d5ff331e 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -218,17 +218,6 @@ static void ipvlan_change_rx_flags(struct net_device *dev, 
int change)
dev_set_allmulti(phy_dev, dev-flags  IFF_ALLMULTI? 1 : -1);
 }
 
-static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
-{
-   struct net_device *dev = ipvlan-dev;
-   unsigned int hashbit = ipvlan_mac_hash(dev-broadcast);
-
-   if (set  !test_bit(hashbit, ipvlan-mac_filters))
-   __set_bit(hashbit, ipvlan-mac_filters);
-   else if (!set  test_bit(hashbit, ipvlan-mac_filters))
-   __clear_bit(hashbit, ipvlan-mac_filters);
-}
-
 static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
 {
struct ipvl_dev *ipvlan = netdev_priv(dev);
@@ -243,6 +232,12 @@ static void ipvlan_set_multicast_mac_filter(struct 
net_device *dev)
netdev_for_each_mc_addr(ha, dev)
__set_bit(ipvlan_mac_hash(ha-addr), mc_filters);
 
+   /* Turn-on broadcast bit irrespective of address family,
+* since broadcast is deferred to a work-queue, hence no
+* impact on unicast fast-path processing.
+*/
+   __set_bit(ipvlan_mac_hash(dev-broadcast), mc_filters);
+
bitmap_copy(ipvlan-mac_filters, mc_filters,
IPVLAN_MAC_FILTER_SIZE);
}
@@ -710,7 +705,6 @@ static int ipvlan_add_addr4(struct ipvl_dev *ipvlan, struct 
in_addr *ip4_addr)
 */
if (netif_running(ipvlan-dev))
ipvlan_ht_addr_add(ipvlan, addr);
-   ipvlan_set_broadcast_mac_filter(ipvlan, true);
 
return 0;
 }
@@ -727,8 +721,6 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, 
struct in_addr *ip4_addr)
list_del(addr-anode);
ipvlan-ipv4cnt--;
WARN_ON(ipvlan-ipv4cnt  0);
-   if (!ipvlan-ipv4cnt)
-   ipvlan_set_broadcast_mac_filter(ipvlan, false);
kfree_rcu(addr, rcu);
 
return;
-- 
2.2.0.rc0.207.ga3a616c

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH next 2/3] ipvlan: Process fragmented multicast frames correctly

2015-04-23 Thread Mahesh Bandewar
Multicast processing in IPvlan was faulty as is. Eric Dumazet
pointed out that fragmented packets wont be processed correctly
unless defrag step is introduced.

This patch adds the defrag step before driver attempts to process
multicast frame(s).

Signed-off-by: Mahesh Bandewar mahe...@google.com
---
 drivers/net/ipvlan/ipvlan_core.c | 8 
 include/net/ip.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 58891666088c..282d32b16a87 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -532,6 +532,10 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct 
net_device *dev)
return dev_forward_skb(ipvlan-phy_dev, skb);
 
} else if (is_multicast_ether_addr(eth-h_dest)) {
+   skb = ip_check_defrag(skb, IP_DEFRAG_IPVLAN);
+   if (!skb)
+   return RX_HANDLER_CONSUMED;
+
ipvlan_multicast_enqueue(ipvlan-port, skb);
return NET_XMIT_SUCCESS;
}
@@ -617,6 +621,10 @@ static rx_handler_result_t ipvlan_handle_mode_l2(struct 
sk_buff **pskb,
int addr_type;
 
if (is_multicast_ether_addr(eth-h_dest)) {
+   skb = ip_check_defrag(skb, IP_DEFRAG_IPVLAN);
+   if (!skb)
+   return RX_HANDLER_CONSUMED;
+
if (ipvlan_external_frame(skb, port)) {
ipvlan_multicast_enqueue(port, skb);
return RX_HANDLER_CONSUMED;
diff --git a/include/net/ip.h b/include/net/ip.h
index d14af7edd197..1fb432b346a8 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -476,6 +476,7 @@ enum ip_defrag_users {
IP_DEFRAG_VS_FWD,
IP_DEFRAG_AF_PACKET,
IP_DEFRAG_MACVLAN,
+   IP_DEFRAG_IPVLAN,
 };
 
 int ip_defrag(struct sk_buff *skb, u32 user);
-- 
2.2.0.rc0.207.ga3a616c

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()

2015-04-23 Thread Joe Perches
On Thu, 2015-04-23 at 21:53 +0200, Mateusz Kulikowski wrote:
 On 22.04.2015 00:27, Joe Perches wrote:
  On Tue, 2015-04-21 at 23:44 +0200, Mateusz Kulikowski wrote:
  On 21.04.2015 23:22, Joe Perches wrote:
  On Tue, 2015-04-21 at 22:57 +0200, Mateusz Kulikowski wrote:
  (...)
 (...)
  True, True; If you prefer $line and ability to --fix - I'll use that in v3
  
  I suppose you could do both $line and $stat
  and the fix would only work when it's on a
  single line.
  
  Perhaps something like this would work:
  
  if ($line =~ /whatever/ ||
  (defined($stat)  $stat =~ /whatever/)) {
  if (WARN(...) 
  $fix) {
  fixed[$fixlinenr] =~ s/whatever/appropriate/;
  }
  }
 
 Isn't it enough to just match $stat and do fix for line (that in 
 some cases will just not match)?

Yeah, that'd work too.

 One more thing
 I noticed funny behavior about $stat matches - 
 it reports the same error several times (including as scope whole file)
 Is it feature or feature or I missed something?

You have to make sure the first character of $stat is a +

if ($stat =~ /\+(?:.*)\bmemfoo



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resource usages in Linux drivers

2015-04-23 Thread David Miller
From: Florian Fainelli f.faine...@gmail.com
Date: Thu, 23 Apr 2015 17:40:12 -0700

 Well, for one, we could have a device managed register_netdev()
 which cleans up resources in case of failures and calls
 free_netdev() automatically, but is that adding much value?

I don't think it's worth it at all.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Alexei Starovoitov

On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:


So you are planning to add queues? If you are that is a different
discussion (and the use case needs some clarity).


nope. I wasn't planning to do that.

 For packets being forwarded we already had egress qdiscs which had
 queues so it didnt seem to make sense to enqueue on ingress for such
 use cases.
 There was a use case later where multiple ingress ports had to be
 shared
 and ifb was born - which is pseudo temporary enqueueing on ingress.

agree. imo ifb approach is more flexible, since it has full hierarchy
of qdiscs. As you're saying above and from the old ifb logs I thought
that ifb is _temporary_ and long term plan is to use ingress_queue,
but looks like this is not the case. Also not too long ago we decided
that we don't want another ingress qdisc. Therefore I think it makes
sense to simplify the code and boost performance.
I'm not proposing to change tooling, of course.
From iproute2 point of view we'll still have ingress qdisc.
Right now we're pointlessly allocating memory for it and for
ingress_queue, whereas we only need to call cls/act.
I'm proposing to kill them and used tcf_proto in net_device instead.
Right now to reach cls in critical path on ingress we do:
rxq = skb-dev-ingress_queue
sch = rxq-qdisc
sch-enqueue
sch-filter_list
with a bunch of 'if' conditions and useless memory accesses in-between.
If we add 'ingress_filter_list' directly to net_device,
it will be just:
skb-dev-ingress_filter_list
which is huge performance boost. Code size will shrink as well.
iproute2 and all existing tools will work as-is. Looks as win-win to me.


The fact that qdiscs dealt with these codes directly
allows for specialized handling. Moving them to a generic function
seems to defeat that purpose. So I am siding with Cong on this.


that's not what patch 1 is doing. It is still doing specialized
handling... but in light of what you said above, it looks like much
bigger cleanup is needed. We'll continue arguing when I refactor
this set and resubmit when net-next reopens.


Sorry - i was refereing to the patch where you agregated things after
the qdisc invokes a classifier.


hmm. There I also didn't convert all qdiscs into single helper.
Only those that have exactly the same logic after tc_classify.
All other qdiscs have custom handling.
No worries, it's hard to review things while traveling. Been there :)

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] inet: fix possible panic in reqsk_queue_unlink()

2015-04-23 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
 0080
[ 3897.931025] IP: [a9f27686] reqsk_timer_handler+0x1a6/0x243

There is a race when reqsk_timer_handler() and tcp_check_req() call
inet_csk_reqsk_queue_unlink() on the same req at the same time.

Before commit fa76ce7328b2 (inet: get rid of central tcp/dccp listener
timer), listener spinlock was held and race could not happen.

To solve this bug, we change reqsk_queue_unlink() to not assume req
must be found, and we return a status, to conditionally release a
refcount on the request sock.

This also means tcp_check_req() in non fastopen case might or not
consume req refcount, so tcp_v6_hnd_req()  tcp_v4_hnd_req() have
to properly handle this.

(Same remark for dccp_check_req() and its callers)

inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
called 4 times in tcp and 3 times in dccp.

Fixes: fa76ce7328b2 (inet: get rid of central tcp/dccp listener timer)
Signed-off-by: Eric Dumazet eduma...@google.com
Reported-by: Yuchung Cheng ych...@google.com
---
 include/net/inet_connection_sock.h |   20 ---
 include/net/request_sock.h |   18 --
 net/dccp/ipv4.c|3 +-
 net/dccp/ipv6.c|3 +-
 net/dccp/minisocks.c   |3 --
 net/ipv4/inet_connection_sock.c|   34 +++
 net/ipv4/tcp_ipv4.c|3 +-
 net/ipv4/tcp_minisocks.c   |7 +++--
 net/ipv6/tcp_ipv6.c|3 +-
 9 files changed, 48 insertions(+), 46 deletions(-)

diff --git a/include/net/inet_connection_sock.h 
b/include/net/inet_connection_sock.h
index 7b5887cd1172..48a815823587 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -279,12 +279,6 @@ static inline void inet_csk_reqsk_queue_add(struct sock 
*sk,
 void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
   unsigned long timeout);
 
-static inline void inet_csk_reqsk_queue_removed(struct sock *sk,
-   struct request_sock *req)
-{
-   reqsk_queue_removed(inet_csk(sk)-icsk_accept_queue, req);
-}
-
 static inline void inet_csk_reqsk_queue_added(struct sock *sk,
  const unsigned long timeout)
 {
@@ -306,19 +300,7 @@ static inline int inet_csk_reqsk_queue_is_full(const 
struct sock *sk)
return reqsk_queue_is_full(inet_csk(sk)-icsk_accept_queue);
 }
 
-static inline void inet_csk_reqsk_queue_unlink(struct sock *sk,
-  struct request_sock *req)
-{
-   reqsk_queue_unlink(inet_csk(sk)-icsk_accept_queue, req);
-}
-
-static inline void inet_csk_reqsk_queue_drop(struct sock *sk,
-struct request_sock *req)
-{
-   inet_csk_reqsk_queue_unlink(sk, req);
-   inet_csk_reqsk_queue_removed(sk, req);
-   reqsk_put(req);
-}
+void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req);
 
 void inet_csk_destroy_sock(struct sock *sk);
 void inet_csk_prepare_forced_close(struct sock *sk);
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index fe41f3ceb008..9f4265ce8892 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -212,24 +212,6 @@ static inline int reqsk_queue_empty(struct 
request_sock_queue *queue)
return queue-rskq_accept_head == NULL;
 }
 
-static inline void reqsk_queue_unlink(struct request_sock_queue *queue,
- struct request_sock *req)
-{
-   struct listen_sock *lopt = queue-listen_opt;
-   struct request_sock **prev;
-
-   spin_lock(queue-syn_wait_lock);
-
-   prev = lopt-syn_table[req-rsk_hash];
-   while (*prev != req)
-   prev = (*prev)-dl_next;
-   *prev = req-dl_next;
-
-   spin_unlock(queue-syn_wait_lock);
-   if (del_timer(req-rsk_timer))
-   reqsk_put(req);
-}
-
 static inline void reqsk_queue_add(struct request_sock_queue *queue,
   struct request_sock *req,
   struct sock *parent,
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 2b4f21d34df6..ccf4c5629b3c 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -453,7 +453,8 @@ static struct sock *dccp_v4_hnd_req(struct sock *sk, struct 
sk_buff *skb)
   iph-saddr, iph-daddr);
if (req) {
nsk = dccp_check_req(sk, skb, req);
-   reqsk_put(req);
+   if (!nsk)
+   reqsk_put(req);
return nsk;
}
nsk = inet_lookup_established(sock_net(sk), dccp_hashinfo,
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 9d0551092c6c..5165571f397a 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -301,7 

Re: CCM/GCM implementation defect

2015-04-23 Thread Herbert Xu
On Thu, Apr 23, 2015 at 01:45:34PM +0200, Steffen Klassert wrote:

 Adding a second template for the correct implementation is
 probaply the only thing that we can do if we don't want to
 break backwards compatibility. But maybe we can add a warning
 to the old implementation, such that users notice that they
 use a broken version.

If we are going to do a warning I think the place to do it would
be in xfrm_algo.c.  We could add an insecure/warning flag and if
then print a warning if said algorithm is used.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resource usages in Linux drivers

2015-04-23 Thread Florian Fainelli
On 23/04/15 16:19, Francois Romieu wrote:
 Sergei Shtylyov sergei.shtyl...@cogentembedded.com :
 On 4/23/2015 1:08 PM, Jia-Ju Bai wrote:
 [...]
 I also find many drivers do not use these managed APIs, especially in 
 ethernet
 card drivers (like e100, r8169). Is it possible to change them?

Patches welcome. :-)
 
 I respectfully disagree.

Me too, most of the device managed conversions we have seen were bogus
because they were done in a semi-automated way without understanding the
peculiarities of the network devices, that is the separation between
init/open/close that most other device drivers do not have.

A typical example is this:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6892b41d9701283085b655c6086fb57a5d63fa47

 
 If someone believes basic resouce management to be too hard or error-prone
 to handle, he should imvho seriously rise the bar and reconsider the way
 he wants to contribute to the kernel.

Well, for one, we could have a device managed register_netdev() which
cleans up resources in case of failures and calls free_netdev()
automatically, but is that adding much value?

 
 I may hope he who reads e100.c to think about DMA api, bql or rx ring
 holes avoidance to quote a few ones. Managed API ? Mildly...
 


-- 
Florian
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix tcp fin memory accounting

2015-04-23 Thread Josh Hunt

On 04/21/2015 07:09 PM, Eric Dumazet wrote:


Note that this patch adds a deadlock possibility in some stress
situations.

If a process owning some tcp socket dies, and tcp_mem[2] is already hit,
all sk_stream_alloc_skb() can return NULL and we loop in tcp_send_fin(),
making no progress because we can not free any tcp memory.


Ugh. Thanks for fixing this Eric!

Josh
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix tcp fin memory accounting

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 08:48 -0500, Josh Hunt wrote:
 On 04/21/2015 07:09 PM, Eric Dumazet wrote:
 
  Note that this patch adds a deadlock possibility in some stress
  situations.
 
  If a process owning some tcp socket dies, and tcp_mem[2] is already hit,
  all sk_stream_alloc_skb() can return NULL and we loop in tcp_send_fin(),
  making no progress because we can not free any tcp memory.
 
 Ugh. Thanks for fixing this Eric!

No problem ;)

For the record, I've tested this followup patch that I'll formally
submit when net-next reopens :

If there is one already sent skb in write queue
(we look at the tail of course), and :
- We are under TCP memory pressure, 
- or allocation of a fresh skb using GFP_KERNEL failed,

- we add the FIN flag that will eventually be sent later after a
timeout.

 net/ipv4/tcp_output.c |   41 
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2ade67b7cdb0..fe6558eb64f3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2827,33 +2827,34 @@ static void sk_forced_wmem_schedule(struct sock *sk, 
int size)
sk_memory_allocated_add(sk, amt, status);
 }
 
-/* Send a fin.  The caller locks the socket for us.  This cannot be
- * allowed to fail queueing a FIN frame under any circumstances.
+/* Send a FIN. The caller locks the socket for us.
+ * We should try to send a FIN packet really hard, but eventually give up.
  */
 void tcp_send_fin(struct sock *sk)
 {
+   struct sk_buff *skb, *tskb = tcp_write_queue_tail(sk);
struct tcp_sock *tp = tcp_sk(sk);
-   struct sk_buff *skb = tcp_write_queue_tail(sk);
-   int mss_now;
 
-   /* Optimization, tack on the FIN if we have a queue of
-* unsent frames.  But be careful about outgoing SACKS
-* and IP options.
+   /* Optimization, tack on the FIN if we have one skb in write queue and
+* this skb was not yet sent, or we are under memory pressure.
+* Note: in the latter case, FIN packet will be sent after a timeout,
+* as TCP stack thinks it has been transmitted once.
 */
-   mss_now = tcp_current_mss(sk);
-
-   if (tcp_send_head(sk)) {
-   TCP_SKB_CB(skb)-tcp_flags |= TCPHDR_FIN;
-   TCP_SKB_CB(skb)-end_seq++;
+   if (tskb  (tcp_send_head(sk) || sk_under_memory_pressure(sk))) {
+coalesce:
+   TCP_SKB_CB(tskb)-tcp_flags |= TCPHDR_FIN;
+   TCP_SKB_CB(tskb)-end_seq++;
tp-write_seq++;
+   if (!tcp_send_head(sk)) {
+   tp-snd_nxt++;
+   return;
+   }
} else {
-   /* Socket is locked, keep trying until memory is available. */
-   for (;;) {
-   skb = alloc_skb_fclone(MAX_TCP_HEADER,
-  sk-sk_allocation);
-   if (skb)
-   break;
-   yield();
+   skb = alloc_skb_fclone(MAX_TCP_HEADER, sk-sk_allocation);
+   if (unlikely(!skb)) {
+   if (tskb)
+   goto coalesce;
+   return;
}
skb_reserve(skb, MAX_TCP_HEADER);
sk_forced_wmem_schedule(sk, skb-truesize);
@@ -2862,7 +2863,7 @@ void tcp_send_fin(struct sock *sk)
 TCPHDR_ACK | TCPHDR_FIN);
tcp_queue_skb(sk, skb);
}
-   __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_OFF);
+   __tcp_push_pending_frames(sk, tcp_current_mss(sk), TCP_NAGLE_OFF);
 }
 
 /* We get here when a process closes a file descriptor (either due to


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2] be2net: log link status

2015-04-23 Thread Ivan Vecera

On 04/23/2015 08:31 AM, Sathya Perla wrote:

-Original Message-
From: Ivan Vecera [mailto:ivec...@redhat.com]

The driver unlike other drivers does not log link state changes.

v2: added current link speed to log message


Ivan, I disagree with the v2 change. I think your original intention
was just to log a message when the link goes up or down
asynchronously (i.e., without any user intervention.)
After alerting the user, if the user wants to know other link
properties like speed, duplex etc then the ethtool cmd needs
to be used; there is no need to log a message with those details.
Yes, this was my original intention... to see these async link events in 
the system log.

In this case the v1 should be used.

Ivan

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net-next v2] be2net: log link status

2015-04-23 Thread Sathya Perla
 -Original Message-
 From: Ivan Vecera [mailto:ivec...@redhat.com]
 
 The driver unlike other drivers does not log link state changes.
 
 v2: added current link speed to log message
 
Ivan, I disagree with the v2 change. I think your original intention
was just to log a message when the link goes up or down 
asynchronously (i.e., without any user intervention.)
After alerting the user, if the user wants to know other link
properties like speed, duplex etc then the ethtool cmd needs 
to be used; there is no need to log a message with those details.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED

2015-04-23 Thread Daniel Borkmann

On 04/23/2015 04:46 AM, Alexei Starovoitov wrote:
...

The other two threads degenerated into non-technical comments.


Yep. :-/


Anyway, this set was RFC to answer my main question whether I should
continue with tc cleanup or stop right here. I got my answer.


I think it's worth proceeding with this, it's long overdue.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch added to the 3.12 stable tree] net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}

2015-04-23 Thread Jiri Slaby
From: Markos Chandras markos.chand...@imgtec.com

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===

commit 87f966d97b89774162df04d2106c6350c8fe4cb3 upstream.

On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.

Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: Don Fry pcne...@frontier.com
Signed-off-by: Markos Chandras markos.chand...@imgtec.com
Signed-off-by: David S. Miller da...@davemloft.net
Signed-off-by: Jiri Slaby jsl...@suse.cz
---
 drivers/net/ethernet/amd/pcnet32.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/pcnet32.c 
b/drivers/net/ethernet/amd/pcnet32.c
index 2d8e28819779..048743573230 100644
--- a/drivers/net/ethernet/amd/pcnet32.c
+++ b/drivers/net/ethernet/amd/pcnet32.c
@@ -1516,7 +1516,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct 
pci_dev *pdev)
 {
struct pcnet32_private *lp;
int i, media;
-   int fdx, mii, fset, dxsuflo;
+   int fdx, mii, fset, dxsuflo, sram;
int chip_version;
char *chipname;
struct net_device *dev;
@@ -1553,7 +1553,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct 
pci_dev *pdev)
}
 
/* initialize variables */
-   fdx = mii = fset = dxsuflo = 0;
+   fdx = mii = fset = dxsuflo = sram = 0;
chip_version = (chip_version  12)  0x;
 
switch (chip_version) {
@@ -1586,6 +1586,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct 
pci_dev *pdev)
chipname = PCnet/FAST III 79C973; /* PCI */
fdx = 1;
mii = 1;
+   sram = 1;
break;
case 0x2626:
chipname = PCnet/Home 79C978; /* PCI */
@@ -1609,6 +1610,7 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct 
pci_dev *pdev)
chipname = PCnet/FAST III 79C975; /* PCI */
fdx = 1;
mii = 1;
+   sram = 1;
break;
case 0x2628:
chipname = PCnet/PRO 79C976;
@@ -1637,6 +1639,31 @@ pcnet32_probe1(unsigned long ioaddr, int shared, struct 
pci_dev *pdev)
dxsuflo = 1;
}
 
+   /*
+* The Am79C973/Am79C975 controllers come with 12K of SRAM
+* which we can use for the Tx/Rx buffers but most importantly,
+* the use of SRAM allow us to use the BCR18:NOUFLO bit to avoid
+* Tx fifo underflows.
+*/
+   if (sram) {
+   /*
+* The SRAM is being configured in two steps. First we
+* set the SRAM size in the BCR25:SRAM_SIZE bits. According
+* to the datasheet, each bit corresponds to a 512-byte
+* page so we can have at most 24 pages. The SRAM_SIZE
+* holds the value of the upper 8 bits of the 16-bit SRAM size.
+* The low 8-bits start at 0x00 and end at 0xff. So the
+* address range is from 0x up to 0x17ff. Therefore,
+* the SRAM_SIZE is set to 0x17. The next step is to set
+* the BCR26:SRAM_BND midway through so the Tx and Rx
+* buffers can share the SRAM equally.
+*/
+   a-write_bcr(ioaddr, 25, 0x17);
+   a-write_bcr(ioaddr, 26, 0xc);
+   /* And finally enable the NOUFLO bit */
+   a-write_bcr(ioaddr, 18, a-read_bcr(ioaddr, 18) | (1  11));
+   }
+
dev = alloc_etherdev(sizeof(*lp));
if (!dev) {
ret = -ENOMEM;
-- 
2.3.5

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net-next v2] be2net: log link status

2015-04-23 Thread Sathya Perla
 -Original Message-
 From: Joe Perches [mailto:j...@perches.com]
 
 On Wed, 2015-04-22 at 15:43 +0200, Ivan Vecera wrote:
  The driver unlike other drivers does not log link state changes.
 
...
 
 Question for the emulex folk:
 
 Is the dom argument to link_status_query necessary?
 It's currently always 0.

It's needed only if the PF wants to query the link_status of a VF, which
it doesn't do currently...
Yes, its un-necessary and can be removed (in a separate patch I guess :-)).

thanks,
-Sathya
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net-next] be2net: log link status

2015-04-23 Thread Sathya Perla
 -Original Message-
 From: Ivan Vecera [mailto:ivec...@redhat.com]
 
 The driver unlike other drivers does not log link state changes.
 
 Cc: Sathya Perla sathya.pe...@emulex.com
 Cc: Subbu Seetharaman subbu.seethara...@emulex.com
 Cc: Ajit Khaparde ajit.khapa...@emulex.com
 Signed-off-by: Ivan Vecera ivec...@redhat.com
 ---
  drivers/net/ethernet/emulex/benet/be_main.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
 b/drivers/net/ethernet/emulex/benet/be_main.c
 index fb0bc3c..e349131 100644
 --- a/drivers/net/ethernet/emulex/benet/be_main.c
 +++ b/drivers/net/ethernet/emulex/benet/be_main.c
 @@ -662,6 +662,8 @@ void be_link_status_update(struct be_adapter *adapter, u8 
 link_status)
   netif_carrier_on(netdev);
   else
   netif_carrier_off(netdev);
 +
 + netdev_info(netdev, Link is %s\n, link_status ? Up : Down);
  }
 
Acked-by: Sathya Perla sathya.pe...@emulex.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Horia Geantă
On 4/23/2015 6:26 AM, Herbert Xu wrote:
 Hi:

 It looks like our IPsec implementations of CCM and GCM are buggy
This applies also to GMAC (rfc4543), right?

 in that they don't include the IV in the authentication calculation.

 This definitely breaks interoperability with anyone who implements
 them correctly.  The fact that there have been no reports on this
 probably means that nobody has run into this in the field yet.
Does this mean that even the test vectors (crypto/testmgr.h) are broken?

Thanks,
Horia


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resource usages in Linux drivers

2015-04-23 Thread Jia-Ju Bai

Thank you!

On 04/23/2015 05:19 PM, Sergei Shtylyov wrote:

Hello.

On 4/23/2015 9:45 AM, Jia-Ju Bai wrote:


Dear Sir,



I am very sorry to trouble you.


I find that resource management is error-prone when writing Linux 
drivers, and

many problems may occur, such as resource leaks.
Meanwhile, I find that many applied patches in the kernel mailing 
list focus

on releasing allocated resources, especially in error-handling paths.



Therefore, I have a question: is it possible to automatically release
allocated resources in drivers before unloading and in error-handling 
paths?


   Yes, there's managed device API, look for functions starting with 
devm_.
There's one limitation though: it can be used only in the driver's 
probe() method, so can't be used when e.g. network device is being 
opened.
I think many APIs, such as kmalloc, can also be managed like garbage 
collection in Java.

Maybe the performance is a matter.




I am looking forward to your reply, thanks!


   Such questions should actually be asked on the mailing lists, not 
personally.


I am sorry for that, and I will cc to the mailing lists and other 
maintainers.




--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/3] tc: fix return values of ingress qdisc

2015-04-23 Thread Thomas Graf
On 04/22/15 at 04:29pm, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 3:04 PM, Alexei Starovoitov a...@plumgrid.com wrote:
  On 4/21/15 9:59 PM, Cong Wang wrote:
 
  On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com
  wrote:
 
  ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
 
 
  XMIT already means egress...
 
 
  may be then it should be renamed as well.
  from include/linux/netdevice.h:
  /* qdisc -enqueue() return codes. */
  #define NET_XMIT_SUCCESS0x00
  ...
 
  the point is that qdisc-enqeue() must return NET_XMIT_* values.
  ingress qdisc is violating this and therefore should be fixed.
 
 XMIT is non-sense for ingress, you really need to pick another
 name for it if TC_ACT_OK isn't okay for you (it is okay for me).

You transmit into a qdisc. If that terminology doesn't suit
you then rename it to NET_QUEUE_* but moving away from returning
TC_ACT_* is definitely the right thing to do here.
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 3/3] tc: cleanup tc_classify

2015-04-23 Thread Thomas Graf
On 04/22/15 at 04:38pm, Cong Wang wrote:
 On Wed, Apr 22, 2015 at 3:27 PM, Alexei Starovoitov a...@plumgrid.com wrote:
  On 4/21/15 10:05 PM, Cong Wang wrote:
 
  On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com
  wrote:
 
  introduce tc_classify_act() and qdisc_drop_bypass() helper functions to
  reduce
  copy-paste among different qdiscs

I like this cleanup. It aligns all skb dropping in qdiscs to a
qdisc_drop*() function.

  I don't think qdisc_drop_bypass() is more readable than without it,
  maybe you need a better name, or just leave the code as it is.
 
 
  what would be a better name? I'm open to suggestions.
 
 My reading for qdisc_drop_bypass() is it bypasses packet
 dropping for some case, apparently doesn't match its definition.
 
 I can't think out a better name therefore I don't think it deserves
 a function, just leave as it is.

Interesting logic ;-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Herbert Xu
On Thu, Apr 23, 2015 at 12:03:38PM +0300, Horia Geantă wrote:
 This applies also to GMAC (rfc4543), right?

No RFC4543 appears to be correctly implemented.

 Does this mean that even the test vectors (crypto/testmgr.h) are broken?

Indeed.  The test vectors appear to be generated either through
our implementation or by one that is identical to us.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netdev_alloc_pcpu_stats: use less common iterator variable

2015-04-23 Thread Johannes Berg
From: Johannes Berg johannes.b...@intel.com

With the CPU iteration variable called 'i', it's relatively easy
to have variable shadowing which sparse will warn about. Avoid
that by renaming the variable to __cpu which is less likely to
be used in the surrounding context.

Signed-off-by: Johannes Berg johannes.b...@intel.com
---
 include/linux/netdevice.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bcbde799ec69..6719041d8851 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2021,10 +2021,10 @@ struct pcpu_sw_netstats {
 ({ \
typeof(type) __percpu *pcpu_stats = alloc_percpu(type); \
if (pcpu_stats) {   \
-   int i;  \
-   for_each_possible_cpu(i) {  \
+   int __cpu;  \
+   for_each_possible_cpu(__cpu) {  \
typeof(type) *stat; \
-   stat = per_cpu_ptr(pcpu_stats, i);  \
+   stat = per_cpu_ptr(pcpu_stats, __cpu);  \
u64_stats_init(stat-syncp);   \
}   \
}   \
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resource usages in Linux drivers

2015-04-23 Thread Jia-Ju Bai

On 04/23/2015 05:29 PM, Jia-Ju Bai wrote:

Thank you!

On 04/23/2015 05:19 PM, Sergei Shtylyov wrote:

Hello.

On 4/23/2015 9:45 AM, Jia-Ju Bai wrote:


Dear Sir,



I am very sorry to trouble you.


I find that resource management is error-prone when writing Linux 
drivers, and

many problems may occur, such as resource leaks.
Meanwhile, I find that many applied patches in the kernel mailing 
list focus

on releasing allocated resources, especially in error-handling paths.



Therefore, I have a question: is it possible to automatically release
allocated resources in drivers before unloading and in 
error-handling paths?


   Yes, there's managed device API, look for functions starting with 
devm_.
There's one limitation though: it can be used only in the driver's 
probe() method, so can't be used when e.g. network device is being 
opened.
I think many APIs, such as kmalloc, can also be managed like garbage 
collection in Java.

Maybe the performance is a matter.




I am looking forward to your reply, thanks!


   Such questions should actually be asked on the mailing lists, not 
personally.


I am sorry for that, and I will cc to the mailing lists and other 
maintainers.





I find that some common APIs are not managed, such as napi_enable and 
napi_start_queue. Is it possible to provide managed APIs for them?
I also find many drivers do not use these managed APIs, especially in 
ethernet card drivers (like e100, r8169). Is it possible to change them?



Best wishes,
Jia-Ju Bai

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Martin Willi
Hi Herbert,

  Does this mean that even the test vectors (crypto/testmgr.h) are broken?
 
 Indeed.  The test vectors appear to be generated either through
 our implementation or by one that is identical to us.

I'm not sure about that. RFC4106 refers to [1] for test vectors, which
is still available at web.archive.org [2].

When looking for example at Test Case 3, this is the same as in a newer
revision of the document [3]. That looks exactly the same as
aes_gcm_enc_tv_template[2] from testmgr.h.

We by the way use test vectors in userland from the same document to
verify our own GCM backend, our OpenSSL backend and an AESNI/PCLMULQD
backend. And I've never heard of any incompatibilities.

Regards
Martin

[1]http://csrc.nist.gov/CryptoToolkit/modes/proposedmodes/gcm/gcm-spec.pdf
[2]http://web.archive.org/web/20070712195408/http://csrc.nist.gov/CryptoToolkit/modes/proposedmodes/gcm/gcm-spec.pdf
[3]http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/gcm/gcm-revised-spec.pdf

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Herbert Xu
On Thu, Apr 23, 2015 at 11:58:52AM +0200, Martin Willi wrote:

 I'm not sure about that. RFC4106 refers to [1] for test vectors, which
 is still available at web.archive.org [2].
 
 When looking for example at Test Case 3, this is the same as in a newer
 revision of the document [3]. That looks exactly the same as
 aes_gcm_enc_tv_template[2] from testmgr.h.

These are GCM test vectors, not RFC4106 test vectors so they are
of no use when you're testing whether the IPsec IV (which is not
the same thing as the GCM IV) is included in the authentication
or not.

AFAIK GCM itself is implemented correctly.  It's only the IPsec
wrapper around it (rfc4106 in particular) that's broken.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resource usages in Linux drivers

2015-04-23 Thread Sergei Shtylyov

On 4/23/2015 1:08 PM, Jia-Ju Bai wrote:


I am very sorry to trouble you.



I find that resource management is error-prone when writing Linux drivers,
and
many problems may occur, such as resource leaks.
Meanwhile, I find that many applied patches in the kernel mailing list focus
on releasing allocated resources, especially in error-handling paths.



Therefore, I have a question: is it possible to automatically release
allocated resources in drivers before unloading and in error-handling paths?



   Yes, there's managed device API, look for functions starting with devm_.
There's one limitation though: it can be used only in the driver's probe()
method, so can't be used when e.g. network device is being opened.



I think many APIs, such as kmalloc, can also be managed like garbage
collection in Java.
Maybe the performance is a matter.



I am looking forward to your reply, thanks!



   Such questions should actually be asked on the mailing lists, not
personally.



I am sorry for that, and I will cc to the mailing lists and other maintainers.



I find that some common APIs are not managed, such as napi_enable and
napi_start_queue. Is it possible to provide managed APIs for them?


   No, they're only called from ndo_open() method IIRC. The device managed 
APIs can only be called at the device probing time.



I also find many drivers do not use these managed APIs, especially in ethernet
card drivers (like e100, r8169). Is it possible to change them?


   Patches welcome. :-)


Best wishes,
Jia-Ju Bai


WBR, Sergei

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Martin Willi
Hi Steffen,

  It looks like our IPsec implementations of CCM and GCM are buggy
  in that they don't include the IV in the authentication calculation.
 
 Seems like crypto_rfc4106_crypt() passes the associated data it
 got from ESP directly to gcm, without chaining with the IV.

Do you have any pointer for me where this is defined? Why is it needed,
given that GCM implicitly authenticates the IV by using it in Y0?

Also, I've just verified that a tunnel between the Windows Filtering
Platform and Linux 3.13 using AES128GCM16 works just fine. So if we do
something wrong, the problem does not only affect Linux.

Regards
Martin

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: phy: micrel: don't do clock-mode-select if we got NULL clock

2015-04-23 Thread Niklas Cassel
Signed-off-by: Niklas Cassel nikl...@axis.com
---
 drivers/net/phy/micrel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 1190fd8..a422036 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -548,7 +548,7 @@ static int kszphy_probe(struct phy_device *phydev)
}
 
clk = devm_clk_get(phydev-dev, rmii-ref);
-   if (!IS_ERR(clk)) {
+   if (!IS_ERR_OR_NULL(clk)) {
unsigned long rate = clk_get_rate(clk);
bool rmii_ref_clk_sel_25_mhz;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 0/3] tipc: three bug fixes

2015-04-23 Thread Jon Maloy
A set of unrelated corrections; one for the tipc netns implementation,
one regarding problems with random link resets, and one removing a
an erroneous refcount decrement when reading link statistsics via
netlink.

Erik Hugne (2):
  tipc: fix random link reset problem
  tipc: fix node refcount issue

Ying Xue (1):
  tipc: fix topology server broken issue

 net/tipc/link.c   | 1 -
 net/tipc/server.c | 9 +++--
 net/tipc/socket.c | 3 ++-
 3 files changed, 5 insertions(+), 8 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 3/3] tipc: fix node refcount issue

2015-04-23 Thread Jon Maloy
From: Erik Hugne erik.hu...@ericsson.com

When link statistics is dumped over netlink, we iterate over
the list of peer nodes and append each links statistics to
the netlink msg. In the case where the dump is resumed after
filling up a nlmsg, the node refcnt is decremented without
having been incremented previously which may cause the node
reference to be freed. When this happens, the following
info/stacktrace will be generated, followed by a crash or
undefined behavior.
We fix this by removing the erroneous call to tipc_node_put
inside the loop that iterates over nodes.

[  384.312303] INFO: trying to register non-static key.
[  384.313110] the code is fine but needs lockdep annotation.
[  384.313290] turning off the locking correctness validator.
[  384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
[  384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  384.313290]  88003c6d0290 88003cc03ca8 8170adf1 
0007
[  384.313290]  82728730 88003cc03d38 810a6a6d 
001d7200
[  384.313290]  88003c6d0ab0 88003cc03ce8 0285 
0001
[  384.313290] Call Trace:
[  384.313290]  IRQ  [8170adf1] dump_stack+0x4c/0x65
[  384.313290]  [810a6a6d] __lock_acquire+0xf3d/0xf50
[  384.313290]  [810a7375] lock_acquire+0xd5/0x290
[  384.313290]  [a0043e8c] ? link_timeout+0x1c/0x170 [tipc]
[  384.313290]  [a0043e70] ? link_state_event+0x4e0/0x4e0 [tipc]
[  384.313290]  [81712890] _raw_spin_lock_bh+0x40/0x80
[  384.313290]  [a0043e8c] ? link_timeout+0x1c/0x170 [tipc]
[  384.313290]  [a0043e8c] link_timeout+0x1c/0x170 [tipc]
[  384.313290]  [810c4698] call_timer_fn+0xb8/0x490
[  384.313290]  [810c45e0] ? process_timeout+0x10/0x10
[  384.313290]  [810c5a2c] run_timer_softirq+0x21c/0x420
[  384.313290]  [a0043e70] ? link_state_event+0x4e0/0x4e0 [tipc]
[  384.313290]  [8105a954] __do_softirq+0xf4/0x630
[  384.313290]  [8105afdd] irq_exit+0x5d/0x60
[  384.313290]  [8103ade1] smp_apic_timer_interrupt+0x41/0x50
[  384.313290]  [817144a0] apic_timer_interrupt+0x70/0x80
[  384.313290]  EOI  [8100db10] ? default_idle+0x20/0x210
[  384.313290]  [8100db0e] ? default_idle+0x1e/0x210
[  384.313290]  [8100e61a] arch_cpu_idle+0xa/0x10
[  384.313290]  [81099803] cpu_startup_entry+0x2c3/0x530
[  384.313290]  [810d2893] ? clockevents_register_device+0x113/0x200
[  384.313290]  [81038b0f] start_secondary+0x13f/0x170

Fixes: 8a0f6ebe8494 (tipc: involve reference counter for node structure)
Signed-off-by: Erik Hugne erik.hu...@ericsson.com
Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/link.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index a6b30df..57be6e6 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2143,7 +2143,6 @@ int tipc_nl_link_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
err = __tipc_nl_add_node_links(net, msg, node,
   prev_link);
tipc_node_unlock(node);
-   tipc_node_put(node);
if (err)
goto out;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 1/3] tipc: fix topology server broken issue

2015-04-23 Thread Jon Maloy
From: Ying Xue ying@windriver.com

When a new topology server is launched in a new namespace, its
listening socket is inserted into the init ns namespace's socket
hash table rather than the one owned by the new namespace. Although
the socket's namespace is forcedly changed to the new namespace later,
the socket is still stored in the socket hash table of init ns
namespace. When a client created in the new namespace connects
its own topology server, the connection is failed as its server's
socket could not be found from its own namespace's socket table.

If __sock_create() instead of original sock_create_kern() is used
to create the server's socket through specifying an expected namesapce,
the socket will be inserted into the specified namespace's socket
table, thereby avoiding to the topology server broken issue.

Fixes: 76100a8a64bc (tipc: fix netns refcnt leak)

Reported-by: Erik Hugne erik.hu...@ericsson.com
Signed-off-by: Ying Xue ying@windriver.com
Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/server.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/tipc/server.c b/net/tipc/server.c
index ab6183c..77ff03e 100644
--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -102,7 +102,7 @@ static void tipc_conn_kref_release(struct kref *kref)
}
saddr-scope = -TIPC_NODE_SCOPE;
kernel_bind(sock, (struct sockaddr *)saddr, sizeof(*saddr));
-   sk_release_kernel(sk);
+   sock_release(sock);
con-sock = NULL;
}
 
@@ -321,12 +321,9 @@ static struct socket *tipc_create_listen_sock(struct 
tipc_conn *con)
struct socket *sock = NULL;
int ret;
 
-   ret = sock_create_kern(AF_TIPC, SOCK_SEQPACKET, 0, sock);
+   ret = __sock_create(s-net, AF_TIPC, SOCK_SEQPACKET, 0, sock, 1);
if (ret  0)
return NULL;
-
-   sk_change_net(sock-sk, s-net);
-
ret = kernel_setsockopt(sock, SOL_TIPC, TIPC_IMPORTANCE,
(char *)s-imp, sizeof(s-imp));
if (ret  0)
@@ -376,7 +373,7 @@ static struct socket *tipc_create_listen_sock(struct 
tipc_conn *con)
 
 create_err:
kernel_sock_shutdown(sock, SHUT_RDWR);
-   sk_release_kernel(sock-sk);
+   sock_release(sock);
return NULL;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 2/3] tipc: fix random link reset problem

2015-04-23 Thread Jon Maloy
From: Erik Hugne erik.hu...@ericsson.com

In the function tipc_sk_rcv(), the stack variable 'err'
is only initialized to TIPC_ERR_NO_PORT for the first
iteration over the link input queue. If a chain of messages
are received from a link, failure to lookup the socket for
any but the first message will cause the message to bounce back
out on a random link.
We fix this by properly initializing err.

Signed-off-by: Erik Hugne erik.hu...@ericsson.com
Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/socket.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index ee90d74..9074b5c 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1764,13 +1764,14 @@ static int tipc_sk_enqueue(struct sk_buff_head *inputq, 
struct sock *sk,
 int tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
 {
u32 dnode, dport = 0;
-   int err = -TIPC_ERR_NO_PORT;
+   int err;
struct sk_buff *skb;
struct tipc_sock *tsk;
struct tipc_net *tn;
struct sock *sk;
 
while (skb_queue_len(inputq)) {
+   err = -TIPC_ERR_NO_PORT;
skb = NULL;
dport = tipc_skb_peek_port(inputq, dport);
tsk = tipc_sk_lookup(net, dport);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Steffen Klassert
On Thu, Apr 23, 2015 at 11:26:20AM +0800, Herbert Xu wrote:
 Hi:
 
 It looks like our IPsec implementations of CCM and GCM are buggy
 in that they don't include the IV in the authentication calculation.

Seems like crypto_rfc4106_crypt() passes the associated data it
got from ESP directly to gcm, without chaining with the IV.

 
 This definitely breaks interoperability with anyone who implements
 them correctly.  The fact that there have been no reports on this
 probably means that nobody has run into this in the field yet.
 
 On the security side, this is probably not a big deal for CCM
 because it always verifies the authentication tag after decryption.
 But for GCM this may be a DoS issue as an attacker could modify
 the IV without triggering the authentication check and thus cause
 an unnecessary decryption.  For both CCM and GCM this will result
 in random data injected as a packet into the network stack which
 hopefully will be dropped.
 
 In order to fix this without breaking backwards compatibility,
 my plan is to introduce new templates such as rfc4106v2 which
 implement the RFC correctly.  The existing templates will be
 retained so that current users aren't broken by the fix.

Adding a second template for the correct implementation is
probaply the only thing that we can do if we don't want to
break backwards compatibility. But maybe we can add a warning
to the old implementation, such that users notice that they
use a broken version.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread Eric Dumazet
On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote:

 +static void ipvlan_multicast_enqueue(struct ipvl_port *port,
 +  struct sk_buff *skb)
 +{
 + if (skb-protocol == htons(ETH_P_PAUSE))
 + return;

But what happens to this packet ? It seems leaked ?

 +
 + spin_lock(port-backlog.lock);
 + if (skb_queue_len(port-backlog)  IPVLAN_QBACKLOG_LIMIT) {
 + __skb_queue_tail(port-backlog, skb);
 + spin_unlock(port-backlog.lock);
 + } else {
 + spin_unlock(port-backlog.lock);
 + atomic_long_inc(skb-dev-rx_dropped);
 + kfree_skb(skb);
 + }
 + schedule_work(port-wq);

No point calling schedule_work(port-wq); if packet was dropped.

We are under pressure, so don't add extra cpu cycles ;)


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rhashtable: Add cap on number of elements in hash table

2015-04-23 Thread Herbert Xu
On Thu, Apr 23, 2015 at 11:59:19AM -0400, David Miller wrote:
 From: Johannes Berg johan...@sipsolutions.net
 Date: Thu, 23 Apr 2015 16:38:43 +0200
 
  From: Johannes Berg johannes.b...@intel.com
  
  The conversion of mac80211's station table to rhashtable had a bug
  that I found by accident in code review, that hadn't been found as
  rhashtable apparently managed to have a maximum hash chain length
  of one (!) in all our testing.
  
  In order to test the bug and verify the fix I set my rhashtable's
  max_size very low (4) in order to force getting hash collisions.
  
  At that point, rhashtable WARNed in rhashtable_insert_rehash() but
  didn't actually reject the hash table insertion. This caused it to
  lose insertions - my master list of stations would have 9 entries,
  but the rhashtable only had 5. This may warrant a deeper look, but
  that WARN_ON() just shouldn't happen.
  
  Fix this by not returning true from rht_grow_above_100() when the
  rhashtable's max_size has been reached - in this case the user is
  explicitly configuring it to be at most that big, so even if it's
  now above 100% it shouldn't attempt to resize.
  
  This fixes the lost insertion issue and consequently allows my
  code to display its error (and verify my fix for it.)
  
  Signed-off-by: Johannes Berg johannes.b...@intel.com
 
 It looks fine to me, but I'll let Herbert and Thomas review this.

Thanks for the heads up.

It seems that I lost track somewhere along the line.  I meant
to add an explicit limit on the overall number of entries since
that was what users like netlink expected but never got around
to doing it.  Instead it seems that we're currently relying on
the rht_grow_above_100 to protect us.

So here is a patch that adds an explicit limit and fixes the
problem Johannes reported.

---8---
We currently have no limit on the number of elements in a hash table.
This is very bad especially considering that some rhashtable users
had such a limit before the conversion and relied on it for defence
against DoS attacks.

We already have a maximum hash table size limit but its enforcement
is only by luck and results in a nasty WARN_ON.

This patch adds a new paramater insecure_max_entries which becomes
the cap on the table.  If unset it defaults to max_size.  If it is
also zero it means that there is no cap on the number of elements
in the table.  However, the table will grow whenever the utilisation
hits 100% and if that growth fails, you will get ENOMEM on insertion.

As allowing 100% utilisation is potentially dangerous, the name
contains the word insecure.

Note that the cap is not a hard limit.  This is done for performance
reasons as enforcing a hard limit will result in use of atomic ops
that are heavier than the ones we currently use.

The reasoning is that we're only guarding against a gross over-
subscription of the table, rather than a small breach of the limit.

Reported-by: Johannes Berg johannes.b...@intel.com
Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242..95b5a36 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -17,6 +17,7 @@
 #ifndef _LINUX_RHASHTABLE_H
 #define _LINUX_RHASHTABLE_H
 
+#include linux/atomic.h
 #include linux/compiler.h
 #include linux/errno.h
 #include linux/jhash.h
@@ -100,6 +101,7 @@ struct rhashtable;
  * @key_len: Length of key
  * @key_offset: Offset of key in struct to be hashed
  * @head_offset: Offset of rhash_head in struct to be hashed
+ * @insecure_max_entries: Maximum number of entries (may be exceeded)
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
@@ -115,6 +117,7 @@ struct rhashtable_params {
size_t  key_len;
size_t  key_offset;
size_t  head_offset;
+   unsigned intinsecure_max_entries;
unsigned intmax_size;
unsigned intmin_size;
u32 nulls_base;
@@ -282,7 +285,20 @@ static inline bool rht_shrink_below_30(const struct 
rhashtable *ht,
 static inline bool rht_grow_above_100(const struct rhashtable *ht,
  const struct bucket_table *tbl)
 {
-   return atomic_read(ht-nelems)  tbl-size;
+   return atomic_read(ht-nelems)  tbl-size 
+  (!ht-p.max_size || tbl-size  ht-p.max_size);
+}
+
+/**
+ * rht_grow_above_max - returns true if table is above maximum
+ * @ht:hash table
+ * @tbl:   current table
+ */
+static inline bool rht_grow_above_max(const struct rhashtable *ht,
+ const struct bucket_table *tbl)
+{
+   return ht-p.insecure_max_entries 
+  atomic_read(ht-nelems) = ht-p.insecure_max_entries;
 }
 
 /* The bucket lock is selected based on the hash and protects mutations
@@ -611,6 +627,10 @@ 

[PATCH] ehea: Fix memory hook reference counting crashes

2015-04-23 Thread Michael Ellerman
The recent commit to only register the EHEA memory hotplug hooks on
adapter probe has a few problems.

Firstly the reference counting is wrong for multiple adapters, in that
the hooks are registered multiple times. Secondly the check in the tear
down path is backward. Finally the error path doesn't decrement the
count.

The multiple registration of the hooks is the biggest problem, as it
leads to oopses when the system is rebooted, and/or errors during memory
hotplug, eg:

  $ ./mem-on-off-test.sh -r 2
  ...
  ehea: memory is going offline
  ehea: LPAR memory changed - re-initializing driver
  ehea: re-initializing driver complete
  ehea: memory is going offline
  ehea: LPAR memory changed - re-initializing driver
  ehea: opcode=26c ret=fffc arg1=8303 arg2=0 
arg3=7006d600 arg4=3fded arg5=200 arg6=0 arg7=0
  ehea: register_rpage_mr failed
  ehea: registering mr failed
  ehea: register MR failed - driver inoperable!
  ehea: memory is going offline

Fixes: aa183323312d (ehea: Register memory hotplug, reboot and crash hooks on 
adapter probe)
Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 drivers/net/ethernet/ibm/ehea/ehea_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ehea/ehea_main.c 
b/drivers/net/ethernet/ibm/ehea/ehea_main.c
index c05e50759621..00d86be0c831 100644
--- a/drivers/net/ethernet/ibm/ehea/ehea_main.c
+++ b/drivers/net/ethernet/ibm/ehea/ehea_main.c
@@ -3347,7 +3347,7 @@ static int ehea_register_memory_hooks(void)
 {
int ret = 0;
 
-   if (atomic_inc_and_test(ehea_memory_hooks_registered))
+   if (atomic_inc_return(ehea_memory_hooks_registered)  1)
return 0;
 
ret = ehea_create_busmap();
@@ -3381,12 +3381,14 @@ out3:
 out2:
unregister_reboot_notifier(ehea_reboot_nb);
 out:
+   atomic_dec(ehea_memory_hooks_registered);
return ret;
 }
 
 static void ehea_unregister_memory_hooks(void)
 {
-   if (atomic_read(ehea_memory_hooks_registered))
+   /* Only remove the hooks if we've registered them */
+   if (atomic_read(ehea_memory_hooks_registered) == 0)
return;
 
unregister_reboot_notifier(ehea_reboot_nb);
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: netcp: remove call to netif_carrier_(on/off) for MAC to Phy interface

2015-04-23 Thread Mugunthan V N
On Friday 24 April 2015 12:47 AM, Murali Karicheri wrote:
 Currently when interface type is MAC to Phy, netif_carrier_(on/off)
 is called which is not needed as Phy lib already updates the carrier
 status to net stack. This is needed only for other interface types
 
 Signed-off-by: Murali Karicheri m-kariche...@ti.com
 ---

Acked-by: Mugunthan V N mugunthan...@ti.com

Regards
Mugunthan V N
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH next 1/3] ipvlan: Defer multicast / broadcast processing to a work-queue

2015-04-23 Thread Mahesh Bandewar
On Thu, Apr 23, 2015 at 9:28 PM, David Miller da...@davemloft.net wrote:
 From: Mahesh Bandewar mahe...@google.com
 Date: Thu, 23 Apr 2015 19:54:29 -0700

 On Thu, Apr 23, 2015 at 5:32 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 On Thu, 2015-04-23 at 14:29 -0700, Mahesh Bandewar wrote:

 +static void ipvlan_multicast_enqueue(struct ipvl_port *port,
 +  struct sk_buff *skb)
 +{
 + if (skb-protocol == htons(ETH_P_PAUSE))
 + return;

 But what happens to this packet ? It seems leaked ?

 Hmm, will take care of it in v2.

 Please also provide a proper [PATCH 0/3]  introduction posting
 stating what the patch series is doing, and why.

Will do.

 Thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net/tg3: Release IRQs on permanent error

2015-04-23 Thread Gavin Shan
When having permanent EEH error, the PCI device will be removed
from the system. For this case, we shouldn't set pcierr_recovery
to true wrongly, which blocks the driver to release the allocated
interrupts and their handlers. Eventually, we can't disable MSI
or MSIx successfully because of the MSI or MSIx interrupts still
have associated interrupt actions, which is turned into following
stack dump.

Oops: Exception in kernel mode, sig: 5 [#1]
:
[c03b76a8] .free_msi_irqs+0x80/0x1a0 (unreliable)
[c039f388] .pci_remove_bus_device+0x98/0x110
[c00790f4] .pcibios_remove_pci_devices+0x9c/0x128
[c0077b98] .handle_eeh_events+0x2d8/0x4b0
[c00782d0] .eeh_event_handler+0x130/0x1c0
[c0022bd4] .kernel_thread+0x54/0x70

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 drivers/net/ethernet/broadcom/tg3.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index 1270b18..069952f 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -18129,7 +18129,9 @@ static pci_ers_result_t tg3_io_error_detected(struct 
pci_dev *pdev,
 
rtnl_lock();
 
-   tp-pcierr_recovery = true;
+   /* We needn't recover from permanent error */
+   if (state == pci_channel_io_frozen)
+   tp-pcierr_recovery = true;
 
/* We probably don't have netdev yet */
if (!netdev || !netif_running(netdev))
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CCM/GCM implementation defect

2015-04-23 Thread Herbert Xu
On Thu, Apr 23, 2015 at 03:24:59PM +0200, Martin Willi wrote:

 Do you have any pointer for me where this is defined? Why is it needed,
 given that GCM implicitly authenticates the IV by using it in Y0?

Actually you're quite right.  Both GCM and CCM implicitly includes
the IV in the authentication tag.

In fact after reviewing the two relevant RFCs (4106/4309) it seems
that we are correct after all since they explicitly exclude the IV
from the AAD.

Now we just need to figre out whether we're still OK with RFC4543.

Sorry for the false alarm.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] ethernet: myri10ge: use arch_phys_wc_add()

2015-04-23 Thread Hyong-Youb Kim
On Thu, Apr 23, 2015 at 02:28:40PM -0400, David Miller wrote:
 I can't wait forever for the driver maintainers to review this, so
 I'm applying it.

FWIW, I replied to Luis's patch yesterday.  It appears on the netdev
list and patchwork.  Checked the recipient list, and it includes
netdev, but not your address.  I guess I will not make that mistake
again.

https://patchwork.ozlabs.org/patch/463481/
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >