Re: rhashtable: how to deal with that rhashtable_lookup_insert_key return -EBUSY

2015-11-20 Thread Phil Sutter
On Fri, Nov 20, 2015 at 01:14:18PM +0800, Xin Long wrote:
> when I use rhashtable_lookup_insert_key, sometimes it will return -EBUSY.
> I'm not sure if there is a good way to work around it,
> or should I just try again and again until the insert succeeds?
> 
> I have seen some uses in the kernel by now, but it seems that no one
> considers this issue for their cases. It does occur in my case, though.
> 
> Did I use it incorrectly, or is it something else?

AFAIK, insert returning -EBUSY is a situation users have to be aware of
and retry the insert. I sent a patch[1] to fix this in test_rhashtable.

That patch though retried in case of -ENOMEM as well, which was
considered wrong to do and therefore it wasn't accepted. But in my test
runs, -ENOMEM happened quite frequently and it also wasn't a permanent
error. For details, see the following discussion[2].

Herbert, did you manage to reproduce the problem meanwhile? If so, was
there any progress on fixing rhashtable? Otherwise, I could respin my
patch from [1] to cover only -EBUSY case by default and add a parameter
to make non-permanent -ENOMEM visible.
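
A minimal userspace sketch of the "retry only on -EBUSY" idea discussed above. It is not the patch from [1]: fake_insert() and struct fake_table are hypothetical stand-ins for rhashtable_lookup_insert_key() and the table, and the retry bound is an assumption added so that the loop cannot spin forever.

```c
#include <errno.h>

/* Hypothetical stand-in for the hash table: reports -EBUSY for the
 * first busy_left insert attempts, then succeeds. */
struct fake_table {
	int busy_left;	/* transient failures still to report */
	int inserted;	/* number of successful inserts */
};

/* Hypothetical stand-in for rhashtable_lookup_insert_key(). */
static int fake_insert(struct fake_table *t, int key)
{
	(void)key;
	if (t->busy_left > 0) {
		t->busy_left--;
		return -EBUSY;
	}
	t->inserted++;
	return 0;
}

/* Retry only on -EBUSY, at most max_retries extra times. Any other
 * error (for example -ENOMEM) is returned to the caller unchanged. */
static int insert_retry_busy(struct fake_table *t, int key, int max_retries)
{
	int err;

	do {
		err = fake_insert(t, key);
	} while (err == -EBUSY && max_retries-- > 0);

	return err;
}
```

Whether a bounded or unbounded retry is appropriate depends on the caller; the bound here only keeps the sketch well-behaved.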

Cheers, Phil

[1]: https://lkml.org/lkml/2015/8/28/197
[2]: https://lkml.org/lkml/2015/8/28/281
> 
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 net-next 1/5] net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem

2015-11-20 Thread Salil Mehta


On 11/18/2015 6:52 PM, David Miller wrote:
> From: Salil
> Date: Wed, 18 Nov 2015 02:52:23 +0800
>
>> @@ -387,19 +409,23 @@ static void hns_rcb_ring_get_cfg(struct hnae_queue *q, int ring_type)
>> struct rcb_common_cb *rcb_common;
>> struct ring_pair_cb *ring_pair_cb;
>> u32 buf_size;
>> -   u16 desc_num;
>> -   int irq_idx;
>> +   u16 desc_num, mdnum_ppkt;
>> +   int irq_idx, is_ver1;
>
> Please use "bool" and true/false for boolean conditions like is_ver1.
>
> Please audit your entire submission for this problem.

Thanks for your time and comments. As per your suggestions, I have
changed the data type of variable "is_ver1" to "bool" wherever possible
in the PATCH V3 posted yesterday.


Best Regards
Salil




Kernel 4.1.12 crash

2015-11-20 Thread Andrew

Hi all.

Today some BRASes running the 4.1.12 kernel crashed.

Here are the crash traces:
http://pastebin.com/p68hNS8R
http://pastebin.com/36ieRAM2
http://pastebin.com/3BRTVEB6


The same hardware works OK on the 3.2 kernel; the trouble started after the
kernel upgrade.


What additional info is needed?


Re: [PATCH V3 net-next 4/5] net:hns: Add support of ethtool TSO set option for Hip06 in HNS

2015-11-20 Thread Sergei Shtylyov

On 11/19/2015 11:58 PM, Salil Mehta wrote:

> From: Salil
>
> This patch adds the support of ethtool TSO option to V1 patch,
> meant to add support of Hip06 SoC to HNS
>
> Signed-off-by: Salil Mehta
> Signed-off-by: lisheng
> ---
>   drivers/net/ethernet/hisilicon/hns/hns_enet.c |   47 +
>   1 file changed, 47 insertions(+)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> index 055e14c..a0763ab 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> @@ -1386,6 +1386,51 @@ static int hns_nic_change_mtu(struct net_device *ndev, int new_mtu)
> return ret;
>   }
>
> +static int hns_nic_set_features(struct net_device *netdev,
> +   netdev_features_t features)
> +{
> +   struct hns_nic_priv *priv = netdev_priv(netdev);
> +   struct hnae_handle *h = priv->ae_handle;
> +
> +   switch (priv->enet_ver) {
> +   case AE_VERSION_1:
> +   if ((features & NETIF_F_TSO) || (features & NETIF_F_TSO6))

	if (features & (NETIF_F_TSO | NETIF_F_TSO6))

> +   netdev_info(netdev, "enet v1 do not support tso!\n");
> +   break;

   The *break* should have the same indentation level as *if*.

> +   default:
> +   if ((features & NETIF_F_TSO) || (features & NETIF_F_TSO6)) {

	if (features & (NETIF_F_TSO | NETIF_F_TSO6)) {

> +   priv->ops.fill_desc = fill_tso_desc;
> +   priv->ops.maybe_stop_tx = hns_nic_maybe_stop_tso;
> +   /* The chip only support 7*4096 */
> +   netif_set_gso_max_size(netdev, 7 * 4096);
> +   h->dev->ops->set_tso_stats(h, 1);
> +   } else {
> +   priv->ops.fill_desc = fill_v2_desc;
> +   priv->ops.maybe_stop_tx = hns_nic_maybe_stop_tx;
> +   h->dev->ops->set_tso_stats(h, 0);
> +   }
> +   break;

   Same here.

> +   }
> +   netdev->features = features;
> +   return 0;
> +}
> +
> +static netdev_features_t hns_nic_fix_features(
> +   struct net_device *netdev, netdev_features_t features)
> +{
> +   struct hns_nic_priv *priv = netdev_priv(netdev);
> +
> +   switch (priv->enet_ver) {
> +   case AE_VERSION_1:
> +   features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
> +   NETIF_F_HW_VLAN_CTAG_FILTER);
> +   break;
> +   default:
> +   break;
> +   }

   Here it's indented correctly.

> +   return features;
> +}
> +
>   /**
>    * nic_set_multicast_list - set mutl mac address
>    * @netdev: net device

[...]

MBR, Sergei
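
The mask simplification suggested in the review can be checked with a small standalone sketch. The EX_F_* bit values below are illustrative placeholders, not the kernel's NETIF_F_* definitions, and both helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the kernel defines the real NETIF_F_* flags. */
#define EX_F_TSO	(UINT64_C(1) << 0)
#define EX_F_TSO6	(UINT64_C(1) << 1)
#define EX_F_OTHER	(UINT64_C(1) << 2)

/* Form used in the patch: test each flag separately. */
static bool wants_tso_long(uint64_t features)
{
	return (features & EX_F_TSO) || (features & EX_F_TSO6);
}

/* Form suggested in the review: test the combined mask once. */
static bool wants_tso_short(uint64_t features)
{
	return (features & (EX_F_TSO | EX_F_TSO6)) != 0;
}
```

Both forms are true exactly when at least one of the two bits is set, so the shorter one is a drop-in replacement.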



tty,net: use-after-free in x25_asy_open_tty

2015-11-20 Thread Sasha Levin
Hi all,

While fuzzing with syzkaller inside a kvmtools guest running latest -next 
kernel, I've hit:

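[  634.336761] ==================================================================
[  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr 8800a743efd0
[  634.339558] Read of size 4 by task syzkaller_execu/8981
[  634.340359] =============================================================================
[  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
[  634.342605] -----------------------------------------------------------------------------
[  634.342605]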
[  634.344196] Disabling lock debugging due to kernel taint
[  634.345046] INFO: Allocated in r3964_open+0x55/0x590 age=3 cpu=0 pid=8981
[  634.346165]  ___slab_alloc+0x434/0x5b0
[  634.346912]  __slab_alloc.isra.37+0x79/0xd0
[  634.347642]  kmem_cache_alloc_trace+0xf5/0x350
[  634.348398]  r3964_open+0x55/0x590
[  634.348952]  tty_ldisc_open.isra.2+0x8a/0xd0
[  634.349616]  tty_set_ldisc+0x344/0x910
[  634.350202]  tty_ioctl+0x1534/0x1d70
[  634.350762]  do_vfs_ioctl+0xc90/0xd40
[  634.351349]  SyS_ioctl+0x6d/0xb0
[  634.351890]  entry_SYSCALL_64_fastpath+0x35/0x9e
[  634.352548] INFO: Freed in r3964_close+0x23b/0x280 age=10 cpu=0 pid=8981
[  634.353599]  __slab_free+0x64/0x260
[  634.354151]  kfree+0x281/0x2f0
[  634.354641]  r3964_close+0x23b/0x280
[  634.355219]  tty_ldisc_close.isra.1+0xc2/0xd0
[  634.355890]  tty_set_ldisc+0x2bd/0x910
[  634.356559]  tty_ioctl+0x1534/0x1d70
[  634.357121]  do_vfs_ioctl+0xc90/0xd40
[  634.357614]  SyS_ioctl+0x6d/0xb0
[  634.358133]  entry_SYSCALL_64_fastpath+0x35/0x9e
[  634.358853] INFO: Slab 0xea00029d0f00 objects=20 used=10 fp=0x8800a743efd0 flags=0x1f80004080
[  634.360308] INFO: Object 0x8800a743efd0 @offset=12240 fp=0x8800a743f300
[  634.360308]
[  634.361652] Bytes b4 8800a743efc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.363048] Object 8800a743efd0: 00 f3 43 a7 00 88 ff ff ff ff ff ff 00 00 00 00  ..C.
[  634.364424] Object 8800a743efe0: ff ff ff ff ff ff ff ff a0 7d 41 ab ff ff ff ff  .}A.
[  634.365835] Object 8800a743eff0: a0 cf a8 a9 ff ff ff ff 00 00 00 00 00 00 00 00
[  634.367346] Object 8800a743f000: 00 e8 33 a4 ff ff ff ff 03 00 00 00 00 00 00 00  ..3.
[  634.368721] Object 8800a743f010: 3e a2 5b 9c ff ff ff ff 80 c9 d6 b4 00 88 ff ff  >.[.
[  634.370139] Object 8800a743f020: 00 79 7a 6b 61 6c 6c 65 00 80 50 a7 00 88 ff ff  .yzkalle..P.
[  634.371635] Object 8800a743f030: 20 e7 50 a7 00 88 ff ff 00 00 00 00 00 00 00 00   .P.
[  634.373000] Object 8800a743f040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.374418] Object 8800a743f050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.375843] Object 8800a743f060: 00 00 00 00 00 00 00 00 01 00 00 00 67 6d c1 1b  gm..
[  634.377339] Object 8800a743f070: 00 00 00 00 ad 4e ad de ff ff ff ff ad 4e ad de  .N...N..
[  634.378747] Object 8800a743f080: ff ff ff ff ff ff ff ff a0 48 2c a9 ff ff ff ff  .H,.
[  634.380174] Object 8800a743f090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.381584] Object 8800a743f0a0: c0 21 cd a3 ff ff ff ff 03 00 00 00 00 00 00 00  .!..
[  634.382949] Object 8800a743f0b0: 00 00 00 00 01 00 00 00 b8 f0 43 a7 00 88 ff ff  ..C.
[  634.384365] Object 8800a743f0c0: b8 f0 43 a7 00 88 ff ff 00 00 00 00 00 00 00 00  ..C.
[  634.385637] Object 8800a743f0d0: 68 f0 43 a7 00 88 ff ff 60 7d 41 ab ff ff ff ff  h.C.`}A.
[  634.387138] Object 8800a743f0e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.388563] Object 8800a743f0f0: 40 e8 33 a4 ff ff ff ff 01 00 00 00 00 00 00 00  @.3.
[  634.389977] Object 8800a743f100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.391396] Object 8800a743f110: 00 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00
[  634.392868] Object 8800a743f120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.393649] Object 8800a743f130: c0 73 5b 9c ff ff ff ff d0 ef 43 a7 00 88 ff ff  .s[...C.
[  634.394483] Object 8800a743f140: 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00
[  634.395281] Object 8800a743f150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  634.396081] Object 8800a743f160: 00 00 00 00 00 00 00 00 20 7d 41 ab ff ff ff ff   }A.
[  634.396928] Object 8800a743f170: b0 cd a8 a9 ff ff ff ff 00 00 00 00 00 00 00 00
[  634.397714] Object 8800a743f180: 80 e8 33 a4 ff ff ff ff 00 00 00 00 00 00 00 00  ..3.
[  634.398511] Object 8800a743f190: 

Re: [PATCH net-next] bpf: add show_fdinfo handler for maps

2015-11-20 Thread Hannes Frederic Sowa
Hi Alexei,

On Fri, Nov 20, 2015, at 04:30, Alexei Starovoitov wrote:
> On Thu, Nov 19, 2015 at 09:12:30PM +0100, Hannes Frederic Sowa wrote:
> > On Thu, Nov 19, 2015, at 19:32, Alexei Starovoitov wrote:
> > > On Thu, Nov 19, 2015 at 07:19:24PM +0100, Hannes Frederic Sowa wrote:
> > > > On Thu, Nov 19, 2015, at 11:56, Daniel Borkmann wrote:
> > > > > Add a handler for show_fdinfo() to be used by the anon-inodes
> > > > > backend for eBPF maps, and dump the map specification there. Not
> > > > > only useful for admins, but also it provides a minimal way to
> > > > > compare specs from ELF vs pinned object.
> > > > > 
> > > > > Signed-off-by: Daniel Borkmann 
> > > > 
> > > > Acked-by: Hannes Frederic Sowa 
> > > > 
> > > > Does it make sense to include bpf_htab->count in case of a hashmap?
> > > 
> > > no. user space should not rely on such things. It can only be misused.
> > 
> > Sorry, I don't get it. How can it be misused? As an admin it would
> > certainly be interesting to know the pressure on the map? Do you expect
> > kmsg messages from the eBPF program?
> 
> If user space can see both 'count' and 'max_entries', it can be very
> tempting to start assuming 'full' and 'empty' state of the map which will
> lead to race conditions and bad design.
> bpf programs and maps are inherently multi-thread and concurrent.
> If userapp wants to do the counting of elements it needs to do so on its
> own and shoot itself in the foot eventually.
> For the same reason I don't want to see BPF_MAP_GET_COUNT command.

Hmmm... I don't understand your argument. This is the same with memory
management in general and we still report memory statistics to user
space. I would really find it helpful to get a feeling for whether a map
is nearly full or nearly empty.

We can also count collisions or the load in the buckets, but some
evidence of what is going on would be nice, wouldn't it?


Re: [PATCH] wireless: change cfg80211 regulatory domain info as debug messages

2015-11-20 Thread Johannes Berg
On Sun, 2015-11-15 at 19:25 +0100, Stefan Lippers-Hollmann wrote:
> Hi
> 
> On 2015-11-15, Dave Young wrote:
> > cfg80211 module prints a lot of messages like below. Actually
> > printing once is acceptable but sometimes it will print again and
> > again, it looks very annoying. It is better to change these detail
> > messages to debugging only.
> 
> It is a lot of info, easily repeated 3 times on boot, but it's also
> the only real chance to determine why you ended up with the
> regulatory domain settings you got, rather than just the values
> itself. Given that a lot (most?) of officially shipping wireless
> devices are misconfigured (wrong EEPROM regdom settings for the
> region they're sold in) and considering that the limits can even
> change at runtime (IEEE 802.11d), it is imho quite important not just
> to be able to see what the current restrictions (iw reg get) are, but
> also why the kernel settled on those.
> 

Hm. I kinda sympathize with both points of view here, not sure what to
do.

Maybe we could skip this for the world regdomain only? It doesn't
really change, and we typically don't care that much for it? That'd
probably get rid of most of the lines already.

Alternatively, perhaps the internal computations should be more
transparently visible through some other mechanism?

johannes


pull-request: wireless-drivers 2015-11-20

2015-11-20 Thread Kalle Valo
Hi Dave,

here are the first wireless-drivers fixes for 4.4. There are a few patches
adding new device support and a new firmware, but I think they are
justified at this early stage of the release cycle. Otherwise there should
not be anything special; all patches are really small. Please let me
know if you have any problems.

Kalle

The following changes since commit f1a454a37618b819f2528ccd234f77a02b3a6016:

  ipg: Remove ipg driver (2015-11-16 17:11:31 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2015-11-20

for you to fetch changes up to eeec5d0ef7ee54a75e09e861c3cc44177b8752c7:

  rtlwifi: rtl8821ae: Fix lockups on boot (2015-11-17 15:58:53 +0200)


iwlwifi

* bump API to firmware 19 - not released yet.
* fix D3 flows (Luca)
* new device IDs (Oren)
* fix NULL pointer dereference (Avri)

ath10k

* fix invalid NSS for 4x4 devices
* add QCA9377 hw1.0 support
* fix QCA6174 regression with CE5 usage

wil6210

* new maintainer - Maya Erez

rtlwifi

* rtl8821ae: Fix lockups on boot


Avri Altman (1):
  iwlwifi: mvm: Avoid dereferencing sta if it was already flushed

Bartosz Markowski (4):
  ath10k: fix the currently supported QCA9377 target version name
  ath10k: update missing hw_params of QCA9377 hw1.1
  ath10k: introduce dev_id to hw_params
  ath10k: add QCA9377 hw1.0 support

Emmanuel Grumbach (1):
  iwlwifi: bump firmware API to 19

Kalle Valo (2):
  Merge tag 'iwlwifi-for-kalle-2015-11-15' of https://git.kernel.org/.../iwlwifi/iwlwifi-fixes
  Merge ath-current from ath.git

Larry Finger (1):
  rtlwifi: rtl8821ae: Fix lockups on boot

Luca Coelho (1):
  iwlwifi: mvm: don't overwrite the key indices in D3 entry

Oren Givon (1):
  iwlwifi: Add new PCI IDs for the 8260 series

Rajkumar Manoharan (2):
  ath10k: fix invalid NSS for 4x4 devices
  ath10k: poll HTT send completion when CE 5 is unused

Ryan Hsu (1):
  ath10k: override CE5 configuration for QCA6147 device

Vladimir Kondratiev (1):
  MAINTAINERS: wil6210: new maintainer - Maya Erez

 MAINTAINERS                                        |  2 +-
 drivers/net/wireless/ath/ath10k/core.c             | 49 ++-
 drivers/net/wireless/ath/ath10k/core.h             |  1 +
 drivers/net/wireless/ath/ath10k/hw.h               | 17 +++-
 drivers/net/wireless/ath/ath10k/mac.c              |  2 +-
 drivers/net/wireless/ath/ath10k/pci.c              | 53 +---
 drivers/net/wireless/iwlwifi/iwl-7000.c            |  2 +-
 drivers/net/wireless/iwlwifi/iwl-8000.c            |  2 +-
 drivers/net/wireless/iwlwifi/mvm/d3.c              |  8 +-
 drivers/net/wireless/iwlwifi/mvm/mac80211.c        | 11 ++-
 drivers/net/wireless/iwlwifi/mvm/sta.c             | 88 +++-
 drivers/net/wireless/iwlwifi/mvm/sta.h             |  4 +-
 drivers/net/wireless/iwlwifi/pcie/drv.c            | 19 -
 .../net/wireless/realtek/rtlwifi/rtl8821ae/hw.c    |  2 +-
 .../net/wireless/realtek/rtlwifi/rtl8821ae/sw.c    |  2 +-
 15 files changed, 189 insertions(+), 73 deletions(-)


Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:28PM -0500, Johannes Weiner wrote:
> There won't be any separate counters for socket memory consumed by
> protocols other than TCP in the future. Remove the indirection and

I really want to believe you're right. And with vmpressure propagation
implemented properly you are likely to be right.

However, we might still want to account other socket protos to
memcg->memory in the unified hierarchy, e.g. UDP, or SCTP, or whatever
else. Adding new consumers should be trivial, but it will break the
legacy usecase, where only TCP sockets are supposed to be accounted.
What about adding a check to sock_update_memcg() so that it would enable
accounting only for TCP sockets in case legacy hierarchy is used?

For the same reason, I think we'd better rename memcg->tcp_mem to
something like memcg->sk_mem or we can even drop the cg_proto struct
altogether embedding its fields directly to mem_cgroup struct.

Also, I don't see any reason to have tcp_memcontrol.c file. It's tiny
and with this patch it does not depend on tcp code any more. Let's move
it to memcontrol.c?

Other than that this patch looks OK to me.

Thanks,
Vladimir

> link sockets directly to their owning memory cgroup.
> 
> Signed-off-by: Johannes Weiner 


Re: [PATCH v2 23/27] rt2x00: move under ralink vendor directory

2015-11-20 Thread Kalle Valo
Jakub Kicinski  writes:

> On Wed, 18 Nov 2015 16:46:02 +0200, Kalle Valo wrote:
>> Part of reorganising wireless drivers directory and Kconfig.
>> 
>> Signed-off-by: Kalle Valo 
>
> For Ralink you could probably drop the rt2x00 directory.  RaLink Tech.
> doesn't exist any more and rt2x00 contains drivers for all of their
> devices.
>
> Obviously this is just a suggestion, not a show stopper.

As I said in reply to a similar comment about brcm80211, I would like to
do that separately. This is 27 patches already and I don't want to make
these any more complicated than necessary.

-- 
Kalle Valo


Re: [PATCH 07/14] net: tcp_memcontrol: simplify the per-memcg limit access

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:26PM -0500, Johannes Weiner wrote:
> tcp_memcontrol replicates the global sysctl_mem limit array per
> cgroup, but it only ever sets these entries to the value of the
> memory_allocated page_counter limit. Use the latter directly.
> 
> Signed-off-by: Johannes Weiner 

Reviewed-by: Vladimir Davydov 


Re: [B.A.T.M.A.N.] [PATCH 3/3] batman-adv: Less function calls in batadv_is_ap_isolated() after error detection

2015-11-20 Thread SF Markus Elfring
>> -out:
>> +batadv_tt_global_entry_free_ref(tt_global_entry);
>> +local_entry_free:
>> +batadv_tt_local_entry_free_ref(tt_local_entry);
>> +vlan_free:
>>  batadv_softif_vlan_free_ref(vlan);
>> -if (tt_global_entry)
>> -batadv_tt_global_entry_free_ref(tt_global_entry);
>> -if (tt_local_entry)
>> -batadv_tt_local_entry_free_ref(tt_local_entry);
>>  return ret;

> if you really want to make this codestyle change, I'd suggest you to go
> through the whole batman-adv code and apply the same change where needed.

Thanks for your interest in similar source code changes.

I would prefer general acceptance for this specific update suggestion
before I might invest further software development efforts for the
affected network module.


> It does not make sense to change the codestyle in one spot only.

I agree insofar as it would be nice if more places could be improved.


> On top of that, by going through the batman-adv code you might agree
> that the current style is actually not a bad idea.

I got the impression that the current Linux coding style convention
disagrees around the affected jump label selection to some degree,
doesn't it?
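
For readers unfamiliar with the two styles under discussion: the quoted diff replaces a single exit label guarded by NULL checks with one label per acquired reference. A minimal standalone sketch of the staged-label form follows. struct res, res_get() and res_put() are hypothetical helpers invented for this illustration, not batman-adv functions.

```c
#include <stddef.h>

/* Hypothetical reference-counted resource. */
struct res {
	int refs;
};

/* Take a reference, or return NULL when told to simulate a failure. */
static struct res *res_get(struct res *r, int fail)
{
	if (fail)
		return NULL;
	r->refs++;
	return r;
}

static void res_put(struct res *r)
{
	r->refs--;
}

/* Acquire a, b and c in order. On failure, jump to the label that
 * releases only what has been taken so far, so no NULL checks are
 * needed in the exit path. fail_at selects which acquisition fails
 * (0 means none). */
static int use_three(struct res *a, struct res *b, struct res *c, int fail_at)
{
	int ret = -1;

	if (!res_get(a, fail_at == 1))
		return ret;
	if (!res_get(b, fail_at == 2))
		goto put_a;
	if (!res_get(c, fail_at == 3))
		goto put_b;

	ret = 0;	/* all three references are held here */

	res_put(c);
put_b:
	res_put(b);
put_a:
	res_put(a);
	return ret;
}
```

The single-label alternative keeps one exit point and checks each pointer before releasing it; which form reads better is exactly the style question raised in this thread.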

Regards,
Markus


Re: [PATCH RFT v2] sh_eth: fix kernel oops in skb_put()

2015-11-20 Thread Yasushi SHOJI
Hi Sergei,

On Fri, 20 Nov 2015 02:53:39 +0900,
Sergei Shtylyov wrote:
> 
>Shoji-san, can I push this patch to net.git? I doubt that it has
> ill effects in itself -- the reason of the slowdown you're seeing
> should be somewhere else...

Sure.  I've tested and the null access problem is gone for sure.  I'm
pretty sure that the fix won't break anything.

It's going to take, however, some more time to pin down the slow down
problem.  I'll report when I find the cause.

Thanks,
-- 
yashi


[PATCH net 1/2] net: ipmr: fix static mfc/dev leaks on table destruction

2015-11-20 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

When destroying an mrt table the static mfc entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of refcnt taken) and leaked memory, for example:
unreferenced object 0x880034c144c0 (size 192):
  comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
  hex dump (first 32 bytes):
98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.S.4
ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  
  backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] kmem_cache_alloc+0x190/0x300
[] ip_mroute_setsockopt+0x5cb/0x910
[] do_ip_setsockopt.isra.11+0x105/0xff0
[] ip_setsockopt+0x30/0xa0
[] raw_setsockopt+0x33/0x90
[] sock_common_setsockopt+0x14/0x20
[] SyS_setsockopt+0x71/0xc0
[] entry_SYSCALL_64_fastpath+0x16/0x7a
[] 0x

Make sure that everything is cleaned on netns destruction.

Signed-off-by: Nikolay Aleksandrov 
---
This doesn't fix a specific commit as the behaviour seems to have been like
that since beginning of use of mroute_clean_tables to cleanup on netns
exit, but the fix can be sent back up to
acbb219d5f53 ("net: ipv4: ipmr_expire_timer causes crash when removing net
namespace")
which started cleaning up on netns destruction.

 net/ipv4/ipmr.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 92dd4b74d513..292123bc30fa 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -134,7 +134,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 			      struct mfc_cache *c, struct rtmsg *rtm);
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 				 int cmd);
-static void mroute_clean_tables(struct mr_table *mrt);
+static void mroute_clean_tables(struct mr_table *mrt, bool all);
 static void ipmr_expire_process(unsigned long arg);
 
 #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -350,7 +350,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
 static void ipmr_free_table(struct mr_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt);
+	mroute_clean_tables(mrt, true);
 	kfree(mrt);
 }
 
@@ -1208,7 +1208,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
  * Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr_table *mrt)
+static void mroute_clean_tables(struct mr_table *mrt, bool all)
 {
 	int i;
 	LIST_HEAD(list);
@@ -1217,8 +1217,9 @@ static void mroute_clean_tables(struct mr_table *mrt)
 	/* Shut down all active vif entries */
 
 	for (i = 0; i < mrt->maxvif; i++) {
-		if (!(mrt->vif_table[i].flags & VIFF_STATIC))
-			vif_delete(mrt, i, 0, &list);
+		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+			continue;
+		vif_delete(mrt, i, 0, &list);
 	}
 	unregister_netdevice_many(&list);
 
@@ -1226,7 +1227,7 @@ static void mroute_clean_tables(struct mr_table *mrt)
 
 	for (i = 0; i < MFC_LINES; i++) {
 		list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
-			if (c->mfc_flags & MFC_STATIC)
+			if (!all && (c->mfc_flags & MFC_STATIC))
 				continue;
 			list_del_rcu(&c->list);
 			mroute_netlink_event(mrt, c, RTM_DELROUTE);
@@ -1261,7 +1262,7 @@ static void mrtsock_destruct(struct sock *sk)
 						    NETCONFA_IFINDEX_ALL,
 						    net->ipv4.devconf_all);
 			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
-			mroute_clean_tables(mrt);
+			mroute_clean_tables(mrt, false);
 		}
 	}
 	rtnl_unlock();
-- 
2.4.3



[PATCH net 0/2] net: ipmr, ip6mr: fix static leaks on netns destruction

2015-11-20 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Hi,
While testing various ipmr scenarios I found that static mfc entries and
static devices get leaked on netns/table destruction because
mroute_clean_tables doesn't delete them. It is fine to leave the static
entries when cleaning up the mrtsock, but when destroying the table they
need to be removed.
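
The behaviour described above can be illustrated with a small standalone sketch: static entries are skipped on a normal cleanup and removed only when the caller asks for everything. struct ex_entry, EX_STATIC and ex_clean() are made-up names for this illustration and are not the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_STATIC 0x1	/* stands in for VIFF_STATIC / MFC_STATIC */

/* Hypothetical table entry. */
struct ex_entry {
	unsigned int flags;
	bool present;
};

/* Remove entries from the table. Static entries survive unless all is
 * true (the table-destruction case). Returns the number removed. */
static size_t ex_clean(struct ex_entry *tbl, size_t n, bool all)
{
	size_t i, removed = 0;

	for (i = 0; i < n; i++) {
		if (!tbl[i].present)
			continue;
		if (!all && (tbl[i].flags & EX_STATIC))
			continue;
		tbl[i].present = false;
		removed++;
	}
	return removed;
}
```

Calling ex_clean(tbl, n, false) corresponds to the mrtsock cleanup, which leaves static entries alone, while ex_clean(tbl, n, true) corresponds to destroying the table.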

Cheers,
 Nik

Nikolay Aleksandrov (2):
  net: ipmr: fix static mfc/dev leaks on table destruction
  net: ip6mr: fix static mfc/dev leaks on table destruction

 net/ipv4/ipmr.c  | 15 ---
 net/ipv6/ip6mr.c | 15 ---
 2 files changed, 16 insertions(+), 14 deletions(-)

-- 
2.4.3



[PATCH net 2/2] net: ip6mr: fix static mfc/dev leaks on table destruction

2015-11-20 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.

Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 multicast 
forwarding code")
CC: Benjamin Thery 
Signed-off-by: Nikolay Aleksandrov 
---
 net/ipv6/ip6mr.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index ad19136086dd..7a4a1b81dbb6 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -118,7 +118,7 @@ static void mr6_netlink_event(struct mr6_table *mrt, struct mfc6_cache *mfc,
 			      int cmd);
 static int ip6mr_rtm_dumproute(struct sk_buff *skb,
 			       struct netlink_callback *cb);
-static void mroute_clean_tables(struct mr6_table *mrt);
+static void mroute_clean_tables(struct mr6_table *mrt, bool all);
 static void ipmr_expire_process(unsigned long arg);
 
 #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES
@@ -334,7 +334,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id)
 static void ip6mr_free_table(struct mr6_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt);
+	mroute_clean_tables(mrt, true);
 	kfree(mrt);
 }
 
@@ -1542,7 +1542,7 @@ static int ip6mr_mfc_add(struct net *net, struct mr6_table *mrt,
  * Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr6_table *mrt)
+static void mroute_clean_tables(struct mr6_table *mrt, bool all)
 {
 	int i;
 	LIST_HEAD(list);
@@ -1552,8 +1552,9 @@ static void mroute_clean_tables(struct mr6_table *mrt)
 	 *  Shut down all active vif entries
 	 */
 	for (i = 0; i < mrt->maxvif; i++) {
-		if (!(mrt->vif6_table[i].flags & VIFF_STATIC))
-			mif6_delete(mrt, i, &list);
+		if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
+			continue;
+		mif6_delete(mrt, i, &list);
 	}
 	unregister_netdevice_many(&list);
 
@@ -1562,7 +1563,7 @@ static void mroute_clean_tables(struct mr6_table *mrt)
 	 */
 	for (i = 0; i < MFC6_LINES; i++) {
 		list_for_each_entry_safe(c, next, &mrt->mfc6_cache_array[i], list) {
-			if (c->mfc_flags & MFC_STATIC)
+			if (!all && (c->mfc_flags & MFC_STATIC))
 				continue;
 			write_lock_bh(&mrt_lock);
 			list_del(&c->list);
@@ -1625,7 +1626,7 @@ int ip6mr_sk_done(struct sock *sk)
 						     net->ipv6.devconf_all);
 			write_unlock_bh(&mrt_lock);
 
-			mroute_clean_tables(mrt);
+			mroute_clean_tables(mrt, false);
 			err = 0;
 			break;
 		}
-- 
2.4.3



Re: rhashtable: how to deal with that rhashtable_lookup_insert_key return -EBUSY

2015-11-20 Thread Herbert Xu
On Fri, Nov 20, 2015 at 01:24:01PM +0100, Phil Sutter wrote:
>
> Herbert, did you manage to reproduce the problem meanwhile? If so, was
> there any progress on fixing rhashtable? Otherwise, I could respin my
> patch from [1] to cover only -EBUSY case by default and add a parameter
> to make non-permanent -ENOMEM visible.

No, I have not been able to reproduce this yet.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 12/14] mm: memcontrol: move socket code for unified hierarchy accounting

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:31PM -0500, Johannes Weiner wrote:
> The unified hierarchy memory controller will account socket
> memory. Move the infrastructure functions accordingly.
> 
> Signed-off-by: Johannes Weiner 
> Acked-by: Michal Hocko 

Reviewed-by: Vladimir Davydov 


Re: [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:25PM -0500, Johannes Weiner wrote:
> The number of allocated sockets is used for calculations in the soft
> limit phase, where packets are accepted but the socket is under memory
> pressure. Since there is no soft limit phase in tcp_memcontrol, and
> memory pressure is only entered when packets are already dropped, this
> is actually dead code. Remove it.

Actually, we can get into the soft limit phase due to the global limit
(tcp_memory_pressure is set), but then using per-memcg sockets_allocated
counter is just wrong.

> 
> As this is the last user of parent_cg_proto(), remove that too.
> 
> Signed-off-by: Johannes Weiner 

Reviewed-by: Vladimir Davydov 




Re: [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:27PM -0500, Johannes Weiner wrote:
> There won't be a tcp control soft limit, so integrating the memcg code
> into the global skmem limiting scheme complicates things
> unnecessarily. Replace this with simple and clear charge and uncharge
> calls--hidden behind a jump label--to account skb memory.
> 
> Note that this is not purely aesthetic: as a result of shoehorning the
> per-memcg code into the same memory accounting functions that handle
> the global level, the old code would compare the per-memcg consumption
> against the smaller of the per-memcg limit and the global limit. This
> allowed the total consumption of multiple sockets to exceed the global
> limit, as long as the individual sockets stayed within bounds. After
> this change, the code will always compare the per-memcg consumption to
> the per-memcg limit, and the global consumption to the global limit,
> and thus close this loophole.
> 
> Without a soft limit, the per-memcg memory pressure state in sockets
> is generally questionable. However, we did it until now, so we
> continue to enter it when the hard limit is hit, and packets are
> dropped, to let other sockets in the cgroup know that they shouldn't
> grow their transmit windows, either. However, keep it simple in the
> new callback model and leave memory pressure lazily when the next
> packet is accepted (as opposed to doing it synchronously when packets
> are processed). When packets are dropped, network performance will
> already be in the toilet, so that should be a reasonable trade-off.
> 
> As described above, consumption is now checked on the per-memcg level
> and the global level separately. Likewise, memory pressure states are
> maintained on both the per-memcg level and the global level, and a
> socket is considered under pressure when either level asserts as much.
> 
> Signed-off-by: Johannes Weiner 

It leaves the legacy functionality intact, while making the code look
much better.

Reviewed-by: Vladimir Davydov 


Re: [PATCH 6/7] sock, cgroup: add sock->sk_cgroup

2015-11-20 Thread Daniel Wagner
Hi Tejun,

On 11/19/2015 07:52 PM, Tejun Heo wrote:
> +/*
> + * There's a theoretical window where the following accessors race with
> + * updaters and return part of the previous pointer as the prioidx or
> + * classid.  Such races are short-lived and the result isn't critical.
> + */
>  static inline u16 sock_cgroup_prioidx(struct sock_cgroup_data *skcd)
>  {
> - return skcd->prioidx;
> + return (skcd->is_data & 1) ? skcd->prioidx : 1;
>  }
>  
>  static inline u32 sock_cgroup_classid(struct sock_cgroup_data *skcd)
>  {
> - return skcd->classid;
> + return (skcd->is_data & 1) ? skcd->classid : 0;
>  }


I'm still trying to understand what the code does, hence this stupid question:

Why does sock_cgroup_prioidx() return 1 when the field is not data, while
sock_cgroup_classid() returns 0?

thanks,
daniel
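
For readers following along, here is a minimal userspace sketch of the
low-bit tagging that the quoted accessors appear to rely on: one word holds
either an aligned pointer (bit 0 clear) or packed data (bit 0 set). The type
struct skcd_sketch, its helpers, and the bit positions are hypothetical and
chosen for illustration only; they are not the kernel's actual
sock_cgroup_data layout. The fallback values 1 and 0 simply mirror the quoted
patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct sock_cgroup_data: a single 64-bit word
 * that holds either an aligned pointer (bit 0 clear) or packed data
 * (bit 0 set). The bit layout below is an assumption for this sketch. */
struct skcd_sketch {
	uint64_t val;
};

/* Store a pointer; any object aligned to at least 2 bytes keeps bit 0 clear. */
static void skcd_set_ptr(struct skcd_sketch *s, const void *p)
{
	s->val = (uint64_t)(uintptr_t)p;
}

/* Store prioidx and classid, and mark the word as data by setting bit 0. */
static void skcd_set_data(struct skcd_sketch *s, uint16_t prioidx,
			  uint32_t classid)
{
	s->val = 1u | ((uint64_t)prioidx << 16) | ((uint64_t)classid << 32);
}

/* Return the stored prioidx, or the fallback 1 when a pointer is stored. */
static uint16_t skcd_prioidx(const struct skcd_sketch *s)
{
	return (s->val & 1) ? (uint16_t)(s->val >> 16) : 1;
}

/* Return the stored classid, or the fallback 0 when a pointer is stored. */
static uint32_t skcd_classid(const struct skcd_sketch *s)
{
	return (s->val & 1) ? (uint32_t)(s->val >> 32) : 0;
}
```

With a pointer stored, both accessors return their fallback instead of
misreading pointer bits as a prioidx or classid; after skcd_set_data() they
return the stored values.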


Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:32PM -0500, Johannes Weiner wrote:
...
> @@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
>   */
>  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
> + unsigned int batch = max(CHARGE_BATCH, nr_pages);
>   struct page_counter *counter;
> + bool force = false;
>  
> - if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> - nr_pages, &counter)) {
> - memcg->tcp_mem.memory_pressure = 0;
> +#ifdef CONFIG_MEMCG_KMEM
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> + nr_pages, &counter)) {
> + memcg->tcp_mem.memory_pressure = 0;
> + return true;
> + }
> + page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> + memcg->tcp_mem.memory_pressure = 1;
> + return false;
> + }
> +#endif
> + if (consume_stock(memcg, nr_pages))
>   return true;
> +retry:
> + if (page_counter_try_charge(&memcg->memory, batch, &counter))
> + goto done;
> +
> + if (batch > nr_pages) {
> + batch = nr_pages;
> + goto retry;
>   }
> - page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> - memcg->tcp_mem.memory_pressure = 1;
> - return false;
> +
> + page_counter_charge(&memcg->memory, batch);
> + force = true;
> +done:

> + css_get_many(&memcg->css, batch);

Is there any point in getting a css reference for each charged page? For kmem
it is absolutely necessary, because dangling slabs must block
destruction of memcg's kmem caches, which are destroyed on css_free. But
for sockets there's no such problem: memcg will be destroyed only after
all sockets are destroyed and therefore uncharged (since
sock_update_memcg pins css).

> + if (batch > nr_pages)
> + refill_stock(memcg, batch - nr_pages);
> +
> + schedule_work(&memcg->socket_work);

I think it's suboptimal to schedule the work even if we are below the
high threshold.

BTW why do we need this work at all? Why is reclaim_high called from
task_work not enough?

Thanks,
Vladimir

> +
> + return !force;
>  }
>  
>  /**
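
To make the control flow of the quoted hunk easier to follow, here is a
minimal userspace sketch of the charge order it describes: consume cached
stock first, then try a batched charge, fall back to the exact size, and only
then force the charge past the limit. Everything named *_sketch, the
try_charge() helper, and the batch size of 32 are assumptions for
illustration; css references and the scheduled work are deliberately left
out.

```c
#include <stdbool.h>

#define SKETCH_CHARGE_BATCH 32U	/* assumed batch size, for illustration only */

/* Hypothetical stand-in for a page counter with a hard limit. */
struct page_counter_sketch {
	unsigned long usage;
	unsigned long limit;
};

/* Charge nr pages only if the result stays within the limit. */
static bool try_charge(struct page_counter_sketch *c, unsigned long nr)
{
	if (c->usage + nr > c->limit)
		return false;
	c->usage += nr;
	return true;
}

/* Returns true if the charge fit within the limit, false if it was forced. */
static bool charge_skmem_sketch(struct page_counter_sketch *c,
				unsigned long *stock, unsigned int nr_pages)
{
	unsigned int batch = nr_pages > SKETCH_CHARGE_BATCH ?
			     nr_pages : SKETCH_CHARGE_BATCH;
	bool force = false;

	/* Fast path: pages left over from an earlier batched charge. */
	if (*stock >= nr_pages) {
		*stock -= nr_pages;
		return true;
	}
retry:
	if (try_charge(c, batch))
		goto done;
	/* The batch did not fit; retry with exactly what was asked for. */
	if (batch > nr_pages) {
		batch = nr_pages;
		goto retry;
	}
	/* Still over the limit: charge anyway and report the overrun. */
	c->usage += batch;
	force = true;
done:
	/* Keep the surplus of a batched charge for later requests. */
	if (batch > nr_pages)
		*stock += batch - nr_pages;
	return !force;
}
```

In this sketch a forced charge still updates the counter, so usage can exceed
the limit, which matches the quoted "return !force" convention.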


Re: [PATCH net-next] net: encx24j600: move rev announcement to probe function

2015-11-20 Thread David Miller
From: j...@ringle.org
Date: Wed, 18 Nov 2015 16:22:21 -0500

> From: Jon Ringle 
> 
> When encx24j600 is open and closed many times due to userspace polling the
> interface, the log gets noise with this log message.
> 
> Moving this to encx24j600_spi_probe function where it belongs.
> 
> Signed-off-by: Jon Ringle 

Applied, thanks.


Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:)

2015-11-20 Thread Rainer Weikusat
Jason Baron  writes:
> On 11/19/2015 06:52 PM, Rainer Weikusat wrote:
>
> [...]
>
>> @@ -1590,21 +1718,35 @@ restart:
>>  goto out_unlock;
>>  }
>>  
>> -if (unix_peer(other) != sk && unix_recvq_full(other)) {
>> -if (!timeo) {
>> +if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
>> +if (timeo) {
>> +timeo = unix_wait_for_peer(other, timeo);
>> +
>> +err = sock_intr_errno(timeo);
>> +if (signal_pending(current))
>> +goto out_free;
>> +
>> +goto restart;
>> +}
>> +
>> +if (unix_peer(sk) != other ||
>> +unix_dgram_peer_wake_me(sk, other)) {
>>  err = -EAGAIN;
>>  goto out_unlock;
>>  }
>
> Hi,
>
> So here we are calling unix_dgram_peer_wake_me() without the sk lock the
> first time through - right?

Yes. And this is obviously wrong. I spent most of the 'evening time'
(some people would call that 'night time') testing this and didn't
get to read through it again yet. Thank you for pointing this out. I'll
send an updated patch shortly.


Re: [PATCH net 1/2] tcp: disable Fast Open on timeouts after handshake

2015-11-20 Thread David Miller
From: Yuchung Cheng 
Date: Wed, 18 Nov 2015 18:17:30 -0800

> Some middle-boxes black-hole the data after the Fast Open handshake
> (https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf).
> The exact reason is unknown. The work-around is to disable Fast Open
> temporarily after multiple recurring timeouts with few or no data
> delivered in the established state.
> 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Eric Dumazet 
> Reported-by: Christoph Paasch 

Applied and queued up for -stable.

Just out of curiosity, why isn't a test for zero data sufficient?

Do these middle-boxes sometimes not black-hole all of the data?


Fw: [Bug 108191] New: tcp option TCP_USER_TIMEOUT working incorrect within tcp keepalive.

2015-11-20 Thread Stephen Hemminger


Begin forwarded message:

Date: Fri, 20 Nov 2015 11:03:58 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 108191] New: tcp option TCP_USER_TIMEOUT working incorrect within tcp keepalive.


https://bugzilla.kernel.org/show_bug.cgi?id=108191

            Bug ID: 108191
           Summary: tcp option TCP_USER_TIMEOUT working incorrect within
                    tcp keepalive.
           Product: Networking
           Version: 2.5
    Kernel Version: 4.3
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: shemmin...@linux-foundation.org
          Reporter: jj@163.com
        Regression: No

TCP_USER_TIMEOUT specifies how long a sent packet may remain
unacknowledged before the connection is dropped.

In the TCP retransmission case, retransmits_timed_out() checks this
timeout, and it works well.

In the keepalive case, however, the code below may have a bug:


	elapsed = keepalive_time_elapsed(tp);

	if (elapsed >= keepalive_time_when(tp)) {
		/* If the TCP_USER_TIMEOUT option is enabled, use that
		 * to determine when to timeout instead.
		 */
		if ((icsk->icsk_user_timeout != 0 &&
		    elapsed >= icsk->icsk_user_timeout &&
		    icsk->icsk_probes_out > 0) ||
		    (icsk->icsk_user_timeout == 0 &&
		    icsk->icsk_probes_out >= keepalive_probes(tp))) {
			tcp_send_active_reset(sk, GFP_ATOMIC);
			tcp_write_err(sk);
			goto out;
		}
.
The check
	elapsed >= icsk->icsk_user_timeout
should be
	elapsed - keepalive_time_when(tp) >= icsk->icsk_user_timeout

Here is the timeline:

idle   ...   keepalive1 ..   keepalive2 ...   keepalive_probes
<- katime_when ->  <- keepalive_intvl  ->
                   <-  TCP_USER_TIMEOUT ->  // user expected timeout
<--------------- elapsed --------------->
                   <-elapsed-katime_when->


/* test code */
int v;
v=1;setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &v, 4);
v=30;setsockopt(fd, SOL_TCP, TCP_KEEPIDLE, &v, 4);
v=5;setsockopt(fd, SOL_TCP, TCP_KEEPINTVL, &v, 4);
v=3;setsockopt(fd, SOL_TCP, TCP_KEEPCNT, &v, 4);
v=20*1000; setsockopt(fd, SOL_TCP, TCP_USER_TIMEOUT, &v, 4);
connect(fd, addr, sizeof(addr));
// when connect
// drop the recv data
// iptables -t filter -A INPUT --protocol tcp --dport  -j DROP
pause();

As a result, TCP starts keepalive after 30 seconds and then closes the
connection without retransmitting a single probe (because 30+5 > 20),
whereas we want it to wait 20 seconds.
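
The difference between the quoted check and the proposed one can be sketched
in plain userspace C. This is only a model under simplifying assumptions: all
times are in seconds, and struct ka_state and both helper names are
hypothetical, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the values the keepalive timer looks at.
 * All times are in seconds to keep the sketch simple. */
struct ka_state {
	unsigned int elapsed;          /* time since the peer was last heard */
	unsigned int keepalive_time;   /* TCP_KEEPIDLE */
	unsigned int user_timeout;     /* TCP_USER_TIMEOUT, 0 means unset */
	unsigned int probes_out;       /* unanswered keepalive probes */
	unsigned int keepalive_probes; /* TCP_KEEPCNT */
};

/* Models the check as quoted in the report. */
static bool ka_reset_current(const struct ka_state *s)
{
	if (s->elapsed < s->keepalive_time)
		return false;
	return (s->user_timeout != 0 &&
		s->elapsed >= s->user_timeout &&
		s->probes_out > 0) ||
	       (s->user_timeout == 0 &&
		s->probes_out >= s->keepalive_probes);
}

/* Models the reporter's proposal: start counting the user timeout only
 * once the idle period has passed. */
static bool ka_reset_proposed(const struct ka_state *s)
{
	if (s->elapsed < s->keepalive_time)
		return false;
	return (s->user_timeout != 0 &&
		s->elapsed - s->keepalive_time >= s->user_timeout &&
		s->probes_out > 0) ||
	       (s->user_timeout == 0 &&
		s->probes_out >= s->keepalive_probes);
}
```

With the settings from the test code (idle 30, interval 5, user timeout 20),
the state at 35 seconds with one probe outstanding triggers a reset under the
quoted check but not under the proposed one, which waits until 50 seconds.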

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: Add a SOCK_DESTROY operation to close sockets from userspace

2015-11-20 Thread David Ahern

On 11/19/15 6:55 PM, Lorenzo Colitti wrote:

> upstream alternatives. We might even be able to show up at netdev 1.1
> for some higher-bandwidth conversations.


This use case would make a great talk for netdev.

There are similar problems when netdevs are moved between namespaces
(and VRFs).



Re: [PATCH net] bnx2x: Fix vxlan removal

2015-11-20 Thread David Miller
From: Yuval Mintz 
Date: Thu, 19 Nov 2015 11:56:51 +0200

> Commit ac7eccd4d48fc "bnx2x: track vxlan port count" contains a bug -
> Instead of achieving the required goal, vxlan configuration would not
> be removed since we're decrementing the port instead of the counter.
> 
> CC: Jiri Benc 
> Signed-off-by: Yuval Mintz 

Applied, thank you.


Re: [PATCH net-next] bpf: add show_fdinfo handler for maps

2015-11-20 Thread David Miller
From: Daniel Borkmann 
Date: Thu, 19 Nov 2015 11:56:22 +0100

> Add a handler for show_fdinfo() to be used by the anon-inodes
> backend for eBPF maps, and dump the map specification there. Not
> only useful for admins, but also it provides a minimal way to
> compare specs from ELF vs pinned object.
> 
> Signed-off-by: Daniel Borkmann 

Applied, thanks.


Re: network stream fairness

2015-11-20 Thread Niklas Cassel
On 11/09/2015 05:07 PM, Eric Dumazet wrote:
> On Mon, 2015-11-09 at 16:53 +0100, Niklas Cassel wrote:
>> On 11/09/2015 04:50 PM, Eric Dumazet wrote:
>>> On Mon, 2015-11-09 at 16:41 +0100, Niklas Cassel wrote:
>>>> I have a ethernet driver for a 100 Mbps NIC.
>>>> The NIC has dedicated hardware for offloading.
>>>> The driver has implemented TSO, GSO and BQL.
>>>> Since the CPU on the SoC is rather weak, I'd rather
>>>> not increase the CPU load by turning off offloading.
>>>>
>>>> Since commit
>>>> 605ad7f184b6 ("tcp: refine TSO autosizing")
>>>>
>>>> the bandwidth is no longer fair between streams.
>>>> see output at the end of the mail, where I'm testing with 2 streams.
>>>>
>>>>
>>>> If I revert 605ad7f184b6 on 4.3, I get a stable 45 Mbps per stream.
>>>>
>>>> I can also use vanilla 4.3 and do:
>>>> echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
>>>> to also get a stable 45 Mbps per stream.
>>>>
>>>> My question is, am I supposed to set the BQL limit explicitly?
>>>> It is possible that I have missed something in my driver,
>>>> but my understanding is that the TCP stack sets and adjusts
>>>> the BQL limit automatically.
>>>>
>>>>
>>>> Perhaps the following info might help:
>>>>
>>>> After running iperf3 on vanilla 4.3:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 89908
>>>> limit_max 1879048192
>>>>
>>>> After running iperf3 on vanilla 4.3 + BQL explicitly set:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 3000
>>>> limit_max 3000
>>>>
>>>> After running iperf3 on 4.3 + 605ad7f184b6 reverted:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 8886
>>>> limit_max 1879048192
>>>>
>>>
>>> There is absolutely nothing ensuring fairness among multiple TCP flows.
>>>
>>> One TCP flow can very easily grab whole bandwidth for itself, there are
>>> numerous descriptions of this phenomena in various TCP studies. 
>>>
>>> This is why we have packet schedulers ;)
>>
>> Oh.. How stupid of me, I forgot to mention.. all of the measurements were
>> done with fq_codel.
> 
> Your numbers suggest a cwnd growth then, which might show a CC bug.
> 
> Please run the following when your iperf3 runs on regular 4.3 kernel
> 
> for i in `seq 1 10`
> do
> ss -temoi dst 192.168.0.141
> sleep 1
> done
> 
> 

I've been able to reproduce this on an ARMv7, single core, 100 Mbps NIC.
Kernel vanilla 4.3, driver has BQL implemented, but is unfortunately not
upstreamed.

ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

ip addr show dev eth0
2: eth0:  mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:40:8c:18:58:c8 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.136/24 brd 192.168.0.255 scope global eth0
   valid_lft forever preferred_lft forever

# before iperf3 run
tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn 
 Sent 21001 bytes 45 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic

# after iperf3 run
tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn 
 Sent 5618224754 bytes 3710914 pkt (dropped 0, overlimits 0 requeues 1) 
 backlog 0b 0p requeues 1 
  maxpacket 1514 drop_overlimit 0 new_flow_count 2 ecn_mark 0
  new_flows_len 0 old_flows_len 0

Note that it appears stable for 411 seconds before you can see the
congestion window growth. It appears that the amount of time you have
to wait before things go downhill varies a lot.
No switch was used between the server and client; they were connected directly.

For full iperf3 log and output from ss command, see attachment.

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[  4] 411.00-412.00 sec  5.09 MBytes  42.7 Mbits/sec    0   22.6 KBytes
[  6] 411.00-412.00 sec  5.14 MBytes  43.1 Mbits/sec    0   22.6 KBytes
[SUM] 411.00-412.00 sec  10.2 MBytes  85.8 Mbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 412.00-413.00 sec  5.12 MBytes  43.0 Mbits/sec    0   22.6 KBytes
[  6] 412.00-413.00 sec  5.13 MBytes  43.0 Mbits/sec    0   22.6 KBytes
[SUM] 412.00-413.00 sec  10.3 MBytes  86.0 Mbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 413.00-414.00 sec  5.17 MBytes  43.4 Mbits/sec    0   22.6 KBytes
[  6] 

Re: [PATCH net-next 0/2] ppp: Remove PPPOX_ZOMBIE socket state

2015-11-20 Thread David Miller
From: Guillaume Nault 
Date: Thu, 19 Nov 2015 12:52:30 +0100

> Several issues have been found lately wrt. the PPPOX_ZOMBIE socket
> state. This state is now only set upon reception of a PADT to stop
> further transmissions. However this is redundant with the PADT
> workqueue mechanism introduced by 287f3a943fef ("pppoe: Use workqueue
> to die properly when a PADT is received").
> 
> We can thus simplify pppox socket state handling by getting rid of
> PPPOX_ZOMBIE entirely.

Nice, applied to net-next, thanks!


Re: [PATCH] net: cpsw: Fix ethernet regression for dm814x

2015-11-20 Thread David Miller
From: Tony Lindgren 
Date: Wed, 18 Nov 2015 17:27:25 -0800

> Commit b6745f6e4e63 ("drivers: net: cpsw: davinci_emac: move reading mac
> id to common file") started using of_machine_is_compatible for detecting
> type but missed at dm8148 causing Ethernet to stop working.
> 
> Let's fix the issue by adding handling for dm814x.
> 
> Cc: Mugunthan V N 
> Signed-off-by: Tony Lindgren 

Applied, thank you.


Re: [PATCH net 2/2] tcp: fix Fast Open snmp over-counting bug

2015-11-20 Thread David Miller
From: Yuchung Cheng 
Date: Wed, 18 Nov 2015 18:17:31 -0800

> Fix incrementing TCPFastOpenActiveFailed snmp stats multiple times
> when the handshake experiences multiple SYN timeouts.
> 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH net] tcp: fix potential huge kmalloc() calls in TCP_REPAIR

2015-11-20 Thread David Miller
From: Eric Dumazet 
Date: Wed, 18 Nov 2015 21:03:33 -0800

> From: Eric Dumazet 
> 
> tcp_send_rcvq() is used for re-injecting data into tcp receive queue.
> 
> Problems :
> 
> - No check against size is performed, allowed user to fool kernel in
>   attempting very large memory allocations, eventually triggering
>   OOM when memory is fragmented.
> 
> - In case of fault during the copy we do not return correct errno.
> 
> Lets use alloc_skb_with_frags() to cook optimal skbs.
> 
> Fixes: 292e8d8c8538 ("tcp: Move rcvq sending to tcp_input.c")
> Fixes: c0e88ff0f256 ("tcp: Repair socket queues")
> Signed-off-by: Eric Dumazet 

Good catch, applied and queued up for -stable.

Thanks!


Re: [PATCH] net: tulip: turn compile-time warning into dev_warn()

2015-11-20 Thread David Miller
From: Arnd Bergmann 
Date: Thu, 19 Nov 2015 11:42:26 +0100

> The tulip driver causes annoying build-time warnings for allmodconfig
> builds for all recent architectures:
> 
> dec/tulip/winbond-840.c:910:2: warning: #warning Processor architecture undefined
> dec/tulip/tulip_core.c:101:2: warning: #warning Processor architecture undefined!
> 
> This is the last remaining warning for arm64, and I'd like to get rid of
> it. We don't really know the cache line size, architecturally it would
> be at least 16 bytes, but all implementations I found have 64 or 128
> bytes. Configuring tulip for 32-byte lines as we do on ARM32 seems to
> be the safe but slow default, and nobody who cares about performance these
> days would use a tulip chip anyway, so we can just use that.
> 
> To save the next person the job of trying to find out what this is for
> and picking a default for their architecture just to kill off the warning,
> I'm now removing the preprocessor #warning and turning it into a pr_warn
> or dev_warn that prints the equivalent information when the driver gets
> loaded.
> 
> Signed-off-by: Arnd Bergmann 

Seems reasonable, applied, thanks Arnd!


Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue (w/ Fixes:)

2015-11-20 Thread Jason Baron
On 11/19/2015 06:52 PM, Rainer Weikusat wrote:

[...]

> @@ -1590,21 +1718,35 @@ restart:
>   goto out_unlock;
>   }
>  
> - if (unix_peer(other) != sk && unix_recvq_full(other)) {
> - if (!timeo) {
> + if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
> + if (timeo) {
> + timeo = unix_wait_for_peer(other, timeo);
> +
> + err = sock_intr_errno(timeo);
> + if (signal_pending(current))
> + goto out_free;
> +
> + goto restart;
> + }
> +
> + if (unix_peer(sk) != other ||
> + unix_dgram_peer_wake_me(sk, other)) {
>   err = -EAGAIN;
>   goto out_unlock;
>   }

Hi,

So here we are calling unix_dgram_peer_wake_me() without the sk lock the
first time through - right? In that case, we can end up registering on the
queue of other for the callback, but we might have already connected to a
different remote. In that case, the wakeup will crash if 'sk' has been freed
in the meantime.

Thanks,

-Jason



Re: [PATCH v8] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-20 Thread David Miller
From: Mans Rullgard 
Date: Thu, 19 Nov 2015 13:02:59 +

> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
> It is an almost complete rewrite of a driver originally found in
> a Sigma Designs 2.6.22 tree.
> 
> Signed-off-by: Mans Rullgard 

Applied, thank you.


Fw: [Bug 108201] New: Can connect with Huawei E3131-s2 (Hi-Link) 3G modem only after reboot.

2015-11-20 Thread Stephen Hemminger
Appears to be a cdc_ether driver bug. See Bugzilla for more followup info

Begin forwarded message:

Date: Fri, 20 Nov 2015 11:11:26 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 108201] New: Can connect with Huawei E3131-s2 (Hi-Link) 3G modem only after reboot.


https://bugzilla.kernel.org/show_bug.cgi?id=108201

            Bug ID: 108201
           Summary: Can connect with Huawei E3131-s2 (Hi-Link) 3G modem
                    only after reboot.
           Product: Networking
           Version: 2.5
    Kernel Version: 4.1.12-1-default
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemmin...@linux-foundation.org
          Reporter: cameron...@poczta.fm
        Regression: No

When I start (boot) the system from shutdown state I can't connect to Internet
because there is no network connection in pop-up network plasma menu (close to
clock in right bottom of the screen) in new openSUSE Leap 42.1 with KDE 5.

When I shutdown the laptop and start again (boot again) it still doesn't
connect (no connection available), but if I restart (reboot) the system the
connection works (shows in plasma networking pop-up menu).

From what I can see in journalctl when I boot the laptop for the 1st time the
system recognizes this modem as memory stick and there are errors in
journalctl:

NetworkManager[878]:   (eth1): failed to find device 4 'eth1' with udev
NetworkManager[878]:   (eth1): new Ethernet device (carrier: OFF, driver:
'cdc_ether', ifindex: 4)
kernel: cdc_ether 3-1:1.0 eth1: register 'cdc_ether' at usb-:00:12.2-1, CDC
Ethernet Device, 58:2c:80:13:92:63
kernel: usbcore: registered new interface driver cdc_ether
NetworkManager[878]:   (eth1): device state change: unmanaged ->
unavailable (reason 'managed') [10 20 2]
kernel: cdc_ether 3-1:1.0 eth1: kevent 12 may have been dropped
NetworkManager[878]:   (eth1): link connected
NetworkManager[878]:   (eth1): device state change: unavailable ->
disconnected (reason 'none') [20 30 0]

...but when I reboot the laptop the system recognizes it as a modem straight away
and there are no "failed" nor "dropped" messages. I will attach 2 files with
journalctl from the 1st boot and the reboot.

I have always installed the Huawei E3131-s2 (Hi-Link) using the Linux driver
stored in the modem's internal memory and it always worked, but now that I
have reinstalled openSUSE with the newer version 42.1 there was a "failed"
error with runmbbservice, so I deactivated it. Anyway, the connection did not
work either.

Another thing - when I unplug the modem and plug it back in it fails again and
the connection vanishes (doesn't show up when I plug modem back) and I have to
reboot the system to be able to connect to the Internet.

It is annoying to always boot the system and then reboot in order to connect
to the Internet :( So, can anyone fix this please?

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [patch net-next 0/3] mlxsw: small driver update

2015-11-20 Thread David Miller
From: Jiri Pirko 
Date: Thu, 19 Nov 2015 12:27:37 +0100

> Couple of VLAN-related patches.

Series applied.

I'm really pleased with this driver and work you guys are doing
on it.

Thanks!


Re: [PATCH linux-firmware] bnx2x: Add FW 7.13.1.0.

2015-11-20 Thread Kyle McMartin
On Thu, Nov 19, 2015 at 06:41:26PM +0200, Yuval Mintz wrote:
> This adds new FW for bnx2x, which adds the following:
>  - Ability to change outer vlan ID for some multi-function modes.
>  - FW ability for Geneve RSS classification according to inner headers.
>  - Prevent VFs from sending MAC control frames.
> 
> Signed-off-by: Yuval Mintz 

applied, thanks.


[PATCH] gianfar: use of_property_read_bool()

2015-11-20 Thread Saurabh Sengar
Use of_property_read_bool() to test the boolean "bd-stash" property.

Signed-off-by: Saurabh Sengar 
---
 drivers/net/ethernet/freescale/gianfar.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 3e6b9b4..ebeea5e 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -738,7 +738,6 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
struct gfar_private *priv = NULL;
struct device_node *np = ofdev->dev.of_node;
struct device_node *child = NULL;
-   struct property *stash;
u32 stash_len = 0;
u32 stash_idx = 0;
unsigned int num_tx_qs, num_rx_qs;
@@ -854,9 +853,7 @@ static int gfar_of_init(struct platform_device *ofdev, struct net_device **pdev)
goto err_grp_init;
}
 
-   stash = of_find_property(np, "bd-stash", NULL);
-
-   if (stash) {
+   if (of_property_read_bool(np, "bd-stash")) {
priv->device_flags |= FSL_GIANFAR_DEV_HAS_BD_STASHING;
priv->bd_stash_en = 1;
}
-- 
1.9.1



[PATCH v2 4/4] rhashtable-test: allow to retry even if -ENOMEM was returned

2015-11-20 Thread Phil Sutter
This is rather a hack to expose the current issue with rhashtable: under
high pressure it sometimes returns -ENOMEM even though system memory is
not exhausted and a subsequent insert may succeed.

Signed-off-by: Phil Sutter 
---
 lib/test_rhashtable.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 6fa77b3..270bf72 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -52,6 +52,10 @@ static int tcount = 10;
 module_param(tcount, int, 0);
 MODULE_PARM_DESC(tcount, "Number of threads to spawn (default: 10)");
 
+static bool enomem_retry = false;
+module_param(enomem_retry, bool, 0);
+MODULE_PARM_DESC(enomem_retry, "Retry insert even if -ENOMEM was returned (default: off)");
+
 struct test_obj {
int value;
struct rhash_head   node;
@@ -79,14 +83,22 @@ static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0);
 static int insert_retry(struct rhashtable *ht, struct rhash_head *obj,
 const struct rhashtable_params params)
 {
-   int err, retries = -1;
+   int err, retries = -1, enomem_retries = 0;
 
do {
retries++;
cond_resched();
err = rhashtable_insert_fast(ht, obj, params);
+   if (err == -ENOMEM && enomem_retry) {
+   enomem_retries++;
+   err = -EBUSY;
+   }
} while (err == -EBUSY);
 
+   if (enomem_retries)
+   pr_info(" %u insertions retried after -ENOMEM\n",
+   enomem_retries);
+
return err ? : retries;
 }
 
-- 
2.1.2



Re: [PATCH v2 4/4] rhashtable-test: allow to retry even if -ENOMEM was returned

2015-11-20 Thread Phil Sutter
On Fri, Nov 20, 2015 at 06:17:20PM +0100, Phil Sutter wrote:
> This is rather a hack to expose the current issue of rhashtable
> sometimes returning -ENOMEM under high pressure even though system
> memory is not exhausted and a subsequent insert may succeed.

Please note that this problem does not show up every time the test is
run in its default configuration on my system. With an increased number
of threads, though, it becomes very visible. Load test_rhashtable like so:

modprobe test_rhashtable enomem_retry=1 tcount=20

and grep dmesg for 'insertions retried after -ENOMEM'. In my case:

# dmesg | grep -E '(insertions retried after -ENOMEM|Started)' | tail
[   34.642980]  1 insertions retried after -ENOMEM
[   34.642989]  1 insertions retried after -ENOMEM
[   34.642994]  1 insertions retried after -ENOMEM
[   34.648353]  28294 insertions retried after -ENOMEM
[   34.689687]  31262 insertions retried after -ENOMEM
[   34.714015]  16280 insertions retried after -ENOMEM
[   34.736019]  15327 insertions retried after -ENOMEM
[   34.755100]  39012 insertions retried after -ENOMEM
[   34.769116]  49369 insertions retried after -ENOMEM
[   35.387200] Started 20 threads, 0 failed

Cheers, Phil


[PATCH v2 3/4] rhashtable-test: calculate max_entries value by default

2015-11-20 Thread Phil Sutter
A maximum table size of 64k entries is insufficient for the multiple
threads test even in default configuration (10 threads * 50000 objects =
500000 objects in total). Since we know how many objects will be
inserted, calculate the max size unless overridden by parameter.

Note that specifying the exact number of objects upon table init won't
suffice as that value is being rounded down to the next power of two -
anticipate this by rounding up to the next power of two beforehand.

Signed-off-by: Phil Sutter 
---
 lib/test_rhashtable.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index cfc3440..6fa77b3 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -36,9 +36,9 @@ static int runs = 4;
 module_param(runs, int, 0);
 MODULE_PARM_DESC(runs, "Number of test runs per variant (default: 4)");
 
-static int max_size = 65536;
+static int max_size = 0;
 module_param(max_size, int, 0);
-MODULE_PARM_DESC(runs, "Maximum table size (default: 65536)");
+MODULE_PARM_DESC(runs, "Maximum table size (default: calculated)");
 
 static bool shrinking = false;
 module_param(shrinking, bool, 0);
@@ -321,7 +321,7 @@ static int __init test_rht_init(void)
entries = min(entries, MAX_ENTRIES);
 
test_rht_params.automatic_shrinking = shrinking;
-   test_rht_params.max_size = max_size;
+   test_rht_params.max_size = max_size ? : roundup_pow_of_two(entries);
test_rht_params.nelem_hint = size;
 
pr_info("Running rhashtable test nelem=%d, max_size=%d, shrinking=%d\n",
@@ -367,6 +367,8 @@ static int __init test_rht_init(void)
return -ENOMEM;
}
 
+   test_rht_params.max_size = max_size ? :
+  roundup_pow_of_two(tcount * entries);
err = rhashtable_init(&ht, &test_rht_params);
if (err < 0) {
pr_warn("Test failed: Unable to initialize hashtable: %d\n",
-- 
2.1.2
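
For readers following along, the sizing rule from the commit message can be sketched in userspace C. This is only an illustration: roundup_pow2_sketch() and pick_max_size() are hypothetical names standing in for the kernel's roundup_pow_of_two() and the `max_size ? : ...` expression in the patch, and the numbers in the note below assume tcount = 10 and entries = 50000.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's roundup_pow_of_two(): smallest power
 * of two that is >= n (n must be non-zero and small enough not to overflow). */
static size_t roundup_pow2_sketch(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Mirrors `max_size ? : roundup_pow_of_two(tcount * entries)` from the patch:
 * an explicit max_size wins, otherwise the limit is derived from the load. */
static size_t pick_max_size(size_t max_size, size_t tcount, size_t entries)
{
	return max_size ? max_size : roundup_pow2_sketch(tcount * entries);
}
```

With those assumed values, pick_max_size(0, 10, 50000) gives 524288, comfortably above the old fixed limit of 65536, while an explicit max_size parameter is passed through unchanged.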



[PATCH v2 2/4] rhashtable-test: retry insert operations

2015-11-20 Thread Phil Sutter
After adding cond_resched() calls to threadfunc(), a surprisingly high
rate of insert failures occurred, probably due to table resizes getting a
better chance to run in the background. To not soften up the remaining
tests, retry inserts until they either succeed or fail permanently.

Also change the non-threaded test to retry insert operations, too.

Suggested-by: Thomas Graf 
Signed-off-by: Phil Sutter 
---
 lib/test_rhashtable.c | 53 ---
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 63654e3..cfc3440 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -76,6 +76,20 @@ static struct rhashtable_params test_rht_params = {
 static struct semaphore prestart_sem;
 static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0);
 
+static int insert_retry(struct rhashtable *ht, struct rhash_head *obj,
+const struct rhashtable_params params)
+{
+   int err, retries = -1;
+
+   do {
+   retries++;
+   cond_resched();
+   err = rhashtable_insert_fast(ht, obj, params);
+   } while (err == -EBUSY);
+
+   return err ? : retries;
+}
+
 static int __init test_rht_lookup(struct rhashtable *ht)
 {
unsigned int i;
@@ -157,7 +171,7 @@ static s64 __init test_rhashtable(struct rhashtable *ht)
 {
struct test_obj *obj;
int err;
-   unsigned int i, insert_fails = 0;
+   unsigned int i, insert_retries = 0;
s64 start, end;
 
/*
@@ -170,22 +184,16 @@ static s64 __init test_rhashtable(struct rhashtable *ht)
struct test_obj *obj = &array[i];
 
obj->value = i * 2;
-
-   err = rhashtable_insert_fast(ht, &obj->node, test_rht_params);
-   if (err == -ENOMEM || err == -EBUSY) {
-   /* Mark failed inserts but continue */
-   obj->value = TEST_INSERT_FAIL;
-   insert_fails++;
-   } else if (err) {
+   err = insert_retry(ht, &obj->node, test_rht_params);
+   if (err > 0)
+   insert_retries += err;
+   else if (err)
return err;
-   }
-
-   cond_resched();
}
 
-   if (insert_fails)
-   pr_info("  %u insertions failed due to memory pressure\n",
-   insert_fails);
+   if (insert_retries)
+   pr_info("  %u insertions retried due to memory pressure\n",
+   insert_retries);
 
test_bucket_stats(ht);
rcu_read_lock();
@@ -244,7 +252,7 @@ static int thread_lookup_test(struct thread_data *tdata)
 
 static int threadfunc(void *data)
 {
-   int i, step, err = 0, insert_fails = 0;
+   int i, step, err = 0, insert_retries = 0;
struct thread_data *tdata = data;
 
up(&prestart_sem);
@@ -253,21 +261,18 @@ static int threadfunc(void *data)
 
for (i = 0; i < entries; i++) {
tdata->objs[i].value = (tdata->id << 16) | i;
-   cond_resched();
-   err = rhashtable_insert_fast(&ht, &tdata->objs[i].node,
-test_rht_params);
-   if (err == -ENOMEM || err == -EBUSY) {
-   tdata->objs[i].value = TEST_INSERT_FAIL;
-   insert_fails++;
+   err = insert_retry(&ht, &tdata->objs[i].node, test_rht_params);
+   if (err > 0) {
+   insert_retries += err;
} else if (err) {
pr_err("  thread[%d]: rhashtable_insert_fast failed\n",
   tdata->id);
goto out;
}
}
-   if (insert_fails)
-   pr_info("  thread[%d]: %d insert failures\n",
-   tdata->id, insert_fails);
+   if (insert_retries)
+   pr_info("  thread[%d]: %u insertions retried due to memory pressure\n",
+   tdata->id, insert_retries);
 
err = thread_lookup_test(tdata);
if (err) {
-- 
2.1.2
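
The return convention of insert_retry() (a negative errno on permanent failure, otherwise the number of retries) may be easier to see outside the kernel. The following is a minimal userspace sketch, not the test module's code: struct fake_table and fake_insert() are hypothetical stand-ins for the hash table and rhashtable_insert_fast(), and cond_resched() is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for a table that is busy resizing for a while. */
struct fake_table {
	int busy_left;	/* number of upcoming calls that report -EBUSY */
	int final_err;	/* result once busy_left is used up: 0 or e.g. -ENOMEM */
};

/* Plays the role of rhashtable_insert_fast() in this sketch. */
static int fake_insert(struct fake_table *t)
{
	if (t->busy_left > 0) {
		t->busy_left--;
		return -EBUSY;
	}
	return t->final_err;
}

/* Same shape as insert_retry() in the patch: loop while the insert reports
 * -EBUSY, then return the error if there is one, else the retry count. */
static int insert_retry_sketch(struct fake_table *t)
{
	int err, retries = -1;

	do {
		retries++;
		err = fake_insert(t);
	} while (err == -EBUSY);

	return err ? err : retries;
}
```

A caller can therefore tell the three cases apart from one integer: 0 means the first attempt succeeded, a positive value counts the retries that were needed, and a negative value is an error other than -EBUSY.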



[PATCH v2 0/4] improve fault-tolerance of rhashtable runtime-test

2015-11-20 Thread Phil Sutter
The following series aims to improve lib/test_rhashtable in different
situations:

Patch 1 allows the kernel to reschedule so the test does not block too
long on slow systems.
Patch 2 fixes behaviour under pressure, retrying inserts in non-permanent
error case (-EBUSY).
Patch 3 auto-adjusts the upper table size limit according to the number
of threads (in concurrency test). In fact, the current default is
already too small.
Patch 4 makes it possible to retry inserts even in supposedly permanent
error case (-ENOMEM) to expose rhashtable's remaining problem of
-ENOMEM being not as permanent as it is expected to be.

Changes since v1:
- Introduce insert_retry() which is then used in single-threaded test as
  well.
- Do not retry inserts by default if -ENOMEM was returned.
- Rename the retry counter to be a bit more verbose about what it
  contains.
- Add patch 4 as a debugging aid.

Phil Sutter (4):
  rhashtable-test: add cond_resched() to thread test
  rhashtable-test: retry insert operations
  rhashtable-test: calculate max_entries value by default
  rhashtable-test: allow to retry even if -ENOMEM was returned

 lib/test_rhashtable.c | 76 +--
 1 file changed, 50 insertions(+), 26 deletions(-)

-- 
2.1.2



[PATCH v2 1/4] rhashtable-test: add cond_resched() to thread test

2015-11-20 Thread Phil Sutter
This should fix soft lockup bugs triggered on slow systems.

Signed-off-by: Phil Sutter 
---
 lib/test_rhashtable.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index 8c1ad1c..63654e3 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -236,6 +236,8 @@ static int thread_lookup_test(struct thread_data *tdata)
   obj->value, key);
err++;
}
+
+   cond_resched();
}
return err;
 }
@@ -251,6 +253,7 @@ static int threadfunc(void *data)
 
for (i = 0; i < entries; i++) {
tdata->objs[i].value = (tdata->id << 16) | i;
+   cond_resched();
err = rhashtable_insert_fast(&ht, &tdata->objs[i].node,
 test_rht_params);
if (err == -ENOMEM || err == -EBUSY) {
@@ -285,6 +288,8 @@ static int threadfunc(void *data)
goto out;
}
tdata->objs[i].value = TEST_INSERT_FAIL;
+
+   cond_resched();
}
err = thread_lookup_test(tdata);
if (err) {
-- 
2.1.2



Re: [PATCH net 1/2] tcp: disable Fast Open on timeouts after handshake

2015-11-20 Thread Yuchung Cheng
On Fri, Nov 20, 2015 at 7:52 AM, David Miller  wrote:
> From: Yuchung Cheng 
> Date: Wed, 18 Nov 2015 18:17:30 -0800
>
>> Some middle-boxes black-hole the data after the Fast Open handshake
>> (https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf).
>> The exact reason is unknown. The work-around is to disable Fast Open
>> temporarily after multiple recurring timeouts with few or no data
>> delivered in the established state.
>>
>> Signed-off-by: Yuchung Cheng 
>> Signed-off-by: Eric Dumazet 
>> Reported-by: Christoph Paasch 
>
> Applied and queued up for -stable.
>
> Just out of curiosity, why isn't a test for zero data sufficient?
>
> Do these middle-boxes sometimes not black-hole all of the data?
Great question. I should have been clearer in the commit message. The
answer is yes, it should be sufficient. The tricky part is that
tp->bytes_acked includes data acked in the SYN. Since we don't remember
the size of the data sent in the SYN, we fall back to the heuristic.


Re: network stream fairness

2015-11-20 Thread Eric Dumazet
On Fri, 2015-11-20 at 16:33 +0100, Niklas Cassel wrote:

> I've been able to reproduce this on a ARMv7, single core, 100 Mbps NIC.
> Kernel vanilla 4.3, driver has BQL implemented, but is unfortunately not 
> upstreamed.
> 
> ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: off
> tx-checksumming: on
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: off
> 
> ip addr show dev eth0
> 2: eth0:  mtu 1500 qdisc fq_codel state UP 
> group default qlen 1000
> link/ether 00:40:8c:18:58:c8 brd ff:ff:ff:ff:ff:ff
> inet 192.168.0.136/24 brd 192.168.0.255 scope global eth0
>valid_lft forever preferred_lft forever
> 
> # before iperf3 run
> tc -s -d qdisc
> qdisc noqueue 0: dev lo root refcnt 2 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 
> target 5.0ms interval 100.0ms ecn 
>  Sent 21001 bytes 45 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> 
> sysctl net.ipv4.tcp_congestion_control
> net.ipv4.tcp_congestion_control = cubic
> 
> # after iperf3 run
> tc -s -d qdisc
> qdisc noqueue 0: dev lo root refcnt 2 
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 
> target 5.0ms interval 100.0ms ecn 
>  Sent 5618224754 bytes 3710914 pkt (dropped 0, overlimits 0 requeues 1) 
>  backlog 0b 0p requeues 1 
>   maxpacket 1514 drop_overlimit 0 new_flow_count 2 ecn_mark 0
>   new_flows_len 0 old_flows_len 0
> 
> Note that it appears stable for 411 seconds before you can see the
> congestion window growth. It appears that the amount of time you have
> to wait before things go downhill varies a lot.
> No switch was used between the server and client; they were connected 
> directly.

Hi Niklas

Your results seem to show there is no special issue ;)

With TSO off and GSO off, there is no way a 'TSO autosizing' patch would
have any effect, since this code path is not taken.

You have to wait 400 seconds before getting into a mode where one of the
flows gets a bigger cwnd (25 instead of 16), and then TCP cubic simply
shows typical unfairness ...

If you absolutely need to guarantee a given throughput per flow, you
might consider using fq packet scheduler and SO_MAX_PACING_RATE socket
option.
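
As a rough illustration of the socket-option half of that suggestion, a per-socket cap could be set as sketched below. The helper name set_max_pacing_rate() is made up for this example, the rate is given in bytes per second, and the fallback #define assumes the option number used on most Linux architectures.

```c
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47	/* assumed value; most Linux architectures */
#endif

/* Cap the pacing rate of one socket, in bytes per second.
 * Returns 0 on success or -1 with errno set, like setsockopt() itself. */
static int set_max_pacing_rate(int fd, unsigned int bytes_per_sec)
{
	return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
			  &bytes_per_sec, sizeof(bytes_per_sec));
}
```

For example, set_max_pacing_rate(fd, 1250000) would ask for roughly 10 Mbit/s on that flow. The fq qdisc has to be installed on the egress device for the cap to be enforced by the scheduler, e.g. with `tc qdisc replace dev eth0 root fq`.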

Thanks !




Re: [PATCH net-next 0/2] bnx2x: Statistics patch series

2015-11-20 Thread David Miller
From: Yuval Mintz 
Date: Thu, 19 Nov 2015 17:04:34 +0200

> This series contains 2 small statistics-related patches,
> first adding a new SW statistics and the other exposing port stats
> for multi-function devices.
> 
> Please consider applying this series to `net-next'.

Series applied, thanks Yuval.


Re: [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks

2015-11-20 Thread Johannes Weiner
On Fri, Nov 20, 2015 at 01:58:57PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:27PM -0500, Johannes Weiner wrote:
> > There won't be a tcp control soft limit, so integrating the memcg code
> > into the global skmem limiting scheme complicates things
> > unnecessarily. Replace this with simple and clear charge and uncharge
> > calls--hidden behind a jump label--to account skb memory.
> > 
> > Note that this is not purely aesthetic: as a result of shoehorning the
> > per-memcg code into the same memory accounting functions that handle
> > the global level, the old code would compare the per-memcg consumption
> > against the smaller of the per-memcg limit and the global limit. This
> > allowed the total consumption of multiple sockets to exceed the global
> > limit, as long as the individual sockets stayed within bounds. After
> > this change, the code will always compare the per-memcg consumption to
> > the per-memcg limit, and the global consumption to the global limit,
> > and thus close this loophole.
> > 
> > Without a soft limit, the per-memcg memory pressure state in sockets
> > is generally questionable. However, we did it until now, so we
> > continue to enter it when the hard limit is hit, and packets are
> > dropped, to let other sockets in the cgroup know that they shouldn't
> > grow their transmit windows, either. However, keep it simple in the
> > new callback model and leave memory pressure lazily when the next
> > packet is accepted (as opposed to doing it synchroneously when packets
> > are processed). When packets are dropped, network performance will
> > already be in the toilet, so that should be a reasonable trade-off.
> > 
> > As described above, consumption is now checked on the per-memcg level
> > and the global level separately. Likewise, memory pressure states are
> > maintained on both the per-memcg level and the global level, and a
> > socket is considered under pressure when either level asserts as much.
> > 
> > Signed-off-by: Johannes Weiner 
> 
> It leaves the legacy functionality intact, while making the code look
> much better.
> 
> Reviewed-by: Vladimir Davydov 

Thank you very much!


Re: [PATCHSET v2] netfilter, cgroup: implement xt_cgroup2 match

2015-11-20 Thread David Miller
From: Tejun Heo 
Date: Thu, 19 Nov 2015 13:52:44 -0500

> This is the second take of the xt_cgroup2 patchset.  Changes from the
> last take are
> 
> * Instead of adding sock->sk_cgroup separately, sock->sk_cgrp_data now
>   carries either (prioidx, classid) pair or cgroup2 pointer.  This
>   avoids inflating struct sock with yet another cgroup related field.
>   Unfortunately, this does add some complexity but that's the
>   trade-off and the complexity is contained in cgroup proper.
> 
> * Various small updats as per David and Jan's reviews.

I like this a lot better, thanks.

Please address Daniel's feedback on patch #6 and then I'm personally
fine with this series.

Pablo, are you ok with me merging this into net-next directly or
would you rather I take patches 1-6 into net-next and then you can
merge and then add patch #7 on top?

Thanks.


[PATCH V4 net-next 0/5] net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem

2015-11-20 Thread Salil Mehta
This PATCH V4 addresses the review comment provided by 
Sergei Shtylyov. The changelog of every patch has also
been modified.

PATCH V3:
 Addresses the review comment floated by David Miller 

PATCH V2:
1) Bug Fixes and Clean-up: Internally identified
2) Addresses internal review comments by Kenneth Lee and
   by Huang Daode
3) Addresses the review comment from "Yisen.Zhuang(Zhuangyuzeng)"
4) Adds fix from Fengguang Wu for an error generated from 
   "kbuild test robot" from Intel
5) Ethtool support for TSO set option from Lisheng

PATCH V1:
Adds initial support of Hip06 SoC with below changes:  
This patch-set adds support of new Hisilicon Hip06 SoC to the existing
(already part of net-next) HNS ethernet driver for Hip05 SoC. Hip06 is
a multi-core SoC and is a derivative of Hip05 SoC with lots of new
hardware features supported like RSS, TSO, hardware VLAN assist etc. 

The changes in the driver are mainly due to following:
 1) changes in the DMA descriptor provided by the Hip06 ethernet 
hardware. These changes need to co-exist with already present
Hip05 DMA descriptor and its operating functions. The decision
to choose the correct type of DMA descriptor is taken dynamically
depending upon the version of the hardware (i.e. V1/hip05 or
V2/hip06, see already existing hisilicon-hns-nic.txt binding file
for the detailed description version and naming).
 2) To support new features added to the Hip06 ethernet hardware:
a. RSS (Receive Side Scaling)
b. TSO (TCP Segment Offload)
c. Hardware VLAN support (currently we are initializing hardware
   to not assist in stripping the vlan tag at hardware level.
   Proper support of this feature and ethtool would come after
   these patches have been accepted)
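
The run-time choice between the two descriptor layouts described in item 1) can be pictured roughly as follows. The version constants and AE_IS_VER1() are taken from the hnae.h hunk in patch 1/5; struct desc_ops_sketch, its fields and select_desc_ops() are hypothetical and only illustrate the idea of picking per-version handling once, they are not the driver's actual structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Version constants and AE_IS_VER1() as they appear in the hnae.h hunk. */
#define AE_VERSION_1 ('6' << 16 | '6' << 8 | '0')
#define AE_VERSION_2 ('1' << 24 | '6' << 16 | '1' << 8 | '0')
#define AE_IS_VER1(ver) ((ver) == AE_VERSION_1)

/* Hypothetical per-version descriptor handling; not the driver's real ops. */
struct desc_ops_sketch {
	const char *name;
	bool supports_tso;	/* TSO is listed as a new Hip06 feature */
};

static const struct desc_ops_sketch v1_ops = { "hip05-v1", false };
static const struct desc_ops_sketch v2_ops = { "hip06-v2", true };

/* Pick the descriptor handling once, based on the hardware version. */
static const struct desc_ops_sketch *select_desc_ops(uint32_t ver)
{
	return AE_IS_VER1(ver) ? &v1_ops : &v2_ops;
}
```

Selecting a table of handlers up front keeps version checks out of the per-packet path; whether the driver does it this way or tests the version inline is not something this sketch claims.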

Kindly note that, this patchset has been based on latest net-next.

Salil Mehta (5):
  net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
  net:hns: Add Hip06 "RSS(Receive Side Scaling)" support to HNS Driver
  net:hns: Add Hip06 "TSO(TCP Segment Offload)" support HNS Driver
  net:hns: Add support of ethtool TSO set option for Hip06 in HNS
  net:hns: Add the init code to disable Hip06 "Hardware VLAN assist"

 drivers/net/ethernet/hisilicon/hns/hnae.h  |   56 ++-
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |   90 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |  213 +++--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |   25 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c |6 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |   79 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h  |   32 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   68 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h  |8 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |   88 +++-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  |  487 +---
 drivers/net/ethernet/hisilicon/hns/hns_enet.h  |   12 +
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |   95 +++-
 13 files changed, 1072 insertions(+), 187 deletions(-)

-- 
1.7.9.5



Re: [PATCH v3 net-next] ravb: use clock rate as basis for GTI.TIV

2015-11-20 Thread David Miller
From: Simon Horman 
Date: Fri, 20 Nov 2015 11:29:39 -0800

> The GTI.TIV may be set to 2GHz^2 / rate, where rate is
> that of the clock of the device. Rather than assuming a
> rate of 130MHz use the actual rate of the clock.
> 
> The motivation for this is to use the correct rate on
> the r8a7795/Salvator-X which is advertised as 133MHz but
> may differ depending on the extal present on the Salvator-X.
> 
> Signed-off-by: Simon Horman 

Applied, thanks Simon.


RE: [PATCH net-next 0/8] tipc: some cleanups and improvements

2015-11-20 Thread Jon Maloy


> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Friday, 20 November, 2015 14:07
> To: Jon Maloy
> Cc: netdev@vger.kernel.org; paul.gortma...@windriver.com;
> parthasarathy.xx.bhuvara...@ericsson.com; Richard Alpe; Ying Xue;
> ma...@donjonn.com; tipc-discuss...@lists.sourceforge.net
> Subject: Re: [PATCH net-next 0/8] tipc: some cleanups and improvements
> 
> From: Jon Maloy 
> Date: Thu, 19 Nov 2015 14:30:38 -0500
> 
> > This series mostly contains cleanups and cosmetic code changes.
> > The only real functional change is in #4 and #5, where we change the
> > locking structure for nodes and links in order to permit full
> > concurrency between links working in parallel on different interfaces.
> > Since the groundwork for this has been done in previous commit series,
> > this change constitutes only the final, small step to achieve that goal.
> 
> Series applied, thanks.
> 
> Generally speaking, rwlock usage really never buys you anything significant.
> Therefore in the long run I think a single spinlock plus RCU is going to be
> much better for per-node locking in TIPC.

Thank you for the feedback.  My own benchmarking has already confirmed
what you are stating.  I am currently looking at how to convert it to RCU.

///jon



Re: [PATCH net] ppp: fix pppoe_dev deletion condition in pppoe_release()

2015-11-20 Thread Christoph Schulz

Hello!

David Miller wrote on Fri, 23 Oct 2015 03:30:48 -0700 (PDT):


From: Guillaume Nault 
Date: Thu, 22 Oct 2015 16:57:10 +0200


We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.

Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Signed-off-by: Guillaume Nault 


Applied and queued up for -stable, thanks.


Somehow this commit (1acea4f6ce1b1c0941438aca75dd2e5c6b09db60) did not  
make it into Linux 4.2.6, 4.1.13, or 3.18.24. But I don't find it in  
your stable bundle on Patchwork either. Has this patch been  
inadvertently "lost in translation"?
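
For context, the condition change described in the quoted commit message can be sketched with simplified userspace stand-ins. Nothing here is the real PPPoE code: struct pppoe_sock_sketch, the *_SKETCH constants and release_sketch() are hypothetical, and a plain integer plays the role of the device reference count.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the PPPoE socket state flags. */
enum { PPPOX_ZOMBIE_SKETCH = 1 << 0, PPPOX_DEAD_SKETCH = 1 << 1 };

struct pppoe_sock_sketch {
	int state;		/* plays the role of sk->sk_state */
	int *dev_refcnt;	/* plays the role of po->pppoe_dev; may be NULL */
};

/* Release path as described: look at the pointer, not at the state flag. */
static void release_sketch(struct pppoe_sock_sketch *po)
{
	if (po->dev_refcnt) {		/* device attached: drop our reference */
		(*po->dev_refcnt)--;
		po->dev_refcnt = NULL;
	}
	po->state = PPPOX_DEAD_SKETCH;	/* regardless of the previous state */
}
```

A socket that is marked as a zombie but has no device attached passes through release_sketch() without touching a NULL pointer, which is the case the commit message says the state-based check could not guarantee.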



Best regards,
--
Christoph Schulz



Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller

2015-11-20 Thread Johannes Weiner
On Fri, Nov 20, 2015 at 04:10:33PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:32PM -0500, Johannes Weiner wrote:
> ...
> > @@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
> >   */
> >  bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  {
> > +   unsigned int batch = max(CHARGE_BATCH, nr_pages);
> > struct page_counter *counter;
> > +   bool force = false;
> >  
> > -   if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > -   nr_pages, &counter)) {
> > -   memcg->tcp_mem.memory_pressure = 0;
> > +#ifdef CONFIG_MEMCG_KMEM
> > +   if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > +   if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > +   nr_pages, &counter)) {
> > +   memcg->tcp_mem.memory_pressure = 0;
> > +   return true;
> > +   }
> > +   page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > +   memcg->tcp_mem.memory_pressure = 1;
> > +   return false;
> > +   }
> > +#endif
> > +   if (consume_stock(memcg, nr_pages))
> > return true;
> > +retry:
> > +   if (page_counter_try_charge(&memcg->memory, batch, &counter))
> > +   goto done;
> > +
> > +   if (batch > nr_pages) {
> > +   batch = nr_pages;
> > +   goto retry;
> > }
> > -   page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > -   memcg->tcp_mem.memory_pressure = 1;
> > -   return false;
> > +
> > +   page_counter_charge(&memcg->memory, batch);
> > +   force = true;
> > +done:
> 
> > +   css_get_many(&memcg->css, batch);
> 
> Is there any point to get css reference per each charged page? For kmem
> it is absolutely necessary, because dangling slabs must block
> destruction of memcg's kmem caches, which are destroyed on css_free. But
> for sockets there's no such problem: memcg will be destroyed only after
> all sockets are destroyed and therefore uncharged (since
> sock_update_memcg pins css).

I'm afraid we have to when we want to share 'stock' with cache and
anon pages, which hold individual references. drain_stock() always
assumes one reference per cached page.

> > +   if (batch > nr_pages)
> > +   refill_stock(memcg, batch - nr_pages);
> > +
> > +   schedule_work(&memcg->socket_work);
> 
> I think it's suboptimal to schedule the work even if we are below the
> high threshold.

Hm, it seemed unnecessary to duplicate the hierarchy check since this
is in the batch-exhausted slowpath anyway.

> BTW why do we need this work at all? Why is reclaim_high called from
> task_work not enough?

The problem lies in the memcg association: the random task that gets
interrupted by an arriving packet might not be in the same memcg as
the one owning receiving socket. And multiple interrupts could happen
while we're in the kernel already charging pages. We'd basically have
to maintain a list of memcgs that need to run reclaim_high associated
with current.


[PATCH V4 net-next 1/5] net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem

2015-11-20 Thread Salil Mehta
This patchset adds support of Hisilicon Hip06 SoC to the existing HNS
ethernet driver.

The changes in the driver are mainly due to changes in the DMA
descriptor provided by the Hip06 ethernet hardware. These changes
need to co-exist with already present Hip05 DMA descriptor and its
operating functions. The decision to choose the correct type of DMA
descriptor is taken dynamically depending upon the version of the
hardware (i.e. V1/hip05 or V2/hip06, see the already existing
hisilicon-hns-nic.txt binding file for a detailed description). Other
changes are included in the SBM, DSAF and PPE modules as well. Changes
affecting the driver related to the newly added ethernet hardware
features in Hip06 would be added as separate patch over this and
subsequent patches.

Signed-off-by: Salil Mehta 
Signed-off-by: yankejian 
Signed-off-by: huangdaode 
Signed-off-by: lipeng 
Signed-off-by: lisheng 
Signed-off-by: Fengguang Wu 
---

PATCH V4:
No change over PATCH V3

PATCH V3:
- This patch addresses comments floated by David Miller on
  PATCH V2. In summary, changing is_ver1 data-type from 'int' to
  'bool' at different places of the code:
  Link: https://lkml.org/lkml/2015/11/18/656

PATCH V2:
- Fix the comment from "kbuild test robot" from Intel(Fengguang Wu)
  Link: https://lkml.org/lkml/2015/10/20/562
https://lkml.org/lkml/2015/10/20/563
- Fixes the internal review comments from:
  Kenneth Lee 
  huangdaode 

PATCH V1:
- Initial driver version to support HNS over Hip06 SoC
---
 drivers/net/ethernet/hisilicon/hns/hnae.h  |   49 ++-
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |   29 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |  213 +---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |   25 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c |6 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |6 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   68 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h  |8 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |   72 +++-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  |  364 
 drivers/net/ethernet/hisilicon/hns/hns_enet.h  |   12 +
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |2 +-
 12 files changed, 677 insertions(+), 177 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index cec95ac..aa53dd3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -35,7 +35,7 @@
 #include 
 #include 
 
-#define HNAE_DRIVER_VERSION "1.3.0"
+#define HNAE_DRIVER_VERSION "2.0"
 #define HNAE_DRIVER_NAME "hns"
 #define HNAE_COPYRIGHT "Copyright(c) 2015 Huawei Corporation."
 #define HNAE_DRIVER_STRING "Hisilicon Network Subsystem Driver"
@@ -63,6 +63,7 @@ do { \
 
 #define AE_VERSION_1 ('6' << 16 | '6' << 8 | '0')
 #define AE_VERSION_2 ('1' << 24 | '6' << 16 | '1' << 8 | '0')
+#define AE_IS_VER1(ver) ((ver) == AE_VERSION_1)
 #define AE_NAME_SIZE 16
 
 /* some said the RX and TX RCB format should not be the same in the future. But
@@ -144,23 +145,59 @@ enum hnae_led_state {
 #define HNS_RXD_ASID_S 24
 #define HNS_RXD_ASID_M (0xff << HNS_RXD_ASID_S)
 
+#define HNSV2_TXD_RI_B   1
+#define HNSV2_TXD_L4CS_B   2
+#define HNSV2_TXD_L3CS_B   3
+#define HNSV2_TXD_FE_B   4
+#define HNSV2_TXD_VLD_B  5
+
+#define HNSV2_TXD_TSE_B   0
+#define HNSV2_TXD_VLAN_EN_B   1
+#define HNSV2_TXD_SNAP_B   2
+#define HNSV2_TXD_IPV6_B   3
+#define HNSV2_TXD_SCTP_B   4
+
 /* hardware spec ring buffer format */
 struct __packed hnae_desc {
__le64 addr;
union {
struct {
-   __le16 asid_bufnum_pid;
+   union {
+   __le16 asid_bufnum_pid;
+   __le16 asid;
+   };
__le16 send_size;
-   __le32 flag_ipoffset;
-   __le32 reserved_3[4];
+   union {
+   __le32 flag_ipoffset;
+   struct {
+   __u8 bn_pid;
+   __u8 ra_ri_cs_fe_vld;
+   __u8 ip_offset;
+   __u8 tse_vlan_snap_v6_sctp_nth;
+   };
+   };
+   __le16 mss;
+   __u8 l4_len;
+   __u8 reserved1;
+   __le16 paylen;
+   __u8 vmid;
+   __u8 qid;
+   __le32 reserved2[2];
} tx;
 
struct {
__le32 ipoff_bnum_pid_flag;

Re: tty,net: use-after-free in x25_asy_open_tty

2015-11-20 Thread Peter Hurley
[ + David Miller ]

On 11/20/2015 08:56 AM, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with syzkaller inside a kvmtools guest running latest -next 
> kernel, I've hit:
> 
> [  634.336761] ==
> [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr 8800a743efd0
> [  634.339558] Read of size 4 by task syzkaller_execu/8981
> [  634.340359] =
> [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected

Thanks for the report, Sasha.
Would you please test the patch below?

The ldisc api should really prevent these kinds of errors. I'll prepare
a patch to the tty core which should address the api weakness.

Regards,
Peter Hurley

--->% ---
Subject: [PATCH] wan/x25: Fix use-after-free in x25_asy_open_tty()

The N_X25 line discipline may access the previous line discipline's closed
and already-freed private data on open [1].

The tty->disc_data field _never_ refers to valid data on entry to the
line discipline's open() method. Rather, the ldisc is expected to
initialize that field for its own use for the lifetime of the instance
(ie. from open() to close() only).

[1] Report by Sasha Levin 
[  634.336761] ==
[  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr 8800a743efd0
[  634.339558] Read of size 4 by task syzkaller_execu/8981
[  634.340359] =
[  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
...
[  634.405018] Call Trace:
[  634.405277] dump_stack (lib/dump_stack.c:52)
[  634.405775] print_trailer (mm/slub.c:655)
[  634.406361] object_err (mm/slub.c:662)
[  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
[  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
[  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
[  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
[  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
[  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
[  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)

Reported-by: Sasha Levin 
Signed-off-by: Peter Hurley 
---
 drivers/net/wan/x25_asy.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c
index 5c47b01..cd39025 100644
--- a/drivers/net/wan/x25_asy.c
+++ b/drivers/net/wan/x25_asy.c
@@ -549,16 +549,12 @@ static void x25_asy_receive_buf(struct tty_struct *tty,
 
 static int x25_asy_open_tty(struct tty_struct *tty)
 {
-   struct x25_asy *sl = tty->disc_data;
+   struct x25_asy *sl;
int err;
 
if (tty->ops->write == NULL)
return -EOPNOTSUPP;
 
-   /* First make sure we're not already connected. */
-   if (sl && sl->magic == X25_ASY_MAGIC)
-   return -EEXIST;
-
/* OK.  Find a free X.25 channel to use. */
sl = x25_asy_alloc();
if (sl == NULL)
-- 
2.6.3
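
The rule stated in the commit message above (open() must treat
tty->disc_data as uninitialized and only ever overwrite it) can be
sketched in userspace C. This is a minimal model, not the kernel code:
struct tty_model, struct ldisc_state_model, MODEL_MAGIC and both
functions are hypothetical names invented for the illustration, and
clearing the field in close() is a choice made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel's tty or x25_asy types. */
struct tty_model {
	void *disc_data;	/* may still hold a stale pointer from a previous ldisc */
};

struct ldisc_state_model {
	int magic;
};

#define MODEL_MAGIC 0x1234	/* arbitrary marker value for the sketch */

/* open(): never read tty->disc_data, only overwrite it with fresh state. */
static int model_ldisc_open(struct tty_model *tty)
{
	struct ldisc_state_model *st;

	assert(tty != NULL);

	st = calloc(1, sizeof(*st));
	if (st == NULL)
		return -1;

	st->magic = MODEL_MAGIC;
	tty->disc_data = st;	/* the field belongs to this ldisc from here on */
	return 0;
}

/* close(): release the state and clear the field (sketch-only choice). */
static void model_ldisc_close(struct tty_model *tty)
{
	assert(tty != NULL);
	free(tty->disc_data);
	tty->disc_data = NULL;
}
```

The point of the model is that model_ldisc_open() succeeds no matter
what value disc_data held on entry, because it never dereferences it.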




Re: [PATCH net-next] net: remove useless check in napi_gro_frags()

2015-11-20 Thread David Miller
From: Eric Dumazet 
Date: Thu, 19 Nov 2015 13:43:45 -0800

> On Thu, 2015-11-19 at 16:06 -0500, Aaron Conole wrote:
> 
>> >
>> 
>> Would the following be an appropriate change in addition to the one
>> you've posted, then? If so I can repost as a formal patch, if you'd
>> like. At present, there's only one user of napi_frags_skb(), and your
>> patch removes the NULL check. If this can really only be the result of
>> buggy driver, then perhaps we should just call out the bug?
> 
> Let's mark my patch as "premature" optimization, and revisit the whole
> thing after an audit of the 10 drivers using this interface ;)

Also BUG_ON() is way too large a hammer.

An attempt to continue should be made in some way, so that the person
inspecting the message still has a working network and can work on
fixing the driver after the check triggers :-)
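
As a rough illustration of the "warn and keep going" approach argued
for here, the following userspace C sketch reports the bad input once
and then drops it rather than aborting. It is only a model under
stated assumptions: model_rx(), model_warned and the result enum are
hypothetical names, and the kernel would use its own warning helpers
instead of fprintf().

```c
#include <stddef.h>
#include <stdio.h>

enum model_result { MODEL_OK, MODEL_DROP };

static int model_warned;	/* becomes 1 after the first warning is printed */

/* Hypothetical receive step: a NULL buffer stands for a driver bug. */
static enum model_result model_rx(const void *buf)
{
	if (buf == NULL) {
		if (!model_warned) {
			/* Report the problem once so the log is not flooded. */
			fprintf(stderr, "model_rx: NULL buffer, dropping\n");
			model_warned = 1;
		}
		return MODEL_DROP;	/* keep running instead of aborting */
	}
	return MODEL_OK;
}
```

The caller sees MODEL_DROP and moves on to the next packet, so the
machine stays reachable while the driver is being debugged.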


Re: [PATCH net-next 0/8] tipc: some cleanups and improvements

2015-11-20 Thread David Miller
From: Jon Maloy 
Date: Thu, 19 Nov 2015 14:30:38 -0500

> This series mostly contains cleanups and cosmetic code changes.
> The only real functional change is in #4 and #5, where we change the
> locking structure for nodes and links in order to permit full
> concurrency between links working in parallel on different interfaces.
> Since the groundwork for this has been done in previous commit series,
> this change constitutes only the final, small step to achieve that goal.

Series applied, thanks.

Generally speaking, rwlock usage really never buys you anything
significant.  Therefore in the long run I think a single spinlock plus
RCU is going to be much better for per-node locking in TIPC.



[PATCH v3 net-next] ravb: use clock rate as basis for GTI.TIV

2015-11-20 Thread Simon Horman
The GTI.TIV may be set to 2GHz^2 / rate, where rate is
that of the clock of the device. Rather than assuming a
rate of 130MHz use the actual rate of the clock.

The motivation for this is to use the correct rate on
the r8a7795/Salvator-X which is advertised as 133MHz but
may differ depending on the extal present on the Salvator-X.

Signed-off-by: Simon Horman 

---
v2
* Corrected typos in changelog, as pointed out by Geert Uytterhoeven
* Use do_div() rather than 64-bit division to allow compilation on
  32-bit ARM

v3
* Dropped RFC prefix
---
 drivers/net/ethernet/renesas/ravb.h  |  3 +++
 drivers/net/ethernet/renesas/ravb_main.c | 38 +++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
index 0623fff932e4..f9dee7436e81 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -576,6 +576,9 @@ enum GTI_BIT {
GTI_TIV = 0x0FFF,
 };
 
+#define GTI_TIV_MAX GTI_TIV
+#define GTI_TIV_MIN 0x20
+
 /* GIC */
 enum GIC_BIT {
GIC_PTCE= 0x0001,   /* Undocumented? */
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index ee8d1ec61fab..990dc55cdada 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#include 
+
 #include "ravb.h"
 
 #define RAVB_DEF_MSG_ENABLE \
@@ -1659,6 +1661,38 @@ static const struct of_device_id ravb_match_table[] = {
 };
 MODULE_DEVICE_TABLE(of, ravb_match_table);
 
+static int ravb_set_gti(struct net_device *ndev)
+{
+
+   struct device *dev = ndev->dev.parent;
+   struct device_node *np = dev->of_node;
+   unsigned long rate;
+   struct clk *clk;
+   uint64_t inc;
+
+   clk = of_clk_get(np, 0);
+   if (IS_ERR(clk)) {
+   dev_err(dev, "could not get clock\n");
+   return PTR_ERR(clk);
+   }
+
+   rate = clk_get_rate(clk);
+   clk_put(clk);
+
+   inc = 1000000000ULL << 20;
+   do_div(inc, rate);
+
+   if (inc < GTI_TIV_MIN || inc > GTI_TIV_MAX) {
+   dev_err(dev, "gti.tiv increment 0x%llx is outside the range 0x%x - 0x%x\n",
+   inc, GTI_TIV_MIN, GTI_TIV_MAX);
+   return -EINVAL;
+   }
+
+   ravb_write(ndev, inc, GTI);
+
+   return 0;
+}
+
 static int ravb_probe(struct platform_device *pdev)
 {
struct device_node *np = pdev->dev.of_node;
@@ -1755,7 +1789,9 @@ static int ravb_probe(struct platform_device *pdev)
   CCC);
 
/* Set GTI value */
-   ravb_write(ndev, ((1000 << 20) / 130) & GTI_TIV, GTI);
+   error = ravb_set_gti(ndev);
+   if (error)
+   goto out_release;
 
/* Request GTI loading */
ravb_write(ndev, ravb_read(ndev, GCCR) | GCCR_LTI, GCCR);
-- 
2.1.4
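
The GTI.TIV calculation in the patch above can be checked with a small
userspace C sketch. It rests on two assumptions that are not spelled
out in the quoted hunks: the numerator is 10^9 << 20 (which agrees
with the old expression (1000 << 20) / 130 when the rate is given in
Hz instead of MHz), and the TIV field is 28 bits wide. The function
name, the MODEL_ constants and the zero-rate guard are hypothetical
and exist only for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GTI_TIV_MIN 0x20ULL	/* lower bound taken from the patch */
#define MODEL_GTI_TIV_MAX 0x0FFFFFFFULL	/* assumption: 28-bit TIV field */

/*
 * Compute the timer increment for a clock of rate_hz and range-check it.
 * Returns 0 and stores the value in *tiv, or -EINVAL on a bad input.
 */
static int model_gti_tiv(uint64_t rate_hz, uint32_t *tiv)
{
	uint64_t inc;

	if (rate_hz == 0 || tiv == NULL)
		return -EINVAL;	/* guard added for the sketch only */

	/* Assumed numerator: nanoseconds per second scaled by 2^20. */
	inc = (1000000000ULL << 20) / rate_hz;

	if (inc < MODEL_GTI_TIV_MIN || inc > MODEL_GTI_TIV_MAX)
		return -EINVAL;

	*tiv = (uint32_t)inc;
	return 0;
}
```

With a 130 MHz clock this yields the same value as the constant the
driver used before the patch, which is a useful sanity check on the
scaling.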



[PATCH V4 net-next 4/5] net:hns: Add support of ethtool TSO set option for Hip06 in HNS

2015-11-20 Thread Salil Mehta
From: Salil 

This patch adds support for the ethtool TSO option for the Hip06 SoC
to the HNS driver.

Signed-off-by: Salil Mehta 
Signed-off-by: lisheng 
---

PATCH V4:
This fixes the comments given by Sergei Shtylyov over the PATCH V3:
 Link: https://lkml.org/lkml/2015/11/20/358

PATCH V3/V2:
- No change over the initial patch

PATCH V1:
- Initial version of Ethtool support of TSO by Lisheng
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   47 +
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 055e14c..09995d2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1386,6 +1386,51 @@ static int hns_nic_change_mtu(struct net_device *ndev, int new_mtu)
return ret;
 }
 
+static int hns_nic_set_features(struct net_device *netdev,
+   netdev_features_t features)
+{
+   struct hns_nic_priv *priv = netdev_priv(netdev);
+   struct hnae_handle *h = priv->ae_handle;
+
+   switch (priv->enet_ver) {
+   case AE_VERSION_1:
+   if (features & (NETIF_F_TSO | NETIF_F_TSO6))
+   netdev_info(netdev, "enet v1 do not support tso!\n");
+   break;
+   default:
+   if (features & (NETIF_F_TSO | NETIF_F_TSO6)) {
+   priv->ops.fill_desc = fill_tso_desc;
+   priv->ops.maybe_stop_tx = hns_nic_maybe_stop_tso;
+   /* The chip only support 7*4096 */
+   netif_set_gso_max_size(netdev, 7 * 4096);
+   h->dev->ops->set_tso_stats(h, 1);
+   } else {
+   priv->ops.fill_desc = fill_v2_desc;
+   priv->ops.maybe_stop_tx = hns_nic_maybe_stop_tx;
+   h->dev->ops->set_tso_stats(h, 0);
+   }
+   break;
+   }
+   netdev->features = features;
+   return 0;
+}
+
+static netdev_features_t hns_nic_fix_features(
+   struct net_device *netdev, netdev_features_t features)
+{
+   struct hns_nic_priv *priv = netdev_priv(netdev);
+
+   switch (priv->enet_ver) {
+   case AE_VERSION_1:
+   features &= ~(NETIF_F_TSO | NETIF_F_TSO6 |
+   NETIF_F_HW_VLAN_CTAG_FILTER);
+   break;
+   default:
+   break;
+   }
+   return features;
+}
+
 /**
  * nic_set_multicast_list - set mutl mac address
  * @netdev: net device
@@ -1481,6 +1526,8 @@ static const struct net_device_ops hns_nic_netdev_ops = {
.ndo_set_mac_address = hns_nic_net_set_mac_address,
.ndo_change_mtu = hns_nic_change_mtu,
.ndo_do_ioctl = hns_nic_do_ioctl,
+   .ndo_set_features = hns_nic_set_features,
+   .ndo_fix_features = hns_nic_fix_features,
.ndo_get_stats64 = hns_nic_get_stats64,
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = hns_nic_poll_controller,
-- 
1.7.9.5
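
The feature masking done by hns_nic_fix_features() in the patch above
can be modelled in userspace C as follows. This is a sketch under
assumptions: the two version constants are copied from the hnae.h
hunk earlier in the series, while the MODEL_F_ bit positions and the
function name are hypothetical and do not match the kernel's
NETIF_F_ values.

```c
#include <stdint.h>

/* Version tags as defined in hnae.h in this series. */
#define MODEL_AE_VERSION_1 ('6' << 16 | '6' << 8 | '0')
#define MODEL_AE_VERSION_2 ('1' << 24 | '6' << 16 | '1' << 8 | '0')

/* Hypothetical feature bits for the model only. */
#define MODEL_F_TSO		(1u << 0)
#define MODEL_F_TSO6		(1u << 1)
#define MODEL_F_VLAN_FILTER	(1u << 2)
#define MODEL_F_SG		(1u << 3)

/* Strip the features that version 1 hardware cannot provide. */
static uint32_t model_fix_features(int enet_ver, uint32_t features)
{
	if (enet_ver == MODEL_AE_VERSION_1)
		features &= ~(MODEL_F_TSO | MODEL_F_TSO6 | MODEL_F_VLAN_FILTER);

	return features;
}
```

Masking in the fix step means the set step only ever sees a request
the hardware can honour, so a v1 device never reaches the TSO branch.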



Re: [PATCH RESEND] cgroups: Allow dynamically changing net_classid

2015-11-20 Thread Tejun Heo
Hello,

On Fri, Nov 20, 2015 at 12:31:39PM -0800, Nina Schiff wrote:
> The classid of a process is changed either when a process is moved to
> or from a cgroup or when the net_cls.classid file is updated.
> Previously net_cls only supported propagating these changes to the
> cgroup's related sockets when a process was added to or removed from the
> cgroup. This means it was necessary to remove and re-add all processes
> to a cgroup in order to update its classid. This change introduces
> support for doing this dynamically - i.e. when the value is changed in
> the net_cls.classid file, this will also trigger an update to the
> classid associated with all sockets controlled by the cgroup.
> This mimics the behaviour of other cgroup subsystems.
> net_prio circumvents this issue by storing an index into a table with
> each socket (and so any updates to the table don't require updating
> the value associated with the socket). net_cls, however, passes the
> socket the classid directly, and so this additional step is needed.
> 
> Signed-off-by: Nina Schiff 

Acked-by: Tejun Heo 

This was broken from the beginning.  Thanks for fixing this.

BTW, this will cause a context conflict with the cgroup2 match
patches.  I'll update the patchset once this lands in net-next.

-- 
tejun


Re: [PATCH net 1/1] tipc: correct settings of broadcast link state

2015-11-20 Thread David Miller
From: Jon Maloy 
Date: Thu, 19 Nov 2015 14:12:50 -0500

> Since commit 5266698661401afc5e ("tipc: let broadcast packet
> reception use new link receive function") the broadcast send
> link state was meant to always be set to LINK_ESTABLISHED, since
> we don't need this link to follow the regular link FSM rules. It
> was also the intention that this state anyway shouldn't impact
> the run-time working state of the link, since the latter in
> reality is controlled by the number of registered peers.
> 
> We have now discovered that this assumption is not quite correct.
> If the broadcast link is reset because of too many retransmissions,
> its state will inadvertently go to LINK_RESETTING, and never go
> back to LINK_ESTABLISHED, because the LINK_FAILURE event was not
> anticipated. This will work well once, but if it happens a second
> time, the reset on a link in LINK_RESETTING has no effect, and
> neither the broadcast link nor the unicast links will go down as
> they should.
> 
> Furthermore, it is confusing that the management tool shows that
> this link is in UP state when that obviously isn't the case.
> 
> We now ensure that this state strictly follows the true working
> state of the link. The state is set to LINK_ESTABLISHED when
> the number of peers is non-zero, and to LINK_RESET otherwise.
> 
> Signed-off-by: Jon Maloy 

Applied, thanks.
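
The rule from the quoted commit message (the broadcast send link is
LINK_ESTABLISHED while at least one peer is registered and LINK_RESET
otherwise) can be sketched in userspace C. This is a minimal model,
not the TIPC code: the struct, the enum and the helper names are
hypothetical.

```c
/* Hypothetical model of the two states the broadcast link may be in. */
enum model_link_state { MODEL_LINK_RESET, MODEL_LINK_ESTABLISHED };

struct model_bc_link {
	unsigned int peers;		/* number of registered peers */
	enum model_link_state state;	/* always derived from peers */
};

/* Recompute the state from the peer count; never set it independently. */
static void model_bc_update(struct model_bc_link *l)
{
	l->state = l->peers ? MODEL_LINK_ESTABLISHED : MODEL_LINK_RESET;
}

static void model_bc_add_peer(struct model_bc_link *l)
{
	l->peers++;
	model_bc_update(l);
}

static void model_bc_remove_peer(struct model_bc_link *l)
{
	if (l->peers)
		l->peers--;
	model_bc_update(l);
}
```

Because the state is recomputed on every change of the peer count, it
cannot get stuck in an intermediate value the way the commit message
describes for LINK_RESETTING.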


Re: [PATCH 3/3 v3] dl2k: Implement suspend

2015-11-20 Thread David Miller
From: Ondrej Zary 
Date: Thu, 19 Nov 2015 20:13:06 +0100

> Add suspend/resume support to dl2k driver.
> This requires RX/TX rings to be reset so split out the required
> functionality from alloc_list() into new rio_reset_ring().
> 
> Tested on Asus NX1101 (IP1000A) and D-Link DGE-550T (DL-2000).
> 
> Signed-off-by: Ondrej Zary 

Applied to net-next.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET v2] netfilter, cgroup: implement xt_cgroup2 match

2015-11-20 Thread Pablo Neira Ayuso
On Fri, Nov 20, 2015 at 08:56:25PM +0100, Pablo Neira Ayuso wrote:
> Regarding #7, I have a couple of concerns:
> 
> 1) cgroup currently doesn't work the way users expect, ie. to perform any
>reasonable firewalling. Since this relies on early demux, only a
>limited number of sockets get access to the cgroup info.

Oops, sorry, I forgot to indicate that I'm referring to the INPUT chain.

> 2) We have traditionally rejected match2 and target2 extensions. I
>guess you can accommodate the new cgroup code through the revision
>iptables infrastructure, so we still use the cgroup match.
> 
> Let me know, thanks.


Re: [PATCHSET v2] netfilter, cgroup: implement xt_cgroup2 match

2015-11-20 Thread Pablo Neira Ayuso
On Fri, Nov 20, 2015 at 01:59:12PM -0500, David Miller wrote:
> From: Tejun Heo 
> Date: Thu, 19 Nov 2015 13:52:44 -0500
> 
> > This is the second take of the xt_cgroup2 patchset.  Changes from the
> > last take are
> > 
> > * Instead of adding sock->sk_cgroup separately, sock->sk_cgrp_data now
> >   carries either (prioidx, classid) pair or cgroup2 pointer.  This
> >   avoids inflating struct sock with yet another cgroup related field.
> >   Unfortunately, this does add some complexity but that's the
> >   trade-off and the complexity is contained in cgroup proper.
> > 
> > * Various small updates as per David and Jan's reviews.
> 
> I like this a lot better, thanks.
> 
> Please address Daniel's feedback on patch #6 and then I'm personally
> fine with this series.
> 
> Pablo, are you ok with me merging this into net-next directly or
> would you rather I take patches 1-6 into net-next and then you can
> merge and then add patch #7 on top?

I'd suggest you get 1-6, then I'll pull this into my tree. Thanks David!

Regarding #7, I have a couple of concerns:

1) cgroup currently doesn't work the way users expect, ie. to perform any
   reasonable firewalling. Since this relies on early demux, only a
   limited number of sockets get access to the cgroup info.

2) We have traditionally rejected match2 and target2 extensions. I
   guess you can accommodate the new cgroup code through the revision
   iptables infrastructure, so we still use the cgroup match.

Let me know, thanks.


[PATCH RESEND] cgroups: Allow dynamically changing net_classid

2015-11-20 Thread Nina Schiff
The classid of a process is changed either when a process is moved to
or from a cgroup or when the net_cls.classid file is updated.
Previously net_cls only supported propagating these changes to the
cgroup's related sockets when a process was added to or removed from the
cgroup. This means it was necessary to remove and re-add all processes
to a cgroup in order to update its classid. This change introduces
support for doing this dynamically - i.e. when the value is changed in
the net_cls.classid file, this will also trigger an update to the
classid associated with all sockets controlled by the cgroup.
This mimics the behaviour of other cgroup subsystems.
net_prio circumvents this issue by storing an index into a table with
each socket (and so any updates to the table don't require updating
the value associated with the socket). net_cls, however, passes the
socket the classid directly, and so this additional step is needed.

Signed-off-by: Nina Schiff 
---
Concatenated two email addresses by mistake, so resending

 net/core/netclassid_cgroup.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 6441f47..2e4df84 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -56,7 +56,7 @@ static void cgrp_css_free(struct cgroup_subsys_state *css)
kfree(css_cls_state(css));
 }
 
-static int update_classid(const void *v, struct file *file, unsigned n)
+static int update_classid_sock(const void *v, struct file *file, unsigned n)
 {
int err;
struct socket *sock = sock_from_file(file, &err);
@@ -67,18 +67,25 @@ static int update_classid(const void *v, struct file *file, unsigned n)
return 0;
 }
 
-static void cgrp_attach(struct cgroup_subsys_state *css,
-   struct cgroup_taskset *tset)
+static void update_classid(struct cgroup_subsys_state *css, void *v)
 {
-   struct cgroup_cls_state *cs = css_cls_state(css);
-   void *v = (void *)(unsigned long)cs->classid;
+   struct css_task_iter it;
struct task_struct *p;
 
-   cgroup_taskset_for_each(p, tset) {
+   css_task_iter_start(css, &it);
+   while ((p = css_task_iter_next(&it))) {
task_lock(p);
-   iterate_fd(p->files, 0, update_classid, v);
+   iterate_fd(p->files, 0, update_classid_sock, v);
task_unlock(p);
}
+   css_task_iter_end(&it);
+}
+
+static void cgrp_attach(struct cgroup_subsys_state *css,
+   struct cgroup_taskset *tset)
+{
+   update_classid(css,
+  (void *)(unsigned long)css_cls_state(css)->classid);
 }
 
 static u64 read_classid(struct cgroup_subsys_state *css, struct cftype *cft)
@@ -89,8 +96,11 @@ static u64 read_classid(struct cgroup_subsys_state *css, struct cftype *cft)
 static int write_classid(struct cgroup_subsys_state *css, struct cftype *cft,
 u64 value)
 {
-   css_cls_state(css)->classid = (u32) value;
+   struct cgroup_cls_state *cs = css_cls_state(css);
+
+   cs->classid = (u32)value;
 
+   update_classid(css, (void *)(unsigned long)cs->classid);
return 0;
 }
 
-- 
2.4.6
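
The propagation described in the commit message above (a write to the
classid file now also updates every socket owned by every task in the
cgroup) can be modelled in userspace C. This is a sketch under
assumptions: the three structs and both functions are hypothetical
stand-ins, and the locking and fd iteration of the real patch are
left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a socket, a task and a net_cls cgroup. */
struct model_sock {
	uint32_t classid;
};

struct model_task {
	struct model_sock **socks;	/* sockets this task holds open */
	size_t nsocks;
};

struct model_cgroup {
	uint32_t classid;
	struct model_task *tasks;	/* tasks currently in the cgroup */
	size_t ntasks;
};

/* Push the cgroup's classid to every socket of every member task. */
static void model_update_classid(struct model_cgroup *cg)
{
	for (size_t t = 0; t < cg->ntasks; t++)
		for (size_t s = 0; s < cg->tasks[t].nsocks; s++)
			cg->tasks[t].socks[s]->classid = cg->classid;
}

/* Writing the value stores it and then propagates it straight away. */
static void model_write_classid(struct model_cgroup *cg, uint64_t value)
{
	cg->classid = (uint32_t)value;
	model_update_classid(cg);
}
```

In this model the same helper could also serve the attach path, which
mirrors how the patch reuses update_classid() from cgrp_attach().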



Re: [PATCH net 2/2] net: ip6mr: fix static mfc/dev leaks on table destruction

2015-11-20 Thread Cong Wang
On Fri, Nov 20, 2015 at 4:54 AM, Nikolay Aleksandrov
 wrote:
> From: Nikolay Aleksandrov 
>
> Similar to ipv4, when destroying an mrt table the static mfc entries and
> the static devices are kept, which leads to devices that can never be
> destroyed (because of refcnt taken) and leaked memory. Make sure that
> everything is cleaned up on netns destruction.
>
> Fixes: 8229efdaef1e ("netns: ip6mr: enable namespace support in ipv6 
> multicast forwarding code")
> CC: Benjamin Thery 
> Signed-off-by: Nikolay Aleksandrov 

Reviewed-by: Cong Wang 


Re: [PATCH net 1/2] net: ipmr: fix static mfc/dev leaks on table destruction

2015-11-20 Thread Cong Wang
On Fri, Nov 20, 2015 at 4:54 AM, Nikolay Aleksandrov
 wrote:
> From: Nikolay Aleksandrov 
>
> When destroying an mrt table the static mfc entries and the static
> devices are kept, which leads to devices that can never be destroyed
> (because of refcnt taken) and leaked memory, for example:
> unreferenced object 0x880034c144c0 (size 192):
>   comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
>   hex dump (first 32 bytes):
> 98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.S.4
> ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  
>   backtrace:
> [] kmemleak_alloc+0x4e/0xb0
> [] kmem_cache_alloc+0x190/0x300
> [] ip_mroute_setsockopt+0x5cb/0x910
> [] do_ip_setsockopt.isra.11+0x105/0xff0
> [] ip_setsockopt+0x30/0xa0
> [] raw_setsockopt+0x33/0x90
> [] sock_common_setsockopt+0x14/0x20
> [] SyS_setsockopt+0x71/0xc0
> [] entry_SYSCALL_64_fastpath+0x16/0x7a
> [] 0x
>
> Make sure that everything is cleaned on netns destruction.
>
> Signed-off-by: Nikolay Aleksandrov 

Looks good to me,

Reviewed-by: Cong Wang 


Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter

2015-11-20 Thread Johannes Weiner
On Fri, Nov 20, 2015 at 03:42:16PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:28PM -0500, Johannes Weiner wrote:
> > There won't be any separate counters for socket memory consumed by
> > protocols other than TCP in the future. Remove the indirection and
> 
> I really want to believe you're right. And with vmpressure propagation
> implemented properly you are likely to be right.
> 
> However, we might still want to account other socket protos to
> memcg->memory in the unified hierarchy, e.g. UDP, or SCTP, or whatever
> else. Adding new consumers should be trivial, but it will break the
> legacy usecase, where only TCP sockets are supposed to be accounted.
> What about adding a check to sock_update_memcg() so that it would enable
> accounting only for TCP sockets in case legacy hierarchy is used?

Yup, I was thinking the same thing. But we can cross that bridge when
we come to it and are actually adding further packet types.

> For the same reason, I think we'd better rename memcg->tcp_mem to
> something like memcg->sk_mem or we can even drop the cg_proto struct
> altogether embedding its fields directly to mem_cgroup struct.
> 
> Also, I don't see any reason to have tcp_memcontrol.c file. It's tiny
> and with this patch it does not depend on tcp code any more. Let's move
> it to memcontrol.c?

I actually had all this at first, but then wondered if it makes more
sense to keep the legacy code in isolation. Don't you think it would
be easier to keep track of what's v1 and what's v2 if we keep the
legacy stuff physically separate as much as possible? In particular I
found that 'tcp_mem.' marker really useful while working on the code.

In the same vein, tcp_memcontrol.c doesn't really hurt anybody and I'd
expect it to remain mostly unopened and unchanged in the future. But
if we merge it into memcontrol.c, that code will likely be in the way
and we'd have to make it explicit somehow that this is not actually
part of the new memory controller anymore.

What do you think?

> Other than that this patch looks OK to me.

Thank you!


Re: [PATCH 6/7] sock, cgroup: add sock->sk_cgroup

2015-11-20 Thread Tejun Heo
Hello, Daniel.

On Fri, Nov 20, 2015 at 12:04:05PM +0100, Daniel Wagner wrote:
> >  static inline u16 sock_cgroup_prioidx(struct sock_cgroup_data *skcd)
> >  {
> > -   return skcd->prioidx;
> > +   return (skcd->is_data & 1) ? skcd->prioidx : 1;
> >  }
> >  
> >  static inline u32 sock_cgroup_classid(struct sock_cgroup_data *skcd)
> >  {
> > -   return skcd->classid;
> > +   return (skcd->is_data & 1) ? skcd->classid : 0;
> >  }
> 
> 
> I am still trying to understand what the code does, hence this stupid
> question:
> 
> Why does sock_cgroup_prioidx() return 1 if it is not data, while
> sock_cgroup_classid() returns 0?

I prolly should have added comments there.  prioidx carries the cgroup
ID on the hierarchy net_prio is attached to, so if nothing is
configured, the default value would be the ID of the root cgroup which
is always 1.  For net_cls, the unconfigured default value is zero.
Will refresh the patch with comments.
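
As a sketch of those two defaults, here is a userspace C model of the
accessors quoted above. The struct layout and the model_ names are
hypothetical; only the fallback values (1 for prioidx, 0 for classid)
come from the patch and the explanation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock_cgroup_data; layout is illustrative. */
struct model_skcd {
	uint8_t  is_data;	/* bit 0 set: prioidx and classid are valid */
	uint16_t prioidx;
	uint32_t classid;
};

/* Unconfigured sockets report the root cgroup's ID, which is 1. */
static uint16_t model_prioidx(const struct model_skcd *skcd)
{
	assert(skcd != NULL);
	return (skcd->is_data & 1) ? skcd->prioidx : 1;
}

/* Unconfigured sockets report classid 0, the net_cls default. */
static uint32_t model_classid(const struct model_skcd *skcd)
{
	assert(skcd != NULL);
	return (skcd->is_data & 1) ? skcd->classid : 0;
}
```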

Thanks.

-- 
tejun


[PATCH V4 net-next 3/5] net:hns: Add Hip06 "TSO(TCP Segment Offload)" support HNS Driver

2015-11-20 Thread Salil Mehta
This patch adds the support of "TSO (TCP Segment Offload)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.

Enabling this feature would help offload the TCP Segmentation
process to the Hip06 ethernet hardware. This eventually would help
in saving precious cpu cycles.

Signed-off-by: Salil Mehta 
Signed-off-by: lisheng 
---

PATCH V4:
No change over the previous patches

PATCH V3/V2:
- No change over the initial floated patch for TSO

PATCH V1:
- Initial support of TSO feature in Hip06 SoC in HNS driver
---
 drivers/net/ethernet/hisilicon/hns/hnae.h |1 +
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |8 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |5 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h |2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h |1 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c |   82 -
 6 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 1ee42cb..6ec5bd7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -472,6 +472,7 @@ struct hnae_ae_ops {
int (*set_mac_addr)(struct hnae_handle *handle, void *p);
int (*set_mc_addr)(struct hnae_handle *handle, void *addr);
int (*set_mtu)(struct hnae_handle *handle, int new_mtu);
+   void (*set_tso_stats)(struct hnae_handle *handle, int enable);
void (*update_stats)(struct hnae_handle *handle,
 struct net_device_stats *net_stats);
void (*get_stats)(struct hnae_handle *handle, u64 *data);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index e5a31bc..d02fa58 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -277,6 +277,13 @@ static int hns_ae_set_mtu(struct hnae_handle *handle, int new_mtu)
return hns_mac_set_mtu(mac_cb, new_mtu);
 }
 
+static void hns_ae_set_tso_stats(struct hnae_handle *handle, int enable)
+{
+   struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
+
+   hns_ppe_set_tso_enable(ppe_cb, enable);
+}
+
 static int hns_ae_start(struct hnae_handle *handle)
 {
int ret;
@@ -824,6 +831,7 @@ static struct hnae_ae_ops hns_dsaf_ops = {
.set_mc_addr = hns_ae_set_multicast_one,
.set_mtu = hns_ae_set_mtu,
.update_stats = hns_ae_update_stats,
+   .set_tso_stats = hns_ae_set_tso_stats,
.get_stats = hns_ae_get_stats,
.get_strings = hns_ae_get_strings,
.get_sset_count = hns_ae_get_sset_count,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index 824fe50..b6bf292 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -19,6 +19,11 @@
 
 #include "hns_dsaf_ppe.h"
 
+void hns_ppe_set_tso_enable(struct hns_ppe_cb *ppe_cb, u32 value)
+{
+   dsaf_set_dev_bit(ppe_cb, PPEV2_CFG_TSO_EN_REG, 0, !!value);
+}
+
 void hns_ppe_set_rss_key(struct hns_ppe_cb *ppe_cb,
 const u32 rss_key[HNS_PPEV2_RSS_KEY_NUM])
 {
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
index dac8532..0f5cb69 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
@@ -113,7 +113,7 @@ void hns_ppe_get_regs(struct hns_ppe_cb *ppe_cb, void *data);
 
 void hns_ppe_get_strings(struct hns_ppe_cb *ppe_cb, int stringset, u8 *data);
 void hns_ppe_get_stats(struct hns_ppe_cb *ppe_cb, u64 *data);
-
+void hns_ppe_set_tso_enable(struct hns_ppe_cb *ppe_cb, u32 value);
 void hns_ppe_set_rss_key(struct hns_ppe_cb *ppe_cb,
 const u32 rss_key[HNS_PPEV2_RSS_KEY_NUM]);
 void hns_ppe_set_indir_table(struct hns_ppe_cb *ppe_cb,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
index b070d57..98c163e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
@@ -317,6 +317,7 @@
 #define PPE_CFG_TAG_GEN_REG    0x90
 #define PPE_CFG_PARSE_TAG_REG  0x94
 #define PPE_CFG_PRO_CHECK_EN_REG   0x98
+#define PPEV2_CFG_TSO_EN_REG   0xA0
 #define PPE_INTEN_REG  0x100
 #define PPE_RINT_REG   0x104
 #define PPE_INTSTS_REG 0x108
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index e235714..055e14c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -223,6 

[PATCH V4 net-next 2/5] net:hns: Add Hip06 "RSS(Receive Side Scaling)" support to HNS Driver

2015-11-20 Thread Salil Mehta
This patch adds the support of "RSS (Receive Side Scaling)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.

This feature helps distribute the different flows (hashed by the
hardware using the Toeplitz hash) to different queues associated
with the processor cores. The mapping of flow-hash values to the
queues is stored in an indirection table (one per
Packet-parse-Engine/PPE). This patch also adds support for
re-programming the (flow-hash<->Qid) mapping using ethtool.
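
As a rough illustration of the mapping described above, the following
minimal C sketch looks up a queue id through an indirection table. It is
not the driver's code: the table size, the helper names
(sketch_rss_pick_queue, sketch_rss_fill_default) and the use of a plain
modulo to derive the table index are assumptions made for the example.
On real hardware the hash computation and the lookup are done by the PPE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size for illustration only; the driver uses
 * HNS_PPEV2_RSS_IND_TBL_SIZE, whose value is not shown in this patch.
 */
#define SKETCH_IND_TBL_SIZE 8

/* Map a flow hash to a queue id through the indirection table.
 * Assumption: the table index is the hash modulo the table size.
 */
static uint32_t sketch_rss_pick_queue(uint32_t flow_hash,
				      const uint32_t *indir, size_t tbl_size)
{
	assert(indir != NULL && tbl_size > 0);
	return indir[flow_hash % tbl_size];
}

/* Fill the table round-robin over nr_queues (an assumed default layout). */
static void sketch_rss_fill_default(uint32_t *indir, size_t tbl_size,
				    uint32_t nr_queues)
{
	size_t i;

	assert(indir != NULL && nr_queues > 0);
	for (i = 0; i < tbl_size; i++)
		indir[i] = (uint32_t)(i % nr_queues);
}
```

Re-programming the mapping then amounts to overwriting entries of the
table: every flow whose hash selects a rewritten entry moves to the new
queue, while all other flows stay where they were.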

Signed-off-by: Salil Mehta 
Reviewed-by: Kenneth Lee 
---

PATCH V4:
- No Change over previous patches

PATCH V3:
- No change over PATCH V2

PATCH V2:
- Fix for review-comments on PATCH V1 by Yisen.Zhuang(Zhuangyuzeng)
  Link: https://lkml.org/lkml/2015/10/21/1032
- Rework for Internal review comments by Kenneth Lee

PATCH V1:
- Initial version to support RSS and its Ethtool interface on
  Hip06 SoC
---
 drivers/net/ethernet/hisilicon/hns/hnae.h |6 ++
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |   53 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |   61 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h |   32 +--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h |   14 
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c  |   93 +
 6 files changed, 249 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index aa53dd3..1ee42cb 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -483,6 +483,12 @@ struct hnae_ae_ops {
  enum hnae_led_state status);
void (*get_regs)(struct hnae_handle *handle, void *data);
int (*get_regs_len)(struct hnae_handle *handle);
+   u32 (*get_rss_key_size)(struct hnae_handle *handle);
+   u32 (*get_rss_indir_size)(struct hnae_handle *handle);
+   int (*get_rss)(struct hnae_handle *handle, u32 *indir, u8 *key,
+  u8 *hfunc);
+   int (*set_rss)(struct hnae_handle *handle, const u32 *indir,
+  const u8 *key, const u8 hfunc);
 };
 
 struct hnae_ae_dev {
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index c03bc1e..e5a31bc 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -749,6 +749,53 @@ int hns_ae_get_regs_len(struct hnae_handle *handle)
return total_num;
 }
 
+static u32 hns_ae_get_rss_key_size(struct hnae_handle *handle)
+{
+   return HNS_PPEV2_RSS_KEY_SIZE;
+}
+
+static u32 hns_ae_get_rss_indir_size(struct hnae_handle *handle)
+{
+   return HNS_PPEV2_RSS_IND_TBL_SIZE;
+}
+
+static int hns_ae_get_rss(struct hnae_handle *handle, u32 *indir, u8 *key,
+ u8 *hfunc)
+{
+   struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
+
+   /* currently we support only one type of hash function i.e. Toep hash */
+   if (hfunc)
+   *hfunc = ETH_RSS_HASH_TOP;
+
+   /* get the RSS Key required by the user */
+   if (key)
+   memcpy(key, ppe_cb->rss_key, HNS_PPEV2_RSS_KEY_SIZE);
+
+   /* update the current hash->queue mappings from the shadow RSS table */
+   memcpy(indir, ppe_cb->rss_indir_table, HNS_PPEV2_RSS_IND_TBL_SIZE);
+
+   return 0;
+}
+
+static int hns_ae_set_rss(struct hnae_handle *handle, const u32 *indir,
+ const u8 *key, const u8 hfunc)
+{
+   struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
+
+   /* set the RSS Hash Key if specififed by the user */
+   if (key)
+   hns_ppe_set_rss_key(ppe_cb, (int *)key);
+
+   /* update the shadow RSS table with user specified qids */
+   memcpy(ppe_cb->rss_indir_table, indir, HNS_PPEV2_RSS_IND_TBL_SIZE);
+
+   /* now update the hardware */
+   hns_ppe_set_indir_table(ppe_cb, ppe_cb->rss_indir_table);
+
+   return 0;
+}
+
 static struct hnae_ae_ops hns_dsaf_ops = {
.get_handle = hns_ae_get_handle,
.put_handle = hns_ae_put_handle,
@@ -783,7 +830,11 @@ static struct hnae_ae_ops hns_dsaf_ops = {
.update_led_status = hns_ae_update_led_status,
.set_led_id = hns_ae_cpld_set_led_id,
.get_regs = hns_ae_get_regs,
-   .get_regs_len = hns_ae_get_regs_len
+   .get_regs_len = hns_ae_get_regs_len,
+   .get_rss_key_size = hns_ae_get_rss_key_size,
+   .get_rss_indir_size = hns_ae_get_rss_indir_size,
+   .get_rss = hns_ae_get_rss,
+   .set_rss = hns_ae_set_rss
 };
 
 int hns_dsaf_ae_init(struct dsaf_device *dsaf_dev)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index 9531992..824fe50 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c

[PATCH V4 net-next 5/5] net:hns: Add the init code to disable Hip06 "Hardware VLAN assist"

2015-11-20 Thread Salil Mehta
This patch adds the initialization code to disable hardware
VLAN tag stripping by default for now.

Proper support for the "hardware VLAN assistance" feature will
follow in upcoming patches.

Signed-off-by: Salil Mehta 
---

PATCH V4:
- No change over the earlier patches

PATCH V2/V3:
- No change over the initial floated patch

PATCH V1:
- Initial code to disable the hardware VLAN assist for now
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |7 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h |1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index b6bf292..544f323 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -176,6 +176,11 @@ static void hns_ppe_cnt_clr_ce(struct hns_ppe_cb *ppe_cb)
 PPE_CNT_CLR_CE_B, 1);
 }
 
+static void hns_ppe_set_vlan_strip(struct hns_ppe_cb *ppe_cb, int en)
+{
+   dsaf_write_dev(ppe_cb, PPEV2_VLAN_STRIP_EN_REG, en);
+}
+
 /**
  * hns_ppe_checksum_hw - set ppe checksum caculate
  * @ppe_device: ppe device
@@ -345,6 +350,8 @@ static void hns_ppe_init_hw(struct hns_ppe_cb *ppe_cb)
hns_ppe_cnt_clr_ce(ppe_cb);
 
if (!AE_IS_VER1(dsaf_dev->dsaf_ver)) {
+   hns_ppe_set_vlan_strip(ppe_cb, 0);
+
hns_ppe_set_rss_key(ppe_cb, rss_key);
 
for (i = 0; i < HNS_PPEV2_RSS_IND_TBL_SIZE; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
index 98c163e..6c18ca9 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
@@ -318,6 +318,7 @@
 #define PPE_CFG_PARSE_TAG_REG  0x94
 #define PPE_CFG_PRO_CHECK_EN_REG   0x98
 #define PPEV2_CFG_TSO_EN_REG   0xA0
+#define PPEV2_VLAN_STRIP_EN_REG 0xAC
 #define PPE_INTEN_REG  0x100
 #define PPE_RINT_REG   0x104
 #define PPE_INTSTS_REG 0x108
-- 
1.7.9.5



Re: [PATCH 2/3] dl2k: Reorder and cleanup initialization

2015-11-20 Thread David Miller
From: Ondrej Zary 
Date: Thu, 19 Nov 2015 20:13:05 +0100

> Move HW init and stop into separate functions.
> Request IRQ only after the HW has been reset (so interrupts are
> disabled and no stale interrupts are pending).
> 
> Signed-off-by: Ondrej Zary 

Applied to net-next.


Re: [PATCH 1/3 v2] dl2k: Handle memory allocation errors in alloc_list

2015-11-20 Thread David Miller
From: Ondrej Zary 
Date: Thu, 19 Nov 2015 20:13:04 +0100

> If memory allocation fails in alloc_list(), free the already allocated
> memory and return -ENOMEM. In rio_open(), call alloc_list() first and
> abort if it fails. Move HW access (set RFDListPtr) out of alloc_list().
> 
> Signed-off-by: Ondrej Zary 

Applied to net-next.


[PATCH net-next 4/6] kcm: Kernel Connection Multiplexor module

2015-11-20 Thread Tom Herbert
This module implements the Kernel Connection Multiplexor.

Kernel Connection Multiplexor (KCM) is a facility that provides a
message based interface over TCP for generic application protocols.
With KCM an application can efficiently send and receive application
protocol messages over TCP using datagram sockets.

For more information see the included Documentation/networking/kcm.txt

Signed-off-by: Tom Herbert 
---
 include/linux/socket.h   |6 +-
 include/net/kcm.h|  121 +++
 include/uapi/linux/kcm.h |   27 +
 net/Kconfig  |1 +
 net/Makefile |1 +
 net/kcm/Kconfig  |   10 +
 net/kcm/Makefile |3 +
 net/kcm/kcmsock.c| 1974 ++
 8 files changed, 2142 insertions(+), 1 deletion(-)
 create mode 100644 include/net/kcm.h
 create mode 100644 include/uapi/linux/kcm.h
 create mode 100644 net/kcm/Kconfig
 create mode 100644 net/kcm/Makefile
 create mode 100644 net/kcm/kcmsock.c

diff --git a/include/linux/socket.h b/include/linux/socket.h
index d834af2..73bf6c6 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -200,7 +200,9 @@ struct ucred {
 #define AF_ALG 38  /* Algorithm sockets*/
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
-#define AF_MAX 41  /* For now.. */
+#define AF_KCM 41  /* Kernel Connection Multiplexor*/
+
+#define AF_MAX 42  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -246,6 +248,7 @@ struct ucred {
 #define PF_ALG AF_ALG
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
+#define PF_KCM AF_KCM
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
@@ -323,6 +326,7 @@ struct ucred {
 #define SOL_CAIF   278
 #define SOL_ALG    279
 #define SOL_NFC    280
+#define SOL_KCM    281
 
 /* IPX options */
 #define IPX_TYPE   1
diff --git a/include/net/kcm.h b/include/net/kcm.h
new file mode 100644
index 000..4f371fe
--- /dev/null
+++ b/include/net/kcm.h
@@ -0,0 +1,121 @@
+/* Kernel Connection Multiplexor */
+
+#ifndef __NET_KCM_H_
+#define __NET_KCM_H_
+
+#include 
+#include 
+#include 
+
+#ifdef __KERNEL__
+
+extern unsigned int kcm_net_id;
+
+struct kcm_tx_msg {
+   unsigned int sent;
+   unsigned int fragidx;
+   unsigned int frag_offset;
+   unsigned int msg_flags;
+   struct sk_buff *frag_skb;
+   struct sk_buff *last_skb;
+};
+
+struct kcm_rx_msg {
+   int full_len;
+   int accum_len;
+   int offset;
+};
+
+/* Socket structure for KCM client sockets */
+struct kcm_sock {
+   struct sock sk;
+   struct kcm_mux *mux;
+   struct list_head kcm_sock_list;
+   int index;
+   u32 done : 1;
+   struct work_struct done_work;
+
+   /* Transmit */
+   struct kcm_psock *tx_psock;
+   struct work_struct tx_work;
+   struct list_head wait_psock_list;
+   struct sk_buff *seq_skb;
+
+   /* Don't use bit fields here, these are set under different locks */
+   bool tx_wait;
+   bool tx_wait_more;
+
+   /* Receive */
+   struct kcm_psock *rx_psock;
+   struct list_head wait_rx_list; /* KCMs waiting for receiving */
+   bool rx_wait;
+   u32 rx_disabled : 1;
+};
+
+struct bpf_prog;
+
+/* Structure for an attached lower socket */
+struct kcm_psock {
+   struct sock *sk;
+   struct kcm_mux *mux;
+   int index;
+
+   u32 tx_stopped : 1;
+   u32 rx_stopped : 1;
+   u32 done : 1;
+   u32 unattaching : 1;
+
+   void (*save_state_change)(struct sock *sk);
+   void (*save_data_ready)(struct sock *sk);
+   void (*save_write_space)(struct sock *sk);
+
+   struct list_head psock_list;
+
+   /* Receive */
+   struct sk_buff *rx_skb_head;
+   struct sk_buff **rx_skb_nextp;
+   struct sk_buff *ready_rx_msg;
+   struct list_head psock_ready_list;
+   struct work_struct rx_work;
+   struct delayed_work rx_delayed_work;
+   struct bpf_prog *bpf_prog;
+   struct kcm_sock *rx_kcm;
+
+   /* Transmit */
+   struct kcm_sock *tx_kcm;
+   struct list_head psock_avail_list;
+};
+
+/* Per net MUX list */
+struct kcm_net {
+   struct mutex mutex;
+   struct list_head mux_list;
+   int count;
+};
+
+/* Structure for a MUX */
+struct kcm_mux {
+   struct list_head kcm_mux_list;
+   struct rcu_head rcu;
+   struct kcm_net *knet;
+
+   struct list_head kcm_socks; /* All KCM sockets on MUX */
+   int kcm_socks_cnt;  /* Total KCM socket count for MUX */
+   struct list_head psocks;/* List of all psocks on MUX */
+   int psocks_cnt; /* Total attached sockets */
+
+   /* Receive */
+   spinlock_t rx_lock 

[PATCH net-next 5/6] kcm: Add statistics and proc interfaces

2015-11-20 Thread Tom Herbert
This patch adds various counters for KCM. These include counters for
messages and bytes received or sent, as well as counters for number of
attached/unattached TCP sockets and other error or edge events.

The statistics are exposed via a proc interface. /proc/net/kcm provides
statistics per KCM socket and per psock (attached TCP sockets).
/proc/net/kcm_stats provides aggregate statistics.

Signed-off-by: Tom Herbert 
---
 include/net/kcm.h | 102 +
 net/kcm/Makefile  |   2 +-
 net/kcm/kcmproc.c | 422 ++
 net/kcm/kcmsock.c |  66 +
 4 files changed, 591 insertions(+), 1 deletion(-)
 create mode 100644 net/kcm/kcmproc.c

diff --git a/include/net/kcm.h b/include/net/kcm.h
index 4f371fe..83b4f91 100644
--- a/include/net/kcm.h
+++ b/include/net/kcm.h
@@ -11,6 +11,45 @@
 
 extern unsigned int kcm_net_id;
 
+#define KCM_STATS_ADD(stat, count) \
+   ((stat) += (count))
+
+#define KCM_STATS_INCR(stat)   \
+   ((stat)++)
+
+struct kcm_psock_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned long long tx_msgs;
+   unsigned long long tx_bytes;
+   unsigned int rx_aborts;
+   unsigned int rx_mem_fail;
+   unsigned int rx_need_more_hdr;
+   unsigned int rx_bad_hdr_len;
+   unsigned long long reserved;
+   unsigned long long unreserved;
+   unsigned int tx_aborts;
+};
+
+struct kcm_mux_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned long long tx_msgs;
+   unsigned long long tx_bytes;
+   unsigned int rx_ready_drops;
+   unsigned int tx_retries;
+   unsigned int psock_attach;
+   unsigned int psock_unattach_rsvd;
+   unsigned int psock_unattach;
+};
+
+struct kcm_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned long long tx_msgs;
+   unsigned long long tx_bytes;
+};
+
 struct kcm_tx_msg {
unsigned int sent;
unsigned int fragidx;
@@ -35,6 +74,8 @@ struct kcm_sock {
u32 done : 1;
struct work_struct done_work;
 
+   struct kcm_stats stats;
+
/* Transmit */
struct kcm_psock *tx_psock;
struct work_struct tx_work;
@@ -71,6 +112,8 @@ struct kcm_psock {
 
struct list_head psock_list;
 
+   struct kcm_psock_stats stats;
+
/* Receive */
struct sk_buff *rx_skb_head;
struct sk_buff **rx_skb_nextp;
@@ -80,15 +123,21 @@ struct kcm_psock {
struct delayed_work rx_delayed_work;
struct bpf_prog *bpf_prog;
struct kcm_sock *rx_kcm;
+   unsigned long long saved_rx_bytes;
+   unsigned long long saved_rx_msgs;
 
/* Transmit */
struct kcm_sock *tx_kcm;
struct list_head psock_avail_list;
+   unsigned long long saved_tx_bytes;
+   unsigned long long saved_tx_msgs;
 };
 
 /* Per net MUX list */
 struct kcm_net {
struct mutex mutex;
+   struct kcm_psock_stats aggregate_psock_stats;
+   struct kcm_mux_stats aggregate_mux_stats;
struct list_head mux_list;
int count;
 };
@@ -104,6 +153,9 @@ struct kcm_mux {
struct list_head psocks;/* List of all psocks on MUX */
int psocks_cnt; /* Total attached sockets */
 
+   struct kcm_mux_stats stats;
+   struct kcm_psock_stats aggregate_psock_stats;
+
/* Receive */
spinlock_t rx_lock ____cacheline_aligned_in_smp;
struct list_head kcm_rx_waiters; /* KCMs waiting for receiving */
@@ -116,6 +168,56 @@ struct kcm_mux {
struct list_head kcm_tx_waiters; /* KCMs waiting for a TX psock */
 };
 
+#ifdef CONFIG_PROC_FS
+int kcm_proc_init(void);
+void kcm_proc_exit(void);
+#else
+static int kcm_proc_init(void) { return 0; }
+static void kcm_proc_exit(void) { }
+#endif
+
+
+static inline void aggregate_psock_stats(struct kcm_psock_stats *stats,
+struct kcm_psock_stats *agg_stats)
+{
+   /* Save psock statistics in the mux when psock is being unattached. */
+
+#define SAVE_PSOCK_STATS(_stat) (agg_stats->_stat += stats->_stat)
+
+   SAVE_PSOCK_STATS(rx_msgs);
+   SAVE_PSOCK_STATS(rx_bytes);
+   SAVE_PSOCK_STATS(rx_aborts);
+   SAVE_PSOCK_STATS(rx_mem_fail);
+   SAVE_PSOCK_STATS(rx_need_more_hdr);
+   SAVE_PSOCK_STATS(rx_bad_hdr_len);
+   SAVE_PSOCK_STATS(tx_msgs);
+   SAVE_PSOCK_STATS(tx_bytes);
+   SAVE_PSOCK_STATS(reserved);
+   SAVE_PSOCK_STATS(unreserved);
+   SAVE_PSOCK_STATS(tx_aborts);
+
+#undef SAVE_PSOCK_STATS
+}
+
+static inline void aggregate_mux_stats(struct kcm_mux_stats *stats,
+  struct kcm_mux_stats *agg_stats)
+{
+   /* Save psock statistics in the mux when psock is being unattached. */
+
+#define SAVE_MUX_STATS(_stat) (agg_stats->_stat += stats->_stat)
+
+   

[PATCH net-next 6/6] kcm: Add description in Documentation

2015-11-20 Thread Tom Herbert
Add kcm.txt to describe KCM and its interfaces.

Signed-off-by: Tom Herbert 
---
 Documentation/networking/kcm.txt | 273 +++
 1 file changed, 273 insertions(+)
 create mode 100644 Documentation/networking/kcm.txt

diff --git a/Documentation/networking/kcm.txt b/Documentation/networking/kcm.txt
new file mode 100644
index 000..5432090
--- /dev/null
+++ b/Documentation/networking/kcm.txt
@@ -0,0 +1,273 @@
+Kernel Connection Multiplexor
+-----------------------------
+
+Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based
+interface over TCP for generic application protocols. With KCM an application
+can efficiently send and receive application protocol messages over TCP using
+datagram sockets.
+
+KCM implements an NxM multiplexor in the kernel as diagrammed below:
+
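++------------+   +------------+   +------------+   +------------+
+| KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
++------------+   +------------+   +------------+   +------------+
+      |                 |               |                |
+      +-----------+     |               |     +----------+
+                  |     |               |     |
+               +----------------------------------+
+               |           Multiplexor            |
+               +----------------------------------+
+                 |   |           |           |  |
+       +---------+   |           |           |  ------------+
+       |             |           |           |              |
++----------+  +----------+  +----------+  +----------+ +----------+
+|  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
++----------+  +----------+  +----------+  +----------+ +----------+
+      |              |           |            |             |
++----------+  +----------+  +----------+  +----------+ +----------+
+| TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
++----------+  +----------+  +----------+  +----------+ +----------+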
+
+KCM sockets
+-----------
+
+The KCM sockets provide the user interface to the multiplexor. All the KCM sockets
+bound to a multiplexor are considered to have equivalent function, and I/O
+operations in different sockets may be done in parallel without the need for
+synchronization between threads in userspace.
+
+Multiplexor
+-----------
+
+The multiplexor provides the message steering. In the transmit path, messages
+written on a KCM socket are sent atomically on an appropriate TCP socket.
+Similarly, in the receive path, messages are constructed on each TCP socket
+(Psock) and complete messages are steered to a KCM socket.
+
+TCP sockets & Psocks
+--------------------
+
+TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated
+for each bound TCP socket, this structure holds the state for constructing
+messages on receive as well as other connection specific information for KCM.
+
+Connected mode semantics
+------------------------
+
+Each multiplexor assumes that all attached TCP connections are to the same
+destination and can use the different connections for load balancing when
+transmitting. The normal send and recv calls (including sendmmsg and recvmmsg)
+can be used to send and receive messages from the KCM socket.
+
+Socket types
+------------
+
+KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types.
+
+Message delineation
+-------------------
+
+Messages are sent over a TCP stream with some application protocol message
+format that typically includes a header which frames the messages. The length
+of a received message can be deduced from the application protocol header
+(often just a simple length field).
+
+A TCP stream must be parsed to determine message boundaries. Berkeley Packet
+Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor a
+BPF program must be specified. The program is called at the start of receiving
+a new message and is given an skbuff that contains the bytes received so far.
+It parses the message header and returns the length of the message. Given this
+information, KCM will construct the message of the stated length and deliver it
+to a KCM socket.
+
+TCP socket management
+---------------------
+
+When a TCP socket is attached to a KCM multiplexor data ready (POLLIN) and
+write space available (POLLOUT) events are handled by the multiplexor. If there
+is a state change (disconnection) or other error on a TCP socket, an error is
+posted on the TCP socket so that a POLLERR event happens and KCM discontinues
+using the socket. When the application gets the error notification for a
+TCP socket, it should unattach the socket from KCM and then handle the error
+condition (the typical response is to close the socket and create a new
+connection if necessary).
+
+User interface
+==============
+
+Creating a multiplexor
+----------------------
+
+A new multiplexor and initial KCM socket is created by a socket call:
+
+  
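
The "Message delineation" section of kcm.txt above says that a parser
program reads the application header and returns the message length. In
KCM that parser is a BPF program attached together with the TCP socket;
the plain userspace C below is only a sketch of the same idea. The wire
format (a 2-byte big-endian payload length), the helper names and the
convention that a return value of 0 means "header not complete yet" are
all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: 2-byte big-endian payload length, then payload. */
#define SKETCH_HDR_LEN 2u

/* Return the total length (header plus payload) of the message starting at
 * buf, or 0 if fewer than SKETCH_HDR_LEN bytes have arrived so far.
 */
static size_t sketch_msg_len(const uint8_t *buf, size_t avail)
{
	assert(buf != NULL || avail == 0);
	if (avail < SKETCH_HDR_LEN)
		return 0;
	return SKETCH_HDR_LEN + (((size_t)buf[0] << 8) | buf[1]);
}

/* Count the complete messages contained in a byte stream of length len. */
static size_t sketch_count_msgs(const uint8_t *stream, size_t len)
{
	size_t off = 0, n = 0;

	while (off < len) {
		size_t mlen = sketch_msg_len(stream + off, len - off);

		if (mlen == 0 || mlen > len - off)
			break;	/* header or payload still incomplete */
		off += mlen;
		n++;
	}
	return n;
}
```

A receiver built this way never hands out a partial message: bytes that
do not yet form a full header plus payload simply stay in the stream
until more data arrives.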

[PATCH net-next 2/6] net: Make sock_alloc exportable

2015-11-20 Thread Tom Herbert
Export it for cases where we want to create sockets by hand.

Signed-off-by: Tom Herbert 
---
 include/linux/net.h | 1 +
 net/socket.c| 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 70ac5e2..f9e3d3a 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -210,6 +210,7 @@ int __sock_create(struct net *net, int family, int type, int proto,
 int sock_create(int family, int type, int proto, struct socket **res);
 int sock_create_kern(struct net *net, int family, int type, int proto, struct socket **res);
 int sock_create_lite(int family, int type, int proto, struct socket **res);
+struct socket *sock_alloc(void);
 void sock_release(struct socket *sock);
 int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
diff --git a/net/socket.c b/net/socket.c
index dd2c247..21373f8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -532,7 +532,7 @@ static const struct inode_operations sockfs_inode_ops = {
  * NULL is returned.
  */
 
-static struct socket *sock_alloc(void)
+struct socket *sock_alloc(void)
 {
struct inode *inode;
struct socket *sock;
@@ -553,6 +553,7 @@ static struct socket *sock_alloc(void)
this_cpu_add(sockets_in_use, 1);
return sock;
 }
+EXPORT_SYMBOL(sock_alloc);
 
 /**
  * sock_release-   close a socket
-- 
2.4.6



[PATCH net-next 3/6] net: Add MSG_BATCH flag

2015-11-20 Thread Tom Herbert
Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to
indicate that more messages will follow (i.e. a batch of messages is
being sent). This is similar to MSG_MORE except that the following
messages are not merged into one packet, they are sent individually.

MSG_BATCH is a performance optimization in cases where a socket
implementation can benefit by transmitting packets in a batch.

This patch also updates sendmmsg so that each contained message except
for the last one is marked as MSG_BATCH.
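
The marking rule in the paragraph above can be sketched as a small
helper. This is an illustration under stated assumptions, not the code
from the patch: the function name sketch_mmsg_flags and the constant
SKETCH_MSG_BATCH are made up for the example, and the flag value is
only a placeholder for whatever include/linux/socket.h defines.

```c
#include <assert.h>

/* Placeholder flag value for illustration; the real MSG_BATCH is
 * defined by the patch in include/linux/socket.h.
 */
#define SKETCH_MSG_BATCH 0x40000u

/* Flags to use for message number idx (0-based) out of vlen messages:
 * every message except the last one carries the batch hint, the last
 * one is sent with the caller's original flags.
 */
static unsigned int sketch_mmsg_flags(unsigned int idx, unsigned int vlen,
				      unsigned int oflags)
{
	assert(vlen > 0 && idx < vlen);
	if (idx == vlen - 1)
		return oflags;
	return oflags | SKETCH_MSG_BATCH;
}
```

With vlen == 1 the only message is also the last one, so a single-entry
sendmmsg call behaves like a plain sendmsg with respect to this flag.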

Signed-off-by: Tom Herbert 
---
 include/linux/socket.h |  1 +
 net/socket.c   | 17 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 5bf59c8..d834af2 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -274,6 +274,7 @@ struct ucred {
 #define MSG_MORE   0x8000  /* Sender will send more */
 #define MSG_WAITFORONE 0x10000 /* recvmmsg(): block until 1+ packets avail */
 #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
+#define MSG_BATCH  0x40000 /* sendmmsg(): more messages coming */
 #define MSG_EOF MSG_FIN
 
 #define MSG_FASTOPEN   0x20000000  /* Send data in TCP SYN */
diff --git a/net/socket.c b/net/socket.c
index 21373f8..ef64b72 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1880,7 +1880,7 @@ static int copy_msghdr_from_user(struct msghdr *kmsg,
 
 static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
 struct msghdr *msg_sys, unsigned int flags,
-struct used_address *used_address)
+struct used_address *used_address, bool doing_mmsg)
 {
struct compat_msghdr __user *msg_compat =
(struct compat_msghdr __user *)msg;
@@ -1906,6 +1906,8 @@ static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
 
if (msg_sys->msg_controllen > INT_MAX)
goto out_freeiov;
+   if (doing_mmsg)
+   flags |= (msg_sys->msg_flags & MSG_EOR);
ctl_len = msg_sys->msg_controllen;
if ((MSG_CMSG_COMPAT & flags) && ctl_len) {
err =
@@ -1984,7 +1986,7 @@ long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned flags)
if (!sock)
goto out;
 
-   err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL);
+   err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, false);
 
fput_light(sock->file, fput_needed);
 out:
@@ -2011,6 +2013,7 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
struct compat_mmsghdr __user *compat_entry;
struct msghdr msg_sys;
struct used_address used_address;
+   unsigned int oflags = flags;
 
if (vlen > UIO_MAXIOV)
vlen = UIO_MAXIOV;
@@ -2025,11 +2028,16 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
entry = mmsg;
compat_entry = (struct compat_mmsghdr __user *)mmsg;
err = 0;
+   flags |= MSG_BATCH;
 
while (datagrams < vlen) {
+   if (datagrams == vlen - 1)
+   flags = oflags;
+
if (MSG_CMSG_COMPAT & flags) {
err = ___sys_sendmsg(sock, (struct user_msghdr __user *)compat_entry,
-&msg_sys, flags, &used_address);
+&msg_sys, flags, &used_address,
+true);
if (err < 0)
break;
err = __put_user(err, &compat_entry->msg_len);
@@ -2037,7 +2045,8 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
} else {
err = ___sys_sendmsg(sock,
 (struct user_msghdr __user *)entry,
-&msg_sys, flags, &used_address);
+&msg_sys, flags, &used_address,
+true);
if (err < 0)
break;
err = put_user(err, &entry->msg_len);
-- 
2.4.6



Re: [PATCH v2 06/27] brcm80211: move under broadcom vendor directory

2015-11-20 Thread Arend van Spriel

On 11/19/2015 08:48 AM, Kalle Valo wrote:
> Hauke Mehrtens writes:
>
>> On 11/18/2015 03:45 PM, Kalle Valo wrote:
>>> Part of reorganising wireless drivers directory and Kconfig. Note that I had to
>>> edit Makefiles from subdirectories to use the new location.
>>>
>>> Signed-off-by: Kalle Valo 
>>> ---
>>
>> I would prefer to remove the brcm80211 directory in this process and create:
>> drivers/net/wireless/broadcom/brcmfmac
>> drivers/net/wireless/broadcom/brcmsmac
>> drivers/net/wireless/broadcom/brcmutil
>> drivers/net/wireless/broadcom/include
>>
>> This way we have one directory less.
>
> I think this could be done separately. This patchset is big enough
> already, I would not like to make it anymore complicated.
>
> And I actually like the brcm80211 directory, I would not mind keeping it
> still.


I prefer to keep it as brcmsmac and brcmfmac rely on brcmutil module so 
I want to keep them together under brcm80211.


So does this patch go in before or after the patches I submitted before 
the merge window. I hope after :-p


Regards,
Arend



Re: [PATCH v2 06/27] brcm80211: move under broadcom vendor directory

2015-11-20 Thread Arend van Spriel

On 11/19/2015 08:54 AM, Kalle Valo wrote:
> Florian Fainelli writes:
>
>> On 18/11/15 11:19, Hauke Mehrtens wrote:
>>> On 11/18/2015 03:45 PM, Kalle Valo wrote:
>>>> Part of reorganising wireless drivers directory and Kconfig. Note that I had to
>>>> edit Makefiles from subdirectories to use the new location.
>>>>
>>>> Signed-off-by: Kalle Valo 
>>>> ---
>>>
>>> I would prefer to remove the brcm80211 directory in this process and create:
>>> drivers/net/wireless/broadcom/brcmfmac
>>> drivers/net/wireless/broadcom/brcmsmac
>>> drivers/net/wireless/broadcom/brcmutil
>>> drivers/net/wireless/broadcom/include
>>>
>>> This way we have one directory less.
>>
>> Would not that make keeping track of the previous and future history
>> harder for people contributing to these drivers? I could imagine that
>> for Arend and other Broadcom engineers, dealing with a simple level move
>> would be manageable, but having to account for a different directory
>> hierarchy could be a pain.
>>
>> What is the impact on compat-wireless after/before these changes by the way?
>
> It's called backports nowadays :)
>
> But I understood that as long as we have a separate kconfig option for
> the vendor directories (CONFIG_WLAN_VENDOR_*) it should be ok. For 4.3
> we didn't have that for realtek directory and that caused pain for
> backports.


That is my understanding as well.

Regards,
Arend



Re: [PATCH] unix: avoid use-after-free in ep_remove_wait_queue

2015-11-20 Thread Rainer Weikusat
Rainer Weikusat  writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of the server socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the
server receive queue were sent by them. In order to ensure that these
will be woken up once space becomes available again, the present
unix_dgram_poll routine does a second sock_poll_wait call with the
peer_wait wait queue of the server socket as queue argument
(unix_dgram_recvmsg does a wake up on this queue after a datagram was
received). This is inherently problematic because the server socket is
only guaranteed to remain alive for as long as the client still holds a
reference to it. In case the connection is dissolved via connect or by
the dead peer detection logic in unix_dgram_sendmsg, the server socket
may be freed even though "the polling mechanism" (in particular, epoll)
still has a pointer to the corresponding peer_wait queue. There's no way
to forcibly deregister a wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
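
To illustrate the one-shot relay described above, here is a minimal
single-threaded user-space sketch. It is not the patch code: all
sketch_* names are hypothetical, the "woken" flag merely stands in for
a wake up on the client's own wait queue, and the locking the kernel
needs is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_waiter {
	struct sketch_waiter *next;
	void (*func)(struct sketch_waiter *w);
};

struct sketch_queue {
	struct sketch_waiter *first;
};

struct sketch_client {
	struct sketch_queue *peer;	/* peer queue we are hooked to, or NULL */
	struct sketch_waiter peer_wake;	/* our entry on that queue */
	bool woken;			/* stands in for waking the client's own queue */
};

/* Unlink w from q; does nothing if w is not queued. */
static void sketch_unlink(struct sketch_queue *q, struct sketch_waiter *w)
{
	struct sketch_waiter **pp;

	for (pp = &q->first; *pp; pp = &(*pp)->next) {
		if (*pp == w) {
			*pp = w->next;
			w->next = NULL;
			return;
		}
	}
}

/* Wake function: drop the hook first, then relay to the client. */
static void sketch_relay(struct sketch_waiter *w)
{
	struct sketch_client *c = (struct sketch_client *)
		((char *)w - offsetof(struct sketch_client, peer_wake));

	assert(c->peer != NULL);
	sketch_unlink(c->peer, w);
	c->peer = NULL;
	c->woken = true;
}

/* Hook the client to the peer queue; returns true if newly hooked. */
static bool sketch_connect(struct sketch_client *c, struct sketch_queue *q)
{
	if (c->peer)
		return false;
	c->peer = q;
	c->peer_wake.func = sketch_relay;
	c->peer_wake.next = q->first;
	q->first = &c->peer_wake;
	return true;
}

/* Peer side: a datagram was received, wake everything on the queue. */
static void sketch_wake_all(struct sketch_queue *q)
{
	struct sketch_waiter *w = q->first, *next;

	for (; w; w = next) {
		next = w->next;	/* w unlinks itself inside func() */
		w->func(w);
	}
}
```

The point of the design is that the hook is removed as part of the
wake up itself, so the peer queue never holds a reference to a client
that has already been notified.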

Signed-off-by: Rainer Weikusat 
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
---

- uninvert the lock/check code in _dgram_sendmsg

- introduce a unix_dgram_peer_wake_disconnect_wakeup helper
  function as there were two calls with a wakeup immediately
  following and two without

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b36d837..2a91a05 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE  0
 #define UNIX_GC_MAYBE_CYCLE 1
        struct socket_wq        peer_wq;
+       wait_queue_t            peer_wake;
 };
 
 static inline struct unix_sock *unix_sk(const struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 94f6582..3d93b0d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,118 @@ found:
return s;
 }
 
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+                                     void *key)
+{
+   struct unix_sock *u;
+   wait_queue_head_t *u_sleep;
+
+   u = container_of(q, struct unix_sock, peer_wake);
+
+   __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+                       q);
+   u->peer_wake.private = NULL;
+
+   /* relaying can only happen while the wq still exists */
+   u_sleep = sk_sleep(&u->sk);
+   if (u_sleep)
+   wake_up_interruptible_poll(u_sleep, key);
+
+   return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)

[PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)

2015-11-20 Thread Tom Herbert
Kernel Connection Multiplexor (KCM) is a facility that provides a
message based interface over TCP for generic application protocols.
The motivation for this is based on the observation that although
TCP is a byte stream transport protocol with no concept of message
boundaries, a common use case is to implement a framed application
layer protocol running over TCP. To date, most TCP stacks offer a
byte stream API for applications, which places the burden of message
delineation, message I/O operation atomicity, and load balancing
in the application. With KCM an application can efficiently send
and receive application protocol messages over TCP using a
datagram interface.

In order to delineate messages in a TCP stream for receive in KCM, the
kernel implements a message parser. For this we chose to employ BPF
which is applied to the TCP stream. BPF code parses application layer
messages and returns a message length. Nearly all binary application
protocols are parsable in this manner, so KCM should be applicable
across a wide range of applications. Other than message length
determination in receive, KCM does not require any other application
specific awareness. KCM does not implement any other application
protocol semantics; these are provided in userspace or could be
implemented in a kernel module layered above KCM.
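
As a rough illustration of what "parse and return a message length"
means, here is a plain C sketch for a made-up framing with a 2-byte
big-endian payload length in front of each message. This is not BPF
and not part of the patch set; the framing and all sketch_* names are
assumptions chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: 2-byte big-endian payload length, then payload. */
#define SKETCH_HDR_LEN 2

/*
 * Return the total length (header + payload) of the first message in
 * buf, or 0 if fewer than SKETCH_HDR_LEN bytes are available yet.
 */
static size_t sketch_msg_len(const uint8_t *buf, size_t avail)
{
	assert(buf != NULL);
	if (avail < SKETCH_HDR_LEN)
		return 0;
	return SKETCH_HDR_LEN + (((size_t)buf[0] << 8) | buf[1]);
}

/*
 * Count the complete messages in a byte stream of 'avail' bytes by
 * repeatedly asking the parser for the next message length.
 */
static size_t sketch_count_msgs(const uint8_t *buf, size_t avail)
{
	size_t off = 0, n = 0;

	for (;;) {
		size_t len = sketch_msg_len(buf + off, avail - off);

		if (len == 0 || len > avail - off)
			break;	/* header or payload still incomplete */
		off += len;
		n++;
	}
	return n;
}
```

The parser only has to answer "how long is the next message"; where
the bytes go afterwards is not its concern, which is why no other
application specific awareness is needed on receive.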

KCM implements an NxM multiplexor in the kernel as diagrammed below:

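+------------+   +------------+   +------------+   +------------+
| KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
+------------+   +------------+   +------------+   +------------+
      |                 |               |                |
      +-----------+     |               |     +----------+
                  |     |               |     |
               +----------------------------------+
               |           Multiplexor            |
               +----------------------------------+
                 |   |           |           |  |
       +---------+   |           |           |  +-----------+
       |             |           |           |              |
+----------+  +----------+  +----------+  +----------+ +----------+
|  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
+----------+  +----------+  +----------+  +----------+ +----------+
      |              |           |            |             |
+----------+  +----------+  +----------+  +----------+ +----------+
| TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
+----------+  +----------+  +----------+  +----------+ +----------+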

The KCM sockets provide the datagram interface to applications,
Psocks are the state for each attached TCP connection (i.e. where
message delineation is performed on receive).

A description of the APIs and design can be found in the included
Documentation/networking/kcm.txt.

In this patch set:

  - Add MSG_BATCH flag. This is used in sendmsg msg_hdr flags to
indicate that more messages will be sent on the socket. The stack
may batch messages up if it is beneficial for transmission.
  - In sendmmsg, set MSG_BATCH in all sub messages except for the last
one.
  - In order to allow sendmmsg to contain multiple messages with
    SOCK_SEQPACKET we allow each msg_hdr in the sendmmsg to set MSG_EOR.
  - Add KCM module
- This supports SOCK_DGRAM and SOCK_SEQPACKET.
  - KCM documentation
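
The sendmmsg rule in the list above (MSG_BATCH on every sub message
except the last) can be sketched as a small flag helper. This is only
an illustration: SKETCH_MSG_BATCH is a placeholder value, not the
value the patch defines, and sketch_sub_msg_flags is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit for illustration only; not the real MSG_BATCH value. */
#define SKETCH_MSG_BATCH 0x1u

/*
 * Flags for sub message 'idx' out of 'vlen': every message except the
 * last one carries the batch hint, telling the stack more will follow.
 */
static unsigned int sketch_sub_msg_flags(unsigned int flags, size_t idx,
					 size_t vlen)
{
	assert(vlen > 0 && idx < vlen);
	if (idx + 1 < vlen)
		return flags | SKETCH_MSG_BATCH;
	return flags & ~SKETCH_MSG_BATCH;
}
```

With three sub messages the first two get the hint and the third does
not, so the stack knows when it can stop batching and transmit.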

Testing:

Dave Watson has integrated KCM into Thrift and we intend to put these
changes into open source. Example of this is in:

https://github.com/djwatson/fbthrift/commit/dd7e0f9cf4e80912fdb90f6cd394db24e61a14cc

Some initial KCM Thrift benchmark numbers (comment from Dave)

Thrift by default ties a single connection to a single thread.  KCM is
instead able to load balance multiple connections across multiple epoll
loops easily.

A test sending ~5k bytes of data to a kcm thrift server, dropping the
bytes on recv:

                QPS     Latency / std dev Latency
  without KCM   70336   209/123
  with KCM      70353   191/124

A test sending a small request, then doing work in the epoll thread,
before serving more requests:

                QPS     Latency / std dev Latency
  without KCM   14282   559/602
  with KCM      23192   344/234

At the high end, there's definitely some additional kernel overhead:

Cranking the pipelining way up, with lots of small requests

                QPS       Latency / std dev Latency
  without KCM   1863429   127/119
  with KCM      1337713   192/241

---

So for a "realistic" workload, KCM performs pretty well (second case).
Under extreme conditions of highest tps we still have some work to do.
In its nature a multiplexor will spread work between CPUs which is
logically good for load balancing but can conflict with the goal of
promoting affinity. Batching messages on both send and receive is
the means to recoup performance.

Future support:

 - Integration with TLS (TLS-in-kernel is a separate initiative).
 - Page operations/splice support
 - Unconnected KCM sockets. Will be able to attach sockets to 

[PATCH net-next 1/6] rcu: Add list_next_or_null_rcu

2015-11-20 Thread Tom Herbert
This is a convenience function that returns the next entry in an RCU
list or NULL if at the end of the list.

Signed-off-by: Tom Herbert 
---
 include/linux/rculist.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 5ed5409..a9376fd 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -290,6 +290,27 @@ static inline void list_splice_init_rcu(struct list_head *list,
 })
 
 /**
+ * list_next_or_null_rcu - get the next element from a list
+ * @head:  the head for the list.
+ * @ptr:the list head to take the next element from.
+ * @type:   the type of the struct this is embedded in.
+ * @member: the name of the list_head within the struct.
+ *
+ * Note that if the ptr is at the end of the list, NULL is returned.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ */
+#define list_next_or_null_rcu(head, ptr, type, member) \
+({ \
+   struct list_head *__head = (head); \
+   struct list_head *__ptr = (ptr); \
+   struct list_head *__next = READ_ONCE(__ptr->next); \
+   likely(__next != __head) ? list_entry_rcu(__next, type, \
+ member) : NULL; \
+})
+
+/**
  * list_for_each_entry_rcu -   iterate over rcu list of given type
  * @pos:   the type * to use as a loop cursor.
  * @head:  the head for your list.
-- 
2.4.6
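
For readers who want to see the "next entry or NULL" behaviour in
isolation, here is a user-space analogue on a plain circular list.
It deliberately has no RCU, no READ_ONCE() and no locking, and every
sketch_* name is hypothetical; it only shows the end-of-list test
against the head.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

struct sketch_item {
	int value;
	struct sketch_list_head node;
};

static void sketch_list_init(struct sketch_list_head *head)
{
	head->next = head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list_head *entry,
				 struct sketch_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Item following ptr, or NULL when ptr is the last entry of the list. */
static struct sketch_item *sketch_next_or_null(struct sketch_list_head *head,
					       struct sketch_list_head *ptr)
{
	struct sketch_list_head *next;

	assert(head != NULL && ptr != NULL);
	next = ptr->next;
	if (next == head)
		return NULL;	/* wrapped around: no further entry */
	return (struct sketch_item *)
		((char *)next - offsetof(struct sketch_item, node));
}
```

Passing the head itself as ptr yields the first item, so a walk can
start from the head and stop as soon as NULL comes back.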



Re: [PATCH 2/7] kernfs: implement kernfs_walk_and_get()

2015-11-20 Thread Greg Kroah-Hartman
On Fri, Nov 20, 2015 at 04:12:54PM -0500, Tejun Heo wrote:
> On Thu, Nov 19, 2015 at 08:41:04PM -0800, Greg Kroah-Hartman wrote:
> > On Thu, Nov 19, 2015 at 01:52:46PM -0500, Tejun Heo wrote:
> > > Implement kernfs_walk_and_get() which is similar to
> > > kernfs_find_and_get() but can walk a path instead of just a name.
> > > 
> > > v2: Use strlcpy() instead of strlen() + memcpy() as suggested by
> > > David.
> > > 
> > > Signed-off-by: Tejun Heo 
> > > Cc: Greg Kroah-Hartman 
> > > Cc: David Miller 
> > > ---
> > >  fs/kernfs/dir.c| 46 ++
> > >  include/linux/kernfs.h | 12 
> > >  2 files changed, 58 insertions(+)
> > 
> > Acked-by: Greg Kroah-Hartman 
> 
> Greg, would it be okay to route this one through either cgroup or net
> tree?

Either is fine with me, whatever works best for you.

greg k-h


Re: [PATCH RFC 00/15] net: The beginning of the end for NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM

2015-11-20 Thread David Miller
From: Tom Herbert 
Date: Thu, 19 Nov 2015 11:55:46 -0800

> Goals of this patch set:
> 
> We propose that drivers advertise NETIF_F_HW_CSUM instead of protocol
> specific values of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM.  If the
> driver's device is constrained (for instance it can only offload simple
> IPv4 and IPv6 packets) then these constraints can be checked in the
> transmit path and skb_checksum_help would be called for packets that the
> driver is unable to offload. In order to facilitate this, we add some
> helper functions that take a specification argument indicating the
> type of packets a device is able to offload. If a packet does not match
> the specification, the helper function calls skb_checksum_help.

I very much like the direction this is taking things.  And I do sincerely
hope that this does in fact actually encourage HW vendors to drop all of
the protocol specific offloading, and just support 1's complement sums.
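
For reference, the generic sum in question is, for the Internet
protocols, the 16-bit ones' complement sum from RFC 1071. A minimal
software sketch follows; it is byte-at-a-time, unoptimized, and the
sketch_* names are hypothetical, so it is nothing like the kernel's
csum helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold the carries of a 32-bit accumulator back into 16 bits. */
static uint16_t sketch_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * RFC 1071 style checksum over 'len' bytes, treating the data as
 * big-endian 16-bit words and padding an odd trailing byte with zero.
 * The 32-bit accumulator is fine for packet-sized buffers.
 */
static uint16_t sketch_inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return (uint16_t)~sketch_csum_fold(sum);
}
```

Nothing in the loop depends on which protocol the bytes belong to,
which is the whole argument for a generic offload.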

They can turn that _trivially_ into whatever the Windows et al. driver
interfaces want in their respective drivers.

There is absolutely no reason to implement protocol specific checksum
offloads in silicon in this day and age.

Absolutely none.

So driver folks tell your hardware buddies to just stop doing it now
and get with the program.  Even your marketing department shouldn't
care, they can list support for every protocol on the planet in their
specs and packaging if they want, and it might even look impressive...

Thanks.


Re: [PATCHSET v2] netfilter, cgroup: implement xt_cgroup2 match

2015-11-20 Thread Tejun Heo
Hello, David, Pablo.

On Fri, Nov 20, 2015 at 08:56:25PM +0100, Pablo Neira Ayuso wrote:
> > Pablo, are you ok with me merging this into net-next directly or
> > would you rather I take patches 1-6 into net-next and then you can
> > merge and then add patch #7 on top?
> 
> I'd suggest you get 1-6, then I'll pull this into my tree. Thanks David!

Hmm 1-3 will be needed to address similar issues in a different
controller, so putting them in a separate branch would work best.  I
created a branch which contains the 1-3 on top of v4.4-rc1.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-4.5-ancestor-test

If creating a different branch from net side is better, please let me
know.

> Regarding #7, I have a couple of concerns:
> 
> 1) cgroup currently doesn't work the way users expect, ie. to perform any
>reasonable firewalling. Since this relies on early demux, only a
>limited number of sockets get access to the cgroup info.

Right, it doesn't work well on INPUT side, so the big warning in the
man page.

> 2) We have traditionally rejected match2 and target2 extensions. I
>    guess you can accommodate the new cgroup code through the revision
>iptables infrastructure, so we still use the cgroup match.

I thought it would be confusing because the two are completely
separate.  Hmmm... okay, I'll merge it into xt_cgroup.

Thanks.

-- 
tejun


Re: [PATCH 2/7] kernfs: implement kernfs_walk_and_get()

2015-11-20 Thread Tejun Heo
On Thu, Nov 19, 2015 at 08:41:04PM -0800, Greg Kroah-Hartman wrote:
> On Thu, Nov 19, 2015 at 01:52:46PM -0500, Tejun Heo wrote:
> > Implement kernfs_walk_and_get() which is similar to
> > kernfs_find_and_get() but can walk a path instead of just a name.
> > 
> > v2: Use strlcpy() instead of strlen() + memcpy() as suggested by
> > David.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: Greg Kroah-Hartman 
> > Cc: David Miller 
> > ---
> >  fs/kernfs/dir.c| 46 ++
> >  include/linux/kernfs.h | 12 
> >  2 files changed, 58 insertions(+)
> 
> Acked-by: Greg Kroah-Hartman 

Greg, would it be okay to route this one through either cgroup or net
tree?

Thanks.

-- 
tejun


Re: [PATCH net-next] net: avoid NULL deref in napi_get_frags()

2015-11-20 Thread David Miller
From: Eric Dumazet 
Date: Thu, 19 Nov 2015 12:11:23 -0800

> From: Eric Dumazet 
> 
> napi_alloc_skb() can return NULL.
> We should not crash should this happen.
> 
> Fixes: 93f93a440415 ("net: move skb_mark_napi_id() into core networking stack")
> Signed-off-by: Eric Dumazet 

Applied, thanks.


Re: [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation

2015-11-20 Thread Vladimir Davydov
On Thu, Nov 12, 2015 at 06:41:23PM -0500, Johannes Weiner wrote:
> When a cgroup currently breaches its socket memory limit, it enters
> memory pressure mode for itself and its *ancestors*. This throttles
> transmission in unrelated sibling and cousin subtrees that have
> nothing to do with the breached limit.
> 
> On the contrary, breaching a limit should make that group and its
> *children* enter memory pressure mode. But this happens already,
> albeit lazily: if an ancestor limit is breached, siblings will enter
> memory pressure on their own once the next packet arrives for them.

Hmm, we still call sk_prot->enter_memory_pressure, which might hurt a
workload in the root cgroup AFAICS. Strange. You fix it in patch 8
though.

> 
> So no additional hierarchy code is needed. Remove the bogus stuff.
> 
> Signed-off-by: Johannes Weiner 

Reviewed-by: Vladimir Davydov 


Re: [B.A.T.M.A.N.] [PATCH 3/3] batman-adv: Less function calls in batadv_is_ap_isolated() after error detection

2015-11-20 Thread Antonio Quartulli
On 04/11/15 04:56, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Tue, 3 Nov 2015 21:10:51 +0100
> 
> The variables "tt_local_entry" and "tt_global_entry" were eventually checked
> again despite a corresponding null pointer test before.
> Let us avoid this double check by reordering a function call sequence
> and the better selection of jump targets.
> 
> Signed-off-by: Markus Elfring 
> ---
>  net/batman-adv/translation-table.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
> index 965a004..3ac32d9 100644
> --- a/net/batman-adv/translation-table.c
> +++ b/net/batman-adv/translation-table.c
> @@ -3323,27 +3323,24 @@ bool batadv_is_ap_isolated(struct batadv_priv *bat_priv, u8 *src, u8 *dst,
>   return false;
>  
>   if (!atomic_read(&vlan->ap_isolation))
> - goto out;
> + goto vlan_free;
>  
>   tt_local_entry = batadv_tt_local_hash_find(bat_priv, dst, vid);
>   if (!tt_local_entry)
> - goto out;
> + goto vlan_free;
>  
>   tt_global_entry = batadv_tt_global_hash_find(bat_priv, src, vid);
>   if (!tt_global_entry)
> - goto out;
> + goto local_entry_free;
>  
> - if (!_batadv_is_ap_isolated(tt_local_entry, tt_global_entry))
> - goto out;
> -
> - ret = true;
> + if (_batadv_is_ap_isolated(tt_local_entry, tt_global_entry))
> + ret = true;
>  
> -out:
> + batadv_tt_global_entry_free_ref(tt_global_entry);
> +local_entry_free:
> + batadv_tt_local_entry_free_ref(tt_local_entry);
> +vlan_free:
>   batadv_softif_vlan_free_ref(vlan);
> - if (tt_global_entry)
> - batadv_tt_global_entry_free_ref(tt_global_entry);
> - if (tt_local_entry)
> - batadv_tt_local_entry_free_ref(tt_local_entry);
>   return ret;

Markus,
if you really want to make this codestyle change, I'd suggest you go
through the whole batman-adv code and apply the same change where
needed. It does not make sense to change the codestyle in one spot only.

On top of that, by going through the batman-adv code you might agree
that the current style is actually not a bad idea.


Cheers,

-- 
Antonio Quartulli





Re: [PATCH 4/7] netprio_cgroup: limit the maximum css->id to USHRT_MAX

2015-11-20 Thread Daniel Wagner
On 11/19/2015 07:52 PM, Tejun Heo wrote:
> netprio builds per-netdev contiguous priomap array which is indexed by
> css->id.  The array is allocated using kzalloc() effectively limiting
> the maximum ID supported to some thousand range.  This patch caps the
> maximum supported css->id to USHRT_MAX which should be way above what
> is actually useable.
> 
> This allows reducing sock->sk_cgrp_prioidx to u16 from u32.  The freed
> up part will be used to overload the cgroup related fields.
> sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
> two cgroup related fields are adjacent.
> 
> Signed-off-by: Tejun Heo 
> Cc: Daniel Borkmann 
> Cc: Daniel Wagner 

Acked-by: Daniel Wagner 


[PATCH] packet: Allow packets with only a header (but no payload)

2015-11-20 Thread Martin Blumenstingl
9c70776 added validation for the packet size in packet_snd. This change
enforced that every packet needs a header with at least hard_header_len
bytes and at least one byte of payload.

This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24 <= 24)"

Signed-off-by: Martin Blumenstingl 
---
v2: Simply change the existing logic in ll_header_truncated instead of
splitting it and having multiple checks.

 include/linux/netdevice.h | 3 ++-
 net/packet/af_packet.c| 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 67bfac1..1f42cb7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1398,7 +1398,8 @@ enum netdev_priv_flags {
  * @dma:   DMA channel
  * @mtu:   Interface MTU value
  * @type:  Interface hardware type
- * @hard_header_len: Hardware header length
+ * @hard_header_len: Hardware header length, which means that this is the
+ * minimum size of a packet.
  *
  * @needed_headroom: Extra headroom the hardware may need, but not in all
  *   cases can this be guaranteed
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1cf928f..992396a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2329,8 +2329,8 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
 static bool ll_header_truncated(const struct net_device *dev, int len)
 {
/* net device doesn't like empty head */
-   if (unlikely(len <= dev->hard_header_len)) {
-   net_warn_ratelimited("%s: packet size is too short (%d <= %d)\n",
+   if (unlikely(len < dev->hard_header_len)) {
+   net_warn_ratelimited("%s: packet size is too short (%d < %d)\n",
 current->comm, len, dev->hard_header_len);
return true;
}
-- 
2.6.2
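
The boundary case from the pppd message (24 vs 24) can be checked in
isolation with the two comparisons side by side. This is only a
stand-alone sketch: the sketch_* helpers are hypothetical and take
plain integers instead of a struct net_device.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check: also rejects a packet that is exactly one header long. */
static bool sketch_truncated_old(int len, int hard_header_len)
{
	assert(hard_header_len >= 0);
	return len <= hard_header_len;
}

/* New check: only rejects packets shorter than the header. */
static bool sketch_truncated_new(int len, int hard_header_len)
{
	assert(hard_header_len >= 0);
	return len < hard_header_len;
}
```

A 24 byte packet on a device with a 24 byte header is refused by the
old comparison and accepted by the new one, while anything shorter
than the header is still refused.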



Re: r8169 regression: UDP packets dropped intermittantly

2015-11-20 Thread Francois Romieu
Jonathan Woithe  :
[...]
> This indicates to me that in the fault condition, packets coming into the PC
> are being held by the lower layers (perhaps even the hardware) for a very
> long time, and in fact only seem to be released once a packet is queued for
> transmission.
> 
> I then ran a test using the capture script you suggested.  For this test I
> arranged to only send the C-A packet sequence repeatedly until an error
> condition was detected.  Approximate times of the sequence's start time and
> the outcome of the sequence were:
> 
>   1447985720: C response received, A response received
>   1447985722: C response received, A response was not seen(*)
>   1447985739: C response received, A response received
> 
> (*) Based on the earlier test, I expect it was delivered to the OS layer at
> the start of the next test, when the "C" packet was sent.

The register dumps all look the same. Nothing to see here.

The hardware stats are not exactly clear. Is the initial Tx - Rx packet
difference (6) at the hardware stats level expected ?

[ASCII plot: hardware Tx (***) and Rx (###) packet counts, y-axis 256 to 272,
x-axis 15:15 to 15:50, with a "late Tx burp" marked]


Timestamp            Tx  Rx
1447985715.230668398 263 257
1447985716.234480645 263 257
1447985717.238563603 263 257
1447985718.242506424 263 257
1447985719.246230757 263 257
1447985720.249900368 263 257
1447985721.253727549 265 259
1447985722.257578916 265 259
1447985723.261457073 267 261
1447985724.264998842 267 261
1447985725.268928099 267 261
1447985726.272550987 269 262
1447985727.277086547 269 262
1447985728.280609761 269 262
1447985729.284707715 269 262
1447985730.288299143 269 262
1447985731.292144772 269 262
1447985732.295803667 269 262
1447985733.299568961 269 262
1447985734.303204633 269 262
1447985735.307015920 269 262
1447985736.310640604 269 262
1447985737.314153757 269 262
1447985738.318170685 269 262
1447985739.322196876 269 262
1447985740.325972075 271 264
1447985741.330118113 271 264
1447985742.334009017 271 264
1447985743.337646906 271 264
1447985744.341710301 271 264
1447985745.345781415 271 264
1447985746.349656916 271 264
1447985747.353606904 271 264

-- 
Ueimor
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >