[PATCH iproute2] tc: fix second printing of requeues

2018-01-27 Thread Jakub Kicinski
Non-JSON tc qdisc output used to print the "requeues" statistic
twice.  Commit 4fcec7f3665b ("tc: jsonify stats2") tried to preserve
this behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
so the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f3665b ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski 
---
 tc/tc_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/tc_util.c b/tc/tc_util.c
index 10e5aa91168a..aceb0d944933 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -846,7 +846,7 @@ void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char *prefix, struct rtat
print_string(PRINT_FP, NULL, "backlog %s",
 sprint_size(q.backlog, b1));
print_uint(PRINT_ANY, "qlen", " %up", q.qlen);
-   print_uint(PRINT_ANY, "requeues", " requeues %u", q.qlen);
+   print_uint(PRINT_FP, NULL, " requeues %u", q.requeues);
}
 
if (xstats)
-- 
2.15.1
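
Editor's sketch of the behaviour the commit message above describes: the
second "requeues" is kept for plain-text output only, so JSON output never
gets a duplicate key. The names emit_uint(), render_stats() and the OUT_*
selectors are hypothetical stand-ins; this is not the iproute2 print_uint()
implementation, and the surrounding format strings are simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output selectors, loosely modelled on PRINT_FP / PRINT_JSON /
 * PRINT_ANY from the patch; the real iproute2 helpers are not reproduced. */
enum out_type { OUT_FP = 1, OUT_JSON = 2, OUT_ANY = OUT_FP | OUT_JSON };

/* Append one unsigned field to buf. In JSON mode a field is emitted only if
 * its type includes OUT_JSON and it has a key; in text mode only if its type
 * includes OUT_FP. */
static void emit_uint(char *buf, size_t len, int json_mode, enum out_type t,
		      const char *key, const char *fmt, unsigned int val)
{
	size_t used = strlen(buf);

	if (used >= len)
		return;
	if (json_mode) {
		if ((t & OUT_JSON) && key)
			snprintf(buf + used, len - used, "\"%s\":%u,", key, val);
	} else if (t & OUT_FP) {
		snprintf(buf + used, len - used, fmt, val);
	}
}

/* Render a simplified tail of the stats line into buf. */
static void render_stats(char *buf, size_t len, int json_mode,
			 unsigned int qlen, unsigned int requeues)
{
	assert(len > 0);
	buf[0] = '\0';
	emit_uint(buf, len, json_mode, OUT_ANY, "requeues", " requeues %u", requeues);
	emit_uint(buf, len, json_mode, OUT_ANY, "qlen", " %up", qlen);
	/* Second copy: text only, no JSON key, and the requeues value
	 * rather than qlen. */
	emit_uint(buf, len, json_mode, OUT_FP, NULL, " requeues %u", requeues);
}
```

In text mode the value appears twice, as before; in JSON mode the third call
emits nothing, so "requeues" occurs once.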



Re: [PATCH net] ipv6: addrconf: break critical section in addrconf_verify_rtnl()

2018-01-27 Thread Ido Schimmel
On Fri, Jan 26, 2018 at 04:10:43PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> Heiner reported a lockdep splat [1]
> 
> This is caused by attempting GFP_KERNEL allocation while RCU lock is
> held and BH blocked.
> 
> We believe that addrconf_verify_rtnl() could run for a long period,
> so instead of using GFP_ATOMIC here as Ido suggested, we should break
> the critical section and restart it after the allocation.

[...]

> Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
> Signed-off-by: Eric Dumazet 
> Reported-by: Heiner Kallweit 

Reviewed-by: Ido Schimmel 

Thanks!
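
Editor's sketch of the general pattern discussed above: leave the critical
section before an allocation that may block, then re-enter it and restart the
scan. This is a userspace analogy using a pthread mutex and malloc(); it is
not the kernel patch (whose body is elided in this message), where the
critical section is an RCU read-side section with BH blocked and the
allocation is GFP_KERNEL. The names struct entry, table_lock and
fill_buffers() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical table entry that may need a buffer attached. */
struct entry {
	int needs_buf;
	void *buf;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Attach a buffer to every entry that needs one. Returns the number of
 * times the scan was restarted, or -1 if an allocation failed. */
static int fill_buffers(struct entry *tab, size_t n, size_t buf_size)
{
	int restarts = 0;
	size_t i;

	assert(tab != NULL || n == 0);
restart:
	pthread_mutex_lock(&table_lock);
	for (i = 0; i < n; i++) {
		void *p;

		if (!tab[i].needs_buf || tab[i].buf)
			continue;

		/* Break the critical section: the allocation may block. */
		pthread_mutex_unlock(&table_lock);
		p = malloc(buf_size);
		if (!p)
			return -1;

		/* Re-enter and re-check: state may have changed meanwhile. */
		pthread_mutex_lock(&table_lock);
		if (tab[i].needs_buf && !tab[i].buf)
			tab[i].buf = p;
		else
			free(p);
		pthread_mutex_unlock(&table_lock);

		/* Earlier entries may have changed too, so scan again. */
		restarts++;
		goto restart;
	}
	pthread_mutex_unlock(&table_lock);
	return restarts;
}
```

The trade-off is extra rescans in exchange for never allocating while the
lock is held, which is the reasoning given above for not using an atomic
allocation instead.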


Re: [PATCH bpf-next v7 2/5] libbpf: add function to setup XDP

2018-01-27 Thread Eric Leblond
Hi,

On Sat, 2018-01-27 at 02:23 +0100, Daniel Borkmann wrote:
> On 01/25/2018 01:05 AM, Eric Leblond wrote:
> > Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
> > slightly modified to be library compliant.
> > 
> > Signed-off-by: Eric Leblond 
> > Acked-by: Alexei Starovoitov 
> > ---
> >  tools/lib/bpf/bpf.c| 127
> > +
> >  tools/lib/bpf/libbpf.c |   2 +
> >  tools/lib/bpf/libbpf.h |   4 ++
> >  3 files changed, 133 insertions(+)
> > 
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 5128677e4117..749a447ec9ed 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -25,6 +25,17 @@
> >  #include 
> >  #include 
> >  #include "bpf.h"
> > +#include "libbpf.h"
> > +#include "nlattr.h"
> > +#include 
> 
> Doesn't libbpf pull in already -I$(srctree)/tools/include/uapi? Seems the
> other headers don't need 'uapi/' path prefix.

Right, it works without the uapi.
> 
> > +#include 
> > +#include 
> > +
> > +#ifndef IFLA_XDP_MAX
> > +#define IFLA_XDP   43
> > +#define IFLA_XDP_FD 1
> > +#define IFLA_XDP_FLAGS 3
> > +#endif
> 
> Hm, given we pull in tools/include/uapi/linux/netlink.h, shouldn't we also
> get include/uapi/linux/if_link.h dependency in here, so above ifdef
> workaround can be avoided?

These values are fixed so we risk nothing by keeping a definition if
ever it is not available in system headers. But it is fine with me if
you want me to add if_link.h to include/uapi/. 

BR,
-- 
Eric Leblond 
Blog: https://home.regit.org/
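
Editor's sketch of the fallback-define pattern debated above. The guard tests
IFLA_XDP_MAX because, as far as I know, that name is a macro in
<linux/if_link.h> while IFLA_XDP itself is an enumerator and so cannot be
tested with #ifndef. The sketch deliberately includes no system header, so
the fallback branch is always taken here; check_xdp_attr_ids() is a
hypothetical helper added only for illustration.

```c
#include <assert.h>

/* When the uapi header is available, IFLA_XDP_MAX is already defined and
 * this block is skipped. Without it, fall back to the fixed values quoted
 * in the patch. */
#ifndef IFLA_XDP_MAX
#define IFLA_XDP	43
#define IFLA_XDP_FD	1
#define IFLA_XDP_FLAGS	3
#endif

/* Hypothetical helper: verify that whichever definitions are in effect
 * carry the values quoted in the patch. */
static void check_xdp_attr_ids(void)
{
	assert(IFLA_XDP == 43);
	assert(IFLA_XDP_FD == 1);
	assert(IFLA_XDP_FLAGS == 3);
}
```

Because these attribute numbers are part of the userspace ABI, the fallback
and the header agree, which is the argument made in the reply above.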


Re: [PATCH net] ipv6: change route cache aging logic

2018-01-27 Thread Paolo Abeni
On Fri, 2018-01-26 at 11:40 -0800, Wei Wang wrote:
> From: Wei Wang 
> 
> In current route cache aging logic, if a route has both RTF_EXPIRE and
> RTF_GATEWAY set, the route will only be removed if the neighbor cache
> has no RTN_ROUTE flag. Otherwise, even if the route has expired, it
> won't get deleted.
> Fix this logic to always check if the route has expired first and then
> do the gateway neighbor cache check if previous check decide to not
> remove the exception entry.
> 
> Fixes: 1859bac04fb6 ("ipv6: remove from fib tree aged out RTF_CACHE dst")
> Signed-off-by: Wei Wang 
> Signed-off-by: Eric Dumazet 

Thank you for the fix!

LGTM

Acked-by: Paolo Abeni 

Cheers,

/P


Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating

2018-01-27 Thread jianchao.wang
Hi Tariq

Thanks for your kind response.
That's really appreciated.

On 01/25/2018 05:54 PM, Tariq Toukan wrote:
> 
> 
> On 25/01/2018 8:25 AM, jianchao.wang wrote:
>> Hi Eric
>>
>> Thanks for your kind response and suggestion.
>> That's really appreciated.
>>
>> Jianchao
>>
>> On 01/25/2018 11:55 AM, Eric Dumazet wrote:
>>> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote:
 Hi Tariq

 On 01/22/2018 10:12 AM, jianchao.wang wrote:
>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
 On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> Hi Tariq
>
> Very sad that the crash was reproduced again after applying the patch.
>>
>> Memory barriers vary for different Archs, can you please share more 
>> details regarding arch and repro steps?
> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 
> 12/27/2015
> The xen is installed. The crash occurred in DOM0.
> Regarding the repro steps, it is a customer's test which does heavy
> disk I/O over NFS storage without any guest.
>

 What is the final suggestion on this?
 If we use wmb there, is the performance pulled down?
> 
> I want to evaluate this effect.
> I agree with Eric, expected impact is restricted, especially after batching
> the allocations.
>>>
>>> Since 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155
>>>
>>> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that 
>>> often.
>>>
>>> I doubt the additional wmb() will have serious impact there.
>>>
> 
> I will test the effect (it'll be beginning of next week).
> I'll update so we can make a more confident decision.
> 
I have also sent patches with wmb and batching allocations to the customer and
asked them to check whether the performance is impacted.
I will update here asap when I get feedback.

> Thanks,
> Tariq
> 
>>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  
>> http://vger.kernel.org/majordomo-info.html
>>
> 


Re: [PATCH v2 0/6] wl1251: Fix MAC address for Nokia N900

2018-01-27 Thread Pali Rohár
On Friday 05 January 2018 02:45:10 Luis R. Rodriguez wrote:
> On Tue, Jan 02, 2018 at 08:23:45PM +0100, Pali Rohár wrote:
> > On Friday 10 November 2017 00:38:22 Pali Rohár wrote:
> > > This patch series fixes processing of the MAC address for the wl1251 chip
> > > found in Nokia N900.
> > > 
> > > Changes since v1:
> > > * Added Acked-by for Pavel Machek
> > > * Fixed grammar
> > > * Magic numbers for NVS offsets are replaced by defines
> > > * Check for validity of mac address NVS data is moved into function
> > > * Changed order of patches as Pavel requested
> > > 
> > > Pali Rohár (6):
> > >   wl1251: Update wl->nvs_len after wl->nvs is valid
> > >   wl1251: Generate random MAC address only if driver does not have
> > > valid
> > >   wl1251: Parse and use MAC address from supplied NVS data
> > >   wl1251: Set generated MAC address back to NVS data
> > >   firmware: Add request_firmware_prefer_user() function
> > >   wl1251: Use request_firmware_prefer_user() for loading NVS
> > > calibration data
> > > 
> > >  drivers/base/firmware_class.c  |   45 +-
> > >  drivers/net/wireless/ti/wl1251/Kconfig |1 +
> > >  drivers/net/wireless/ti/wl1251/main.c  |  104 
> > > ++--
> > >  include/linux/firmware.h   |9 +++
> > >  4 files changed, 138 insertions(+), 21 deletions(-)
> > 
> > Hi! Are there any comments for first 4 patches? If not, could they be
> > accepted and merged?
> 
> Since the first 4 patches do not touch the firmware API they seem fine to me so
> long as the maintainer accepts them. Maybe resend and clarify you have dropped
> the other ones and amend with the new tags.

According to get_maintainer.pl, Kalle Valo is maintainer.

Kalle Valo, if you do not have any other comments, can you accept the first
4 patches? Or do you really need me to resend the first 4 patches?

-- 
Pali Rohár
pali.ro...@gmail.com



Re: [RFC 0/2] hv_netvsc shutdown redo

2018-01-27 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> These patches change how teardown of Hyper-V network devices
> is done. These are tested on WS2012 and WS2016.
>
> It moves the tx/rx shutdown into the rndis close handling,
> and that makes earlier gpadl changes unnecessary.
>

Thank you Stephen,

I gave these a try and they didn't survive my 'death row' test on
WS2016: I run 3 things in parallel:

1) iperf to some external IP
2) while true; do ethtool -L ethX combined 6; ethtool -L ethX combined 8; done
3) while true; do ip link set dev ethX mtu 1400; ip link set dev ethX mtu 1450; done

I ended up with a hang:

[ 1226.710034] INFO: task ip:2357 blocked for more than 120 seconds.
[ 1226.712397]   Not tainted 4.15.0-rc9+ #321
[ 1226.714030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1226.716724] ip  D0  2357   1474 0x
[ 1226.718698] Call Trace:
[ 1226.719588]  ? __schedule+0x1da/0x7b0
[ 1226.720910]  ? get_page_from_freelist+0x106d/0x15c0
[ 1226.722648]  schedule+0x28/0x80
[ 1226.723807]  schedule_preempt_disabled+0xa/0x10
[ 1226.725952]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1226.727915]  ? rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.729849]  rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.731611]  ? rtnl_calcit.isra.28+0x110/0x110
[ 1226.733824]  netlink_rcv_skb+0x4a/0x120
[ 1226.736916]  netlink_unicast+0x19d/0x250
[ 1226.738907]  netlink_sendmsg+0x2a5/0x3a0
[ 1226.740762]  sock_sendmsg+0x30/0x40
[ 1226.742552]  SYSC_sendto+0x10e/0x140
[ 1226.744310]  ? __do_page_fault+0x26d/0x4c0
[ 1226.746332]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 1226.748730] RIP: 0033:0x7ff2cdc9aa7d
[ 1226.750776] RSP: 002b:7ffd0a3455e8 EFLAGS: 0246
[ 1349.590041] INFO: task kworker/3:6:1586 blocked for more than 120 seconds.
[ 1349.595358]   Not tainted 4.15.0-rc9+ #321
[ 1349.597335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1349.600638] kworker/3:6 D0  1586  2 0x8000
[ 1349.603335] Workqueue: ipv6_addrconf addrconf_verify_work
[ 1349.605779] Call Trace:
[ 1349.607080]  ? __schedule+0x1da/0x7b0
[ 1349.608856]  ? update_load_avg+0x563/0x6d0
[ 1349.610834]  ? update_curr+0xb9/0x190
[ 1349.613050]  schedule+0x28/0x80
[ 1349.615290]  schedule_preempt_disabled+0xa/0x10
[ 1349.617306]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1349.619072]  ? addrconf_verify_work+0xa/0x20
[ 1349.621108]  addrconf_verify_work+0xa/0x20
[ 1349.623107]  process_one_work+0x188/0x380
[ 1349.625012]  worker_thread+0x2e/0x390
[ 1349.626976]  ? process_one_work+0x380/0x380
[ 1349.628925]  kthread+0x111/0x130
[ 1349.630498]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1349.632786]  ? do_group_exit+0x3a/0xa0
[ 1349.634598]  ret_from_fork+0x35/0x40



(I'm not 100% sure this is a _new_ issue btw, it can happen that the
race was always there and it's just easier to trigger it now).

I'll try to do more testing next week.

Thanks,

-- 
  Vitaly


Re: [PATCH bpf-next v7 3/5] libbpf: add error reporting in XDP

2018-01-27 Thread Eric Leblond
Hi,

On Sat, 2018-01-27 at 02:28 +0100, Daniel Borkmann wrote:
> On 01/25/2018 01:05 AM, Eric Leblond wrote:
> > Parse netlink ext attribute to get the error message returned by
> > the card. Code is partially taken from libnl.
> > 
> > We add netlink.h to the uapi include of tools. And we need to
> > avoid include of userspace netlink header to have a successful
> > build of sample so nlattr.h has a define to avoid
> > the inclusion. Using a direct define could have been an issue
> > as NLMSGERR_ATTR_MAX can change in the future.
> > 
> > We also define SOL_NETLINK if not defined to avoid to have to
> > copy socket.h for a fixed value.
> > 
> > Signed-off-by: Eric Leblond 
> > Acked-by: Alexei Starovoitov 
> > 
> > remote rtne
> > 
> > Signed-off-by: Eric Leblond 
> 
> Some leftover artifact from squashing commits?

Ouch

> >  samples/bpf/Makefile   |   2 +-
> >  tools/lib/bpf/Build|   2 +-
> >  tools/lib/bpf/bpf.c|  13 +++-
> >  tools/lib/bpf/nlattr.c | 187
> > +
> >  tools/lib/bpf/nlattr.h |  72 +++
> >  5 files changed, 273 insertions(+), 3 deletions(-)
> >  create mode 100644 tools/lib/bpf/nlattr.c
> >  create mode 100644 tools/lib/bpf/nlattr.h
> > 
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index 7f61a3d57fa7..5c4cd3745282 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -45,7 +45,7 @@ hostprogs-y += xdp_rxq_info
> >  hostprogs-y += syscall_tp
> >  
> >  # Libbpf dependencies
> > -LIBBPF := ../../tools/lib/bpf/bpf.o
> > +LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
> >  CGROUP_HELPERS :=
> > ../../tools/testing/selftests/bpf/cgroup_helpers.o
> >  
> >  test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
> > diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> > index d8749756352d..64c679d67109 100644
> > --- a/tools/lib/bpf/Build
> > +++ b/tools/lib/bpf/Build
> > @@ -1 +1 @@
> > -libbpf-y := libbpf.o bpf.o
> > +libbpf-y := libbpf.o bpf.o nlattr.o
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 749a447ec9ed..765fd95b0657 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -27,7 +27,7 @@
> >  #include "bpf.h"
> >  #include "libbpf.h"
> >  #include "nlattr.h"
> > -#include 
> > +#include 
> 
> Okay, so here it's put back from prior added uapi/linux/rtnetlink.h
> into linux/rtnetlink.h. Could you add this properly in the first
> commit rather than relative adjustment/fix within the same set?

Yes, sure.

> >  #include 
> >  #include 
> >  
> > @@ -37,6 +37,10 @@
> >  #define IFLA_XDP_FLAGS 3
> >  #endif
> >  
> > +#ifndef SOL_NETLINK
> > +#define SOL_NETLINK 270
> > +#endif
> 
> This would need include/linux/socket.h into tools/ include infra
> as well, no?

Yes, and I fear a lot of dependencies.

++
-- 
Eric Leblond 
Blog: https://home.regit.org/


Re: [4.15-rc9] fs_reclaim lockdep trace

2018-01-27 Thread Dave Jones
On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
 > Just triggered this on a server I was rsync'ing to.

Actually, I can trigger this really easily, even with an rsync from one
disk to another.  Though that also smells a little like networking in
the traces. Maybe netdev has ideas.

 
The first instance:

 > 
 > WARNING: possible recursive locking detected
 > 4.15.0-rc9-backup-debug+ #1 Not tainted
 > 
 > sshd/24800 is trying to acquire lock:
 >  (fs_reclaim){+.+.}, at: [<84f438c2>] 
 > fs_reclaim_acquire.part.102+0x5/0x30
 > 
 > but task is already holding lock:
 >  (fs_reclaim){+.+.}, at: [<84f438c2>] 
 > fs_reclaim_acquire.part.102+0x5/0x30
 > 
 > other info that might help us debug this:
 >  Possible unsafe locking scenario:
 > 
 >CPU0
 >
 >   lock(fs_reclaim);
 >   lock(fs_reclaim);
 > 
 >  *** DEADLOCK ***
 > 
 >  May be due to missing lock nesting notation
 > 
 > 2 locks held by sshd/24800:
 >  #0:  (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] 
 > tcp_sendmsg+0x19/0x40
 >  #1:  (fs_reclaim){+.+.}, at: [<84f438c2>] 
 > fs_reclaim_acquire.part.102+0x5/0x30
 > 
 > stack backtrace:
 > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
 > Call Trace:
 >  dump_stack+0xbc/0x13f
 >  ? _atomic_dec_and_lock+0x101/0x101
 >  ? fs_reclaim_acquire.part.102+0x5/0x30
 >  ? print_lock+0x54/0x68
 >  __lock_acquire+0xa09/0x2040
 >  ? debug_show_all_locks+0x2f0/0x2f0
 >  ? mutex_destroy+0x120/0x120
 >  ? hlock_class+0xa0/0xa0
 >  ? kernel_text_address+0x5c/0x90
 >  ? __kernel_text_address+0xe/0x30
 >  ? unwind_get_return_address+0x2f/0x50
 >  ? __save_stack_trace+0x92/0x100
 >  ? graph_lock+0x8d/0x100
 >  ? check_noncircular+0x20/0x20
 >  ? __lock_acquire+0x616/0x2040
 >  ? debug_show_all_locks+0x2f0/0x2f0
 >  ? __lock_acquire+0x616/0x2040
 >  ? debug_show_all_locks+0x2f0/0x2f0
 >  ? print_irqtrace_events+0x110/0x110
 >  ? active_load_balance_cpu_stop+0x7b0/0x7b0
 >  ? debug_show_all_locks+0x2f0/0x2f0
 >  ? mark_lock+0x1b1/0xa00
 >  ? lock_acquire+0x12e/0x350
 >  lock_acquire+0x12e/0x350
 >  ? fs_reclaim_acquire.part.102+0x5/0x30
 >  ? lockdep_rcu_suspicious+0x100/0x100
 >  ? set_next_entity+0x20e/0x10d0
 >  ? mark_lock+0x1b1/0xa00
 >  ? match_held_lock+0x8d/0x440
 >  ? mark_lock+0x1b1/0xa00
 >  ? save_trace+0x1e0/0x1e0
 >  ? print_irqtrace_events+0x110/0x110
 >  ? alloc_extent_state+0xa7/0x410
 >  fs_reclaim_acquire.part.102+0x29/0x30
 >  ? fs_reclaim_acquire.part.102+0x5/0x30
 >  kmem_cache_alloc+0x3d/0x2c0
 >  ? rb_erase+0xe63/0x1240
 >  alloc_extent_state+0xa7/0x410
 >  ? lock_extent_buffer_for_io+0x3f0/0x3f0
 >  ? find_held_lock+0x6d/0xd0
 >  ? test_range_bit+0x197/0x210
 >  ? lock_acquire+0x350/0x350
 >  ? do_raw_spin_unlock+0x147/0x220
 >  ? do_raw_spin_trylock+0x100/0x100
 >  ? iotree_fs_info+0x30/0x30
 >  __clear_extent_bit+0x3ea/0x570
 >  ? clear_state_bit+0x270/0x270
 >  ? count_range_bits+0x2f0/0x2f0
 >  ? lock_acquire+0x350/0x350
 >  ? rb_prev+0x21/0x90
 >  try_release_extent_mapping+0x21a/0x260
 >  __btrfs_releasepage+0xb0/0x1c0
 >  ? btrfs_submit_direct+0xca0/0xca0
 >  ? check_new_page_bad+0x1f0/0x1f0
 >  ? match_held_lock+0xa5/0x440
 >  ? debug_show_all_locks+0x2f0/0x2f0
 >  btrfs_releasepage+0x161/0x170
 >  ? __btrfs_releasepage+0x1c0/0x1c0
 >  ? page_rmapping+0xd0/0xd0
 >  ? rmap_walk+0x100/0x100
 >  try_to_release_page+0x162/0x1c0
 >  ? generic_file_write_iter+0x3c0/0x3c0
 >  ? page_evictable+0xcc/0x110
 >  ? lookup_address_in_pgd+0x107/0x190
 >  shrink_page_list+0x1d5a/0x2fb0
 >  ? putback_lru_page+0x3f0/0x3f0
 >  ? save_trace+0x1e0/0x1e0
 >  ? _lookup_address_cpa.isra.13+0x40/0x60
 >  ? debug_show_all_locks+0x2f0/0x2f0
 >  ? kmem_cache_free+0x8c/0x280
 >  ? free_extent_state+0x1c8/0x3b0
 >  ? mark_lock+0x1b1/0xa00
 >  ? page_rmapping+0xd0/0xd0
 >  ? print_irqtrace_events+0x110/0x110
 >  ? shrink_node_memcg.constprop.88+0x4c9/0x5e0
 >  ? shrink_node+0x12d/0x260
 >  ? try_to_free_pages+0x418/0xaf0
 >  ? __alloc_pages_slowpath+0x976/0x1790
 >  ? __alloc_pages_nodemask+0x52c/0x5c0
 >  ? delete_node+0x28d/0x5c0
 >  ? find_held_lock+0x6d/0xd0
 >  ? free_pcppages_bulk+0x381/0x570
 >  ? lock_acquire+0x350/0x350
 >  ? do_raw_spin_unlock+0x147/0x220
 >  ? do_raw_spin_trylock+0x100/0x100
 >  ? __lock_is_held+0x51/0xc0
 >  ? _raw_spin_unlock+0x24/0x30
 >  ? free_pcppages_bulk+0x381/0x570
 >  ? mark_lock+0x1b1/0xa00
 >  ? free_compound_page+0x30/0x30
 >  ? print_irqtrace_events+0x110/0x110
 >  ? __kernel_map_pages+0x2c9/0x310
 >  ? mark_lock+0x1b1/0xa00
 >  ? print_irqtrace_events+0x110/0x110
 >  ? __delete_from_page_cache+0x2e7/0x4e0
 >  ? save_trace+0x1e0/0x1e0
 >  ? __add_to_page_cache_locked+0x680/0x680
 >  ? find_held_lock+0x6d/0xd0
 >  ? __list_add_valid+0x29/0xa0
 >  ? free_unref_page_commit+0x198/0x270
 >  ? drain_local_pages_wq+0x20/0x20
 >  ? stop_critical_timings+0x210/0x210
 >  ? mark_lock+0x1b1/0xa00
 >  ? mark_lock+0x1b1/0xa00
 >  ? 

[PATCH net] net_sched: gen_estimator: fix lockdep splat

2018-01-27 Thread Eric Dumazet
From: Eric Dumazet 

syzbot reported a lockdep splat in gen_new_estimator() /
est_fetch_counters() when attempting to lock est->stats_lock.

Since est_fetch_counters() is called from BH context from timer
interrupt, we need to block BH as well when calling it from process
context.

Most qdiscs use per cpu counters and are immune to the problem,
but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using
a spinlock to protect their data. They both call gen_new_estimator()
while object is created and not yet alive, so this bug could
not trigger a deadlock, only a lockdep splat.

Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
---
 net/core/gen_estimator.c |4 
 1 file changed, 4 insertions(+)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 9834cfa21b21168a7654290dc2a999e41937b534..0a3f88f08727f1f1217560407ff539c8a8c17496 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -159,7 +159,11 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
est->intvl_log = intvl_log;
est->cpu_bstats = cpu_bstats;
 
+   if (stats_lock)
+   local_bh_disable();
est_fetch_counters(est, &b);
+   if (stats_lock)
+   local_bh_enable();
est->last_bytes = b.bytes;
est->last_packets = b.packets;
old = rcu_dereference_protected(*rate_est, 1);



kernel 4.15.0-rc9+ (net-next) high cpu load at 50Gbit/s - about 6Mpps

2018-01-27 Thread Paweł Staszewski

Hi


Today I made some real life traffic tests with kernel 4.15.0-rc9

but when traffic reaches 50Gbit/s and about 6Mpps, CPU load rises fast
from 48% to 100% on all CPU cores.


Here is a graph showing how CPU load rises as the packet rate (pps)
increases.



https://ibb.co/mhD5ob


here is perf record from that time:

https://pastebin.com/3zqG1rvE


There are 8x 10G ixgbe 82599 interfaces teamed with teamd.

No traffic queueing - only pfifo_fast on all interfaces.

No NAT or iptables rules other than INPUT (about 30 rules)

All NICs have the same ethtool settings:

ethtool -k eth0
Features for eth0:
Cannot get device udp-fragmentation-offload settings: Operation not 
supported

rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: on
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on


ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini:    0
RX Jumbo:   0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini:    0
RX Jumbo:   0
TX: 2048


ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 512
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0





[bpf-next PATCH 1/5] bpf: Sync kernel ABI header with tooling header for bpf_common.h

2018-01-27 Thread Jesper Dangaard Brouer
I recently fixed up a lot of commits that forgot to keep the tooling
headers in sync.  And then I forgot to do the same thing in commit
cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let's correct
that before people notice ;-).

Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09
("bpf: add selftest for tcpbpf").

Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes")
Signed-off-by: Jesper Dangaard Brouer 
---
 0 files changed

diff --git a/tools/include/uapi/linux/bpf_common.h 
b/tools/include/uapi/linux/bpf_common.h
index 18be90725ab0..ee97668bdadb 100644
--- a/tools/include/uapi/linux/bpf_common.h
+++ b/tools/include/uapi/linux/bpf_common.h
@@ -15,9 +15,10 @@
 
 /* ld/ldx fields */
 #define BPF_SIZE(code)  ((code) & 0x18)
-#define  BPF_W   0x00
-#define  BPF_H   0x08
-#define  BPF_B   0x10
+#define  BPF_W   0x00 /* 32-bit */
+#define  BPF_H   0x08 /* 16-bit */
+#define  BPF_B   0x10 /*  8-bit */
+/* eBPF  BPF_DW  0x18    64-bit */
 #define BPF_MODE(code)  ((code) & 0xe0)
 #define  BPF_IMM 0x00
 #define  BPF_ABS 0x20
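
Editor's sketch of how the size field documented by the comments above can be
decoded from an instruction opcode. The BPF_SIZE/BPF_W/BPF_H/BPF_B values are
copied from the diff; BPF_DW is defined here from the value given in the
added comment (0x18). bpf_size_bytes() and bpf_size_selfcheck() are
hypothetical helpers, not part of any kernel header.

```c
#include <assert.h>
#include <stdint.h>

/* ld/ldx size field, values as in the header above. */
#define BPF_SIZE(code)	((code) & 0x18)
#define BPF_W		0x00	/* 32-bit */
#define BPF_H		0x08	/* 16-bit */
#define BPF_B		0x10	/*  8-bit */
#define BPF_DW		0x18	/* 64-bit, eBPF only */

/* Return the access width in bytes encoded in a load/store opcode. */
static int bpf_size_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_W:
		return 4;
	case BPF_H:
		return 2;
	case BPF_B:
		return 1;
	case BPF_DW:
		return 8;
	}
	return -1;	/* not reachable: the mask has only four values */
}

/* Usage example with raw opcode bytes. */
static void bpf_size_selfcheck(void)
{
	assert(bpf_size_bytes(0x61) == 4);	/* size bits 0x00 */
	assert(bpf_size_bytes(0x69) == 2);	/* size bits 0x08 */
	assert(bpf_size_bytes(0x30) == 1);	/* size bits 0x10 */
	assert(bpf_size_bytes(0x79) == 8);	/* size bits 0x18 */
}
```

Note that the encoding is not ordered by width (W is 0x00, B is 0x10), which
is presumably why the added comments are worth syncing to the tooling copy.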



[bpf-next PATCH 3/5] tools/libbpf: add test program for loading BPF ELF files

2018-01-27 Thread Jesper Dangaard Brouer
Signed-off-by: Jesper Dangaard Brouer 
---
 0 files changed

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 83714ca1f22b..f968702f4ef6 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -147,11 +147,11 @@ LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
 
 CMD_TARGETS = $(LIB_FILE)
 
-TARGETS = $(CMD_TARGETS)
+TARGETS = $(CMD_TARGETS) test_libbpf_open
 
 all: fixdep all_cmd
 
-all_cmd: $(CMD_TARGETS)
+all_cmd: $(TARGETS)
 
 $(BPF_IN): force elfdep bpfdep
@(test -f ../../include/uapi/linux/bpf.h -a -f 
../../../include/uapi/linux/bpf.h && ( \
@@ -168,6 +168,9 @@ $(OUTPUT)libbpf.so: $(BPF_IN)
 $(OUTPUT)libbpf.a: $(BPF_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
 
+test_libbpf_open: test_libbpf_open.c $(OUTPUT)libbpf.a
+   $(QUIET_LINK)$(CC) -lelf -Wall $< $(OUTPUT)libbpf.a -o $(OUTPUT)$@
+
 define do_install
if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
diff --git a/tools/lib/bpf/test_libbpf_open.c b/tools/lib/bpf/test_libbpf_open.c
new file mode 100644
index 000000000000..8fcd1c076add
--- /dev/null
+++ b/tools/lib/bpf/test_libbpf_open.c
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Jesper Dangaard Brouer, Red Hat Inc.
+ */
+static const char *__doc__ =
+   "Libbpf test program for loading BPF ELF object files";
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const struct option long_options[] = {
+   {"help",no_argument,NULL, 'h' },
+   {"debug",   no_argument,NULL, 'D' },
+   {"quiet",   no_argument,NULL, 'q' },
+   {0, 0, NULL,  0 }
+};
+
+static void usage(char *argv[])
+{
+   int i;
+
+   printf("\nDOCUMENTATION:\n%s\n\n", __doc__);
+   printf(" Usage: %s (options-see-below) BPF_FILE\n", argv[0]);
+   printf(" Listing options:\n");
+   for (i = 0; long_options[i].name != 0; i++) {
+   printf(" --%-12s", long_options[i].name);
+   printf(" short-option: -%c",
+  long_options[i].val);
+   printf("\n");
+   }
+   printf("\n");
+}
+
+#define DEFINE_PRINT_FN(name, enabled) \
+static int libbpf_##name(const char *fmt, ...) \
+{  \
+va_list args;  \
+int ret;   \
+   \
+va_start(args, fmt);   \
+   if (enabled) {  \
+   fprintf(stderr, "[" #name "] ");\
+   ret = vfprintf(stderr, fmt, args);  \
+   }   \
+va_end(args);  \
+return ret;\
+}
+DEFINE_PRINT_FN(warning, 1)
+DEFINE_PRINT_FN(info, 1)
+DEFINE_PRINT_FN(debug, 1)
+
+#define EXIT_FAIL_LIBBPF EXIT_FAILURE
+#define EXIT_FAIL_OPTION 2
+
+int test_walk_progs(struct bpf_object *obj, bool verbose)
+{
+   struct bpf_program *prog;
+   int cnt = 0;
+
+   bpf_object__for_each_program(prog, obj) {
+   cnt++;
+   if (verbose)
+   printf("Prog (count:%d) section_name: %s\n", cnt,
+  bpf_program__title(prog, false));
+   }
+   return 0;
+}
+
+int test_walk_maps(struct bpf_object *obj, bool verbose)
+{
+   struct bpf_map *map;
+   int cnt = 0;
+
+   bpf_map__for_each(map, obj) {
+   cnt++;
+   if (verbose)
+   printf("Map (count:%d) name: %s\n", cnt,
+  bpf_map__name(map));
+   }
+   return 0;
+}
+
+int test_open_file(char *filename, bool verbose)
+{
+   struct bpf_object *bpfobj = NULL;
+   long err;
+
+   if (verbose)
+   printf("Open BPF ELF-file with libbpf: %s\n", filename);
+
+   /* Load BPF ELF object file and check for errors */
+   bpfobj = bpf_object__open(filename);
+   err = libbpf_get_error(bpfobj);
+   if (err) {
+   char err_buf[128];
+   libbpf_strerror(err, err_buf, sizeof(err_buf));
+   if (verbose)
+   printf("Unable to load eBPF objects in file '%s': %s\n",
+  filename, err_buf);
+   return EXIT_FAIL_LIBBPF;
+   }
+   test_walk_progs(bpfobj, verbose);
+   test_walk_maps(bpfobj, verbose);
+
+   if (verbose)
+   printf("Close BPF ELF-file with libbpf: %s\n",
+  bpf_object__name(bpfobj));
+   bpf_object__close(bpfobj);
+
+   return 0;
+}
+
+int main(int argc, char **argv)
+{
+   char filename[1024] = { 0 };
+   bool verbose = 1;
+   int longindex = 0;
+   int opt;
+
+   

[bpf-next PATCH 4/5] selftests/bpf: add selftest that use test_libbpf_open

2018-01-27 Thread Jesper Dangaard Brouer
Signed-off-by: Jesper Dangaard Brouer 
---
 tools/testing/selftests/bpf/Makefile   |9 +-
 tools/testing/selftests/bpf/test_libbpf.sh |   45 
 2 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/bpf/test_libbpf.sh

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index bf05bc5e36e5..ea2e7d498f5a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -13,6 +13,7 @@ endif
 CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) 
-I../../../include
 LDLIBS += -lcap -lelf -lrt -lpthread
 
+# Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user
 
@@ -22,7 +23,11 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
sample_map_ret0.o test_tcpbpf_kern.o
 
-TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
+# Order correspond to 'make run_tests' order
+TEST_PROGS := test_kmod.sh \
+   test_libbpf.sh \
+   test_xdp_redirect.sh \
+   test_xdp_meta.sh \
test_offload.py
 
 include ../lib.mk
@@ -36,6 +41,8 @@ $(TEST_GEN_PROGS): $(BPFOBJ)
 # force a rebuild of BPFOBJ when its dependencies are updated
 force:
 
+$(OUTPUT)/test_libbpf_open: $(BPFOBJ)
+
 $(BPFOBJ): force
$(MAKE) -C $(BPFDIR) OUTPUT=$(OUTPUT)/
 
diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
new file mode 100755
index 000000000000..bd623d2cbdb8
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+export TESTNAME=test_libbpf
+
+# Determine selftest success via shell exit code
+exit_handler()
+{
+   if (( $? == 0 )); then
+   echo "selftests: $TESTNAME [PASS]";
+   else
+   echo "$TESTNAME: failed at file $LAST_LOADED" 1>&2
+   echo "selftests: $TESTNAME [FAILED]";
+   fi
+}
+
+libbpf_open_file()
+{
+   LAST_LOADED=$1
+   ./test_libbpf_open --quiet $1
+}
+
+# Exit script immediately (well catched by trap handler) if any
+# program/thing exits with a non-zero status.
+set -e
+
+# (Use 'trap -l' to list meaning of numbers)
+trap exit_handler 0 2 3 6 9
+
+libbpf_open_file test_l4lb.o
+
+# TODO: fix libbpf to load noinline functions
+# [warning] libbpf: incorrect bpf_call opcode
+#libbpf_open_file test_l4lb_noinline.o
+
+# TODO: fix test_xdp_meta.c to load with libbpf
+# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
+#libbpf_open_file test_xdp_meta.o
+
+# TODO: fix libbpf to handle .eh_frame
+# [warning] libbpf: relocation failed: no section(10)
+#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+
+# Success
+exit 0



[bpf-next PATCH 0/5] tools/libbpf improvements and selftests

2018-01-27 Thread Jesper Dangaard Brouer
While playing with using libbpf for the Suricata project, we had
issues with LLVM >= 4.0.1 generating ELF files that could not be loaded
with libbpf (tools/lib/bpf/).

During the troubleshooting phase, I wrote a test program and improved
the debugging output in libbpf.  I turned this into a selftests
program, and it also serves as a code example for libbpf in itself.

I discovered that there are at least three ELF load issues with
libbpf.  I left them as TODO comments in (tools/testing/selftests/bpf)
test_libbpf.sh. I've only fixed the load issue with eh_frames.  We can
work on the other issues later.

---

Jesper Dangaard Brouer (5):
  bpf: Sync kernel ABI header with tooling header for bpf_common.h
  tools/libbpf: improve the pr_debug statements to contain section numbers
  tools/libbpf: add test program for loading BPF ELF files
  selftests/bpf: add selftest that use test_libbpf_open
  tools/libbpf: handle issues with bpf ELF objects containing .eh_frames


 tools/testing/selftests/bpf/Makefile   |9 +-
 tools/testing/selftests/bpf/test_libbpf.sh |   45 
 2 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/bpf/test_libbpf.sh

--


[bpf-next PATCH 5/5] tools/libbpf: handle issues with bpf ELF objects containing .eh_frames

2018-01-27 Thread Jesper Dangaard Brouer
If clang >= 4.0.1 is missing the option '-target bpf', it will cause
llc/llvm to create two ELF sections for "Exception Frames", with
section names '.eh_frame' and '.rel.eh_frame'.

The BPF ELF loader library libbpf fails when loading files with these
sections.  The other in-kernel BPF ELF loader, in samples/bpf/bpf_load.c,
handles this gracefully, and the iproute2 loader also seems to work with
these "eh" sections.

The issue in libbpf is caused by bpf_object__elf_collect() skipping the
'.eh_frame' section and thus not creating an internal data structure
pointing to this ELF section index.  Later, when the relocation section
'.rel.eh_frame' is processed, it tries to find '.eh_frame' via the
ELF section idx, which fails (in bpf_object__collect_reloc).

I couldn't find a way to tell that '.rel.eh_frame' was irrelevant
(that is only determined by looking at the section it references, for
which we no longer have info available).

Thus, my solution is simply to match on the name of the relocation
section, to skip that too.

Signed-off-by: Jesper Dangaard Brouer 
---
 0 files changed

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b4eeaa3ebff5..84e8bbe07347 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -822,6 +822,13 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
void *reloc = obj->efile.reloc;
int nr_reloc = obj->efile.nr_reloc + 1;
 
+   /* Skip decoding of "eh" exception frames */
+   if (strcmp(name, ".rel.eh_frame") == 0) {
+   pr_debug("skip relo section %s(%d) for section(%d)\n",
+name, idx, sh.sh_info);
+   continue;
+   }
+
reloc = realloc(reloc,
sizeof(*obj->efile.reloc) * nr_reloc);
if (!reloc) {
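
The failure mode and the name-based skip described in this patch can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it does not use libelf, and struct sec, collect() and count_dangling_relocs() are hypothetical stand-ins for what bpf_object__elf_collect() and bpf_object__collect_reloc() do with real section headers. Only the strcmp() on ".rel.eh_frame" mirrors the check added above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one ELF section header. */
struct sec {
	const char *name;
	int is_prog;		/* 1 if the section holds BPF program code */
	int is_reloc;		/* 1 if this is a relocation section */
	size_t target_idx;	/* section the relocations apply to (sh_info) */
};

/*
 * Decide whether the loader keeps a section.  Non-program sections such
 * as .eh_frame are not collected.  With skip_eh_reloc set, the relocation
 * section for .eh_frame is dropped by name as well.
 */
static int collect(const struct sec *s, int skip_eh_reloc)
{
	if (s->is_reloc) {
		if (skip_eh_reloc && strcmp(s->name, ".rel.eh_frame") == 0)
			return 0;
		return 1;
	}
	return s->is_prog;
}

/* Count collected relocation sections whose target was not collected. */
static int count_dangling_relocs(const struct sec *secs, size_t n,
				 int skip_eh_reloc)
{
	int dangling = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!secs[i].is_reloc || !collect(&secs[i], skip_eh_reloc))
			continue;
		if (!collect(&secs[secs[i].target_idx], skip_eh_reloc))
			dangling++;	/* would be "relocation failed" */
	}
	return dangling;
}
```

With a section table containing a program section, its relocation section, ".eh_frame" and ".rel.eh_frame", the model reports one dangling relocation section when skip_eh_reloc is 0 and none when it is 1.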



[bpf-next PATCH 2/5] tools/libbpf: improve the pr_debug statements to contain section numbers

2018-01-27 Thread Jesper Dangaard Brouer
While debugging a bpf ELF loading issue, I needed to correlate the
ELF section number with the failed relocation section reference.
Thus, add section numbers/index to the pr_debug.

In debug mode, also print sections that were skipped.  This helped
me identify that a section (.eh_frame) was skipped, and this was
the reason the relocation section (.rel.eh_frame) could not find
that section number.

The section numbers correspond to the [Nr] column in readelf's Section Headers output.

Signed-off-by: Jesper Dangaard Brouer 
---
 0 files changed

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 30c776375118..b4eeaa3ebff5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -315,8 +315,8 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx,
 
prog->section_name = strdup(section_name);
if (!prog->section_name) {
-   pr_warning("failed to alloc name for prog under section %s\n",
-  section_name);
+   pr_warning("failed to alloc name for prog under section(%d) %s\n",
+  idx, section_name);
goto errout;
}
 
@@ -759,29 +759,29 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 
idx++;
if (gelf_getshdr(scn, &sh) != &sh) {
-   pr_warning("failed to get section header from %s\n",
-  obj->path);
+   pr_warning("failed to get section(%d) header from %s\n",
+  idx, obj->path);
err = -LIBBPF_ERRNO__FORMAT;
goto out;
}
 
name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
if (!name) {
-   pr_warning("failed to get section name from %s\n",
-  obj->path);
+   pr_warning("failed to get section(%d) name from %s\n",
+  idx, obj->path);
err = -LIBBPF_ERRNO__FORMAT;
goto out;
}
 
data = elf_getdata(scn, 0);
if (!data) {
-   pr_warning("failed to get section data from %s(%s)\n",
-  name, obj->path);
+   pr_warning("failed to get section(%d) data from %s(%s)\n",
+  idx, name, obj->path);
err = -LIBBPF_ERRNO__FORMAT;
goto out;
}
-   pr_debug("section %s, size %ld, link %d, flags %lx, type=%d\n",
-name, (unsigned long)data->d_size,
+   pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n",
+idx, name, (unsigned long)data->d_size,
 (int)sh.sh_link, (unsigned long)sh.sh_flags,
 (int)sh.sh_type);
 
@@ -836,6 +836,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
obj->efile.reloc[n].shdr = sh;
obj->efile.reloc[n].data = data;
}
+   } else {
+   pr_debug("skip section(%d) %s\n", idx, name);
}
if (err)
goto out;
@@ -1115,8 +1117,7 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 
prog = bpf_object__find_prog_by_idx(obj, idx);
if (!prog) {
-   pr_warning("relocation failed: no %d section\n",
-  idx);
+   pr_warning("relocation failed: no section(%d)\n", idx);
return -LIBBPF_ERRNO__RELOC;
}
 



Re: [net-next 07/15] i40e: Implement an ethtool private flag to stop LLDP in FW

2018-01-27 Thread Or Gerlitz
On Fri, Jan 26, 2018 at 11:24 PM, Jeff Kirsher
 wrote:
> From: Dave Ertman 
>
> Implement the private flag disable-fw-lldp for ethtool
> to disable the processing of LLDP packets by the FW.
> This will stop the FW from consuming LLDPDU and cause
> them to be sent up the stack.
>
> The FW is also being configured to apply a default DCB
> configuration on link up.
>
> Toggling the value of this flag will also cause a PF reset.
>
> Disabling FW DCB will also disable DCBx.

Wait, isn't there a knob in the DCB NL UAPI to state where the DCBx
state machine runs {nowhere, host, firmware}? I am pretty sure there is
one, and if not, why not add it there instead of using private flags?

Or.


[PATCH] net: pxa168_eth: add netconsole support

2018-01-27 Thread Alexander Monakov
This implements ndo_poll_controller callback which is necessary to
enable netconsole.

Signed-off-by: Alexander Monakov 
Cc: Russell King 
Cc: Sebastian Hesselbarth 
Cc: Florian Fainelli 
---
Hello,

I'm using this to enable netconsole on a consumer device built around the
Marvell Berlin BG2CD SoC.

Thanks.
Alexander

 drivers/net/ethernet/marvell/pxa168_eth.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index 7bbd86f08e5f..6a188f7b426a 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -1362,6 +1362,14 @@ static int pxa168_eth_do_ioctl(struct net_device *dev, struct ifreq *ifr,
return -EOPNOTSUPP;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void pxa168_eth_netpoll(struct net_device *dev)
+{
+   struct pxa168_eth_private *pep = netdev_priv(dev);
+   napi_schedule(&pep->napi);
+}
+#endif
+
 static void pxa168_get_drvinfo(struct net_device *dev,
   struct ethtool_drvinfo *info)
 {
@@ -1390,6 +1398,9 @@ static const struct net_device_ops pxa168_eth_netdev_ops = {
.ndo_do_ioctl   = pxa168_eth_do_ioctl,
.ndo_change_mtu = pxa168_eth_change_mtu,
.ndo_tx_timeout = pxa168_eth_tx_timeout,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   .ndo_poll_controller= pxa168_eth_netpoll,
+#endif
 };
 
 static int pxa168_eth_probe(struct platform_device *pdev)
-- 
2.11.0



Re: [4.15-rc9] fs_reclaim lockdep trace

2018-01-27 Thread Linus Torvalds
On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones  wrote:
> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
>  > Just triggered this on a server I was rsync'ing to.
>
> Actually, I can trigger this really easily, even with an rsync from one
> disk to another.  Though that also smells a little like networking in
> the traces. Maybe netdev has ideas.

Is this new to 4.15? Or is it just that you're testing something new?

If it's new and easy to repro, can you just bisect it? And if it isn't
new, can you perhaps check whether it's new to 4.14 (ie 4.13 being
ok)?

Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
but it was rewritten for 4.14.. I'm wondering if that remodeling ended
up triggering something.

Adding PeterZ to the participants list in case he has ideas. I'm not
seeing what would be the problem in that call chain from hell.

   Linus


Re: [RFC 0/2] hv_netvsc shutdown redo

2018-01-27 Thread Stephen Hemminger
On Sat, 27 Jan 2018 21:00:12 +
Haiyang Zhang  wrote:

> In the functions, set_channels and change_mtu, we used to call netvsc_close 
> which has a wait for ring buffers to drain. Now, we call rndis_filter_close() 
> directly without the wait for rings to drain. Could this be a problem?
> 

rndis_filter_close now waits for rings to drain.



Re: [PATCH iproute2] ip: address: fix stats64 JSON object name

2018-01-27 Thread Stephen Hemminger
On Fri, 26 Jan 2018 11:30:35 -0800
Jakub Kicinski  wrote:

> The JSON object name for statistics in ip link show is "stats644".
> Looks like a typo, commit d0e720111aad ("ip: ipaddress.c: add support
> for json output") contains an example with the expected "stats64" name.
> 
> The fact that no one has noticed until now is probably an indication
> that no one is using this object.  Hopefully it's not too late to fix
> this, although IIUC this has already been in 4.13 and 4.14 releases :S
> 
> Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output")
> Signed-off-by: Jakub Kicinski 
> ---
>  ip/ipaddress.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index ba60125c1b78..67ac6bd31373 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -598,7 +598,7 @@ static void print_link_stats64(FILE *fp, const struct rtnl_link_stats64 *s,
>  const struct rtattr *carrier_changes)
>  {
>   if (is_json_context()) {
> - open_json_object("stats644");
> + open_json_object("stats64");
>  
>   /* RX stats */
>   open_json_object("rx");

Thanks for the bugfix. Applied.


Re: [PATCH iproute2] tc: fix second printing of requeues

2018-01-27 Thread Stephen Hemminger
On Sat, 27 Jan 2018 01:19:04 -0800
Jakub Kicinski  wrote:

> Non-JSON tc qdisc output used to print the "requeues" statistic
> twice.  Commit 4fcec7f3665b ("tc: jsonify stats2") tried to preserve
> this behaviour for both standard output and JSON, but used the wrong
> statistic (q.qlen).  Also duplicating keys in JSON is not allowed,
> so the second occurrence should be completely skipped with JSON.
> 
> Fixes: 4fcec7f3665b ("tc: jsonify stats2")
> Signed-off-by: Jakub Kicinski 

Also applied this fix


Re: [4.15-rc9] fs_reclaim lockdep trace

2018-01-27 Thread Tetsuo Handa
Linus Torvalds wrote:
> On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones  wrote:
>> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
>>  > Just triggered this on a server I was rsync'ing to.
>>
>> Actually, I can trigger this really easily, even with an rsync from one
>> disk to another.  Though that also smells a little like networking in
>> the traces. Maybe netdev has ideas.
> 
> Is this new to 4.15? Or is it just that you're testing something new?
> 
> If it's new and easy to repro, can you just bisect it? And if it isn't
> new, can you perhaps check whether it's new to 4.14 (ie 4.13 being
> ok)?
> 
> Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
> but it was rewritten for 4.14.. I'm wondering if that remodeling ended
> up triggering something.

--- linux-4.13.16/mm/page_alloc.c
+++ linux-4.14.15/mm/page_alloc.c
@@ -3527,53 +3519,12 @@
return true;
}
return false;
 }
 #endif /* CONFIG_COMPACTION */
 
-#ifdef CONFIG_LOCKDEP
-struct lockdep_map __fs_reclaim_map =
-   STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
-
-static bool __need_fs_reclaim(gfp_t gfp_mask)
-{
-   gfp_mask = current_gfp_context(gfp_mask);
-
-   /* no reclaim without waiting on it */
-   if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
-   return false;
-
-   /* this guy won't enter reclaim */
-   if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
-   return false;
-
-   /* We're only interested __GFP_FS allocations for now */
-   if (!(gfp_mask & __GFP_FS))
-   return false;
-
-   if (gfp_mask & __GFP_NOLOCKDEP)
-   return false;
-
-   return true;
-}
-
-void fs_reclaim_acquire(gfp_t gfp_mask)
-{
-   if (__need_fs_reclaim(gfp_mask))
-   lock_map_acquire(&__fs_reclaim_map);
-}
-EXPORT_SYMBOL_GPL(fs_reclaim_acquire);
-
-void fs_reclaim_release(gfp_t gfp_mask)
-{
-   if (__need_fs_reclaim(gfp_mask))
-   lock_map_release(&__fs_reclaim_map);
-}
-EXPORT_SYMBOL_GPL(fs_reclaim_release);
-#endif
-
 /* Perform direct synchronous page reclaim */
 static int
 __perform_reclaim(gfp_t gfp_mask, unsigned int order,
const struct alloc_context *ac)
 {
struct reclaim_state reclaim_state;
@@ -3582,21 +3533,21 @@
 
cond_resched();
 
/* We now go into synchronous reclaim */
cpuset_memory_pressure_bump();
noreclaim_flag = memalloc_noreclaim_save();
-   fs_reclaim_acquire(gfp_mask);
+   lockdep_set_current_reclaim_state(gfp_mask);
reclaim_state.reclaimed_slab = 0;
current->reclaim_state = &reclaim_state;
 
progress = try_to_free_pages(ac->zonelist, order, gfp_mask,
ac->nodemask);
 
current->reclaim_state = NULL;
-   fs_reclaim_release(gfp_mask);
+   lockdep_clear_current_reclaim_state();
memalloc_noreclaim_restore(noreclaim_flag);
 
cond_resched();
 
return progress;
 }

> 
> Adding PeterZ to the participants list in case he has ideas. I'm not
> seeing what would be the problem in that call chain from hell.
> 
>Linus

Dave Jones wrote:
> 
> WARNING: possible recursive locking detected
> 4.15.0-rc9-backup-debug+ #1 Not tainted
> 
> sshd/24800 is trying to acquire lock:
>  (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
> 
> but task is already holding lock:
>  (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>CPU0
>
>   lock(fs_reclaim);
>   lock(fs_reclaim);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 2 locks held by sshd/24800:
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40
>  #1:  (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
> 
> stack backtrace:
> CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
> Call Trace:
>  dump_stack+0xbc/0x13f
>  __lock_acquire+0xa09/0x2040
>  lock_acquire+0x12e/0x350
>  fs_reclaim_acquire.part.102+0x29/0x30
>  kmem_cache_alloc+0x3d/0x2c0
>  alloc_extent_state+0xa7/0x410
>  __clear_extent_bit+0x3ea/0x570
>  try_release_extent_mapping+0x21a/0x260
>  __btrfs_releasepage+0xb0/0x1c0
>  btrfs_releasepage+0x161/0x170
>  try_to_release_page+0x162/0x1c0
>  shrink_page_list+0x1d5a/0x2fb0
>  shrink_inactive_list+0x451/0x940
>  shrink_node_memcg.constprop.88+0x4c9/0x5e0
>  shrink_node+0x12d/0x260
>  try_to_free_pages+0x418/0xaf0
>  __alloc_pages_slowpath+0x976/0x1790
>  __alloc_pages_nodemask+0x52c/0x5c0
>  new_slab+0x374/0x3f0
>  ___slab_alloc.constprop.81+0x47e/0x5a0
>  

Re: [4.15-rc9] fs_reclaim lockdep trace

2018-01-27 Thread Tetsuo Handa
On 2018/01/28 10:16, Tetsuo Handa wrote:
> Linus Torvalds wrote:
>> On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones  wrote:
>>> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
>>>  > Just triggered this on a server I was rsync'ing to.
>>>
>>> Actually, I can trigger this really easily, even with an rsync from one
>>> disk to another.  Though that also smells a little like networking in
>>> the traces. Maybe netdev has ideas.
>>
>> Is this new to 4.15? Or is it just that you're testing something new?
>>
>> If it's new and easy to repro, can you just bisect it? And if it isn't
>> new, can you perhaps check whether it's new to 4.14 (ie 4.13 being
>> ok)?
>>
>> Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
>> but it was rewritten for 4.14.. I'm wondering if that remodeling ended
>> up triggering something.
> 
> --- linux-4.13.16/mm/page_alloc.c
> +++ linux-4.14.15/mm/page_alloc.c

Oops. This output was inverted.

> @@ -3527,53 +3519,12 @@
>   return true;
>   }
>   return false;
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> -#ifdef CONFIG_LOCKDEP
> -struct lockdep_map __fs_reclaim_map =
> - STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> -
> -static bool __need_fs_reclaim(gfp_t gfp_mask)
> -{
> - gfp_mask = current_gfp_context(gfp_mask);
> -
> - /* no reclaim without waiting on it */
> - if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
> - return false;
> -
> - /* this guy won't enter reclaim */
> - if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
> - return false;

Since __kmalloc_reserve() from __alloc_skb() adds __GFP_NOMEMALLOC | __GFP_NOWARN
to gfp_mask, __need_fs_reclaim() is failing to return false here.

But why check __GFP_NOMEMALLOC here? __alloc_pages_slowpath() skips direct
reclaim if !(gfp_mask & __GFP_DIRECT_RECLAIM) or (current->flags & PF_MEMALLOC),
doesn't it?

--
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
(...snipped...)
/* Caller is not willing to reclaim, we can't balance anything */
if (!can_direct_reclaim)
goto nopage;

/* Avoid recursion of direct reclaim */
if (current->flags & PF_MEMALLOC)
goto nopage;

/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
&did_some_progress);
if (page)
goto got_pg;
(...snipped...)
}
--



Incoming TCP packet validation (of ACK numbers)

2018-01-27 Thread Zhiyun Qian
Hi there,

As part of our ongoing research effort to understand the discrepancies
among Linux, macOS (FreeBSD), and Windows, we discovered a violation of
the specification in the way Linux handles incoming TCP packets,
specifically in ACK number validation.

According to RFC 793, "If the ACK is a duplicate (SEG.ACK < SND.UNA),
it can be ignored. If the ACK acknowledges something not yet sent
(SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return".
In RFC 5961, the first sentence is changed (for more stringent ACK
number validation) but the second sentence remains the same.

Clearly, when the ACK number of the incoming packet is larger than
SND.NXT, we are supposed to send back an ACK. However, Linux currently
chooses to silently discard the packet without any reply. We have
checked the macOS implementation, which adheres to the specification.
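
To make the rule quoted from RFC 793 concrete, here is a minimal sketch of the three-way classification, assuming wraparound-safe 32-bit sequence comparisons. The names ack_verdict, seq_before() and classify_ack() are hypothetical and are not taken from any kernel source. The stricter RFC 5961 lower bound is not modeled, and SEG.ACK == SND.UNA is treated as in-window here.

```c
#include <assert.h>
#include <stdint.h>

enum ack_verdict {
	ACK_OLD,	/* SEG.ACK < SND.UNA: duplicate, can be ignored */
	ACK_IN_WINDOW,	/* SND.UNA <= SEG.ACK <= SND.NXT: process normally */
	ACK_FUTURE,	/* SEG.ACK > SND.NXT: send an ACK, drop the segment */
};

/* Wraparound-safe "a is before b" in 32-bit sequence space. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Classify the ACK number of an incoming segment per the RFC 793 text. */
static enum ack_verdict classify_ack(uint32_t seg_ack, uint32_t snd_una,
				     uint32_t snd_nxt)
{
	/* Acknowledges something not yet sent. */
	if (seq_before(snd_nxt, seg_ack))
		return ACK_FUTURE;
	/* Acknowledges data that was already acknowledged. */
	if (seq_before(seg_ack, snd_una))
		return ACK_OLD;
	return ACK_IN_WINDOW;
}
```

The behaviour under discussion is what the receiver does for the ACK_FUTURE case: the RFC text asks for an ACK in reply before the segment is dropped.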

I'd love to hear any thoughts on this.

Best,
-Zhiyun


RE: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-27 Thread Atul Gupta


-Original Message-
From: Dave Watson [mailto:davejwat...@fb.com] 
Sent: Friday, January 26, 2018 2:39 AM
To: Atul Gupta 
Cc: herb...@gondor.apana.org.au; linux-cry...@vger.kernel.org; 
ganes...@chelsio.co; netdev@vger.kernel.org; da...@davemloft.net; Boris 
Pismenny ; Ilya Lesokhin 
Subject: Re: [RFC crypto v3 8/9] chtls: Register the ULP

<1513769897-26945-1-git-send-email-atul.gu...@chelsio.com>

On 12/20/17 05:08 PM, Atul Gupta wrote:
> +static void __init chtls_init_ulp_ops(void) {
> + chtls_base_prot = tcp_prot;
> + chtls_base_prot.hash= chtls_hash;
> + chtls_base_prot.unhash  = chtls_unhash;
> + chtls_base_prot.close   = chtls_lsk_close;
> +
> + chtls_cpl_prot  = chtls_base_prot;
> + chtls_init_rsk_ops(_cpl_prot, _rsk_ops,
> +_prot, PF_INET);
> + chtls_cpl_prot.close= chtls_close;
> + chtls_cpl_prot.disconnect   = chtls_disconnect;
> + chtls_cpl_prot.destroy  = chtls_destroy_sock;
> + chtls_cpl_prot.shutdown = chtls_shutdown;
> + chtls_cpl_prot.sendmsg  = chtls_sendmsg;
> + chtls_cpl_prot.recvmsg  = chtls_recvmsg;
> + chtls_cpl_prot.sendpage = chtls_sendpage;
> + chtls_cpl_prot.setsockopt   = chtls_setsockopt;
> + chtls_cpl_prot.getsockopt   = chtls_getsockopt;
> +}

Much of this file should go in tls_main.c, reusing as much as possible. For 
example it doesn't look like the get/set sockopts have changed at all for chtls.

Agreed: common code should go in tls_main.c, and anything other than the
TLS_BASE_TX/TLS_SW_TX prot should go in a vendor-specific file/driver. Since
the prot requires redefinition for hardware, the code is kept in chtls_main.c.

> +
> +static int __init chtls_register(void) {
> + chtls_init_ulp_ops();
> + register_listen_notifier(_notifier);
> + cxgb4_register_uld(CXGB4_ULD_TLS, _uld_info);
> + tcp_register_ulp(_chtls_ulp_ops);
> + return 0;
> +}
> +
> +static void __exit chtls_unregister(void) {
> + unregister_listen_notifier(_notifier);
> + tcp_unregister_ulp(_chtls_ulp_ops);
> + chtls_free_all_uld();
> + cxgb4_unregister_uld(CXGB4_ULD_TLS);
> +}

The idea with ULP is that there is one ULP hook per protocol, not per driver.  

One thought is that apps/libs calling setsockopt pass the required ULP type
[tls, chtls or xtls]; this enables any HW assist to define base_prot as
required and keeps the common code [tls_main] independent of the underlying HW.
If we are to have a single TLS ULP hook [good from the user's point of view],
then we need a way to determine which inline TLS HW is used. Would a system
with multiple inline-TLS-capable devices of differing functionality require
checks in tls_main to exercise that specific functionality/callback?



Re: [4.15-rc9] fs_reclaim lockdep trace

2018-01-27 Thread Tetsuo Handa
Dave, would you try below patch?



From cae2cbf389ae3cdef1b492622722b4aeb07eb284 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Sun, 28 Jan 2018 14:17:14 +0900
Subject: [PATCH] lockdep: Fix fs_reclaim warning.

Dave Jones reported fs_reclaim lockdep warnings.

  
  WARNING: possible recursive locking detected
  4.15.0-rc9-backup-debug+ #1 Not tainted
  
  sshd/24800 is trying to acquire lock:
   (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  but task is already holding lock:
   (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

 CPU0
 
lock(fs_reclaim);
lock(fs_reclaim);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  2 locks held by sshd/24800:
   #0:  (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40
   #1:  (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  stack backtrace:
  CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
  Call Trace:
   dump_stack+0xbc/0x13f
   __lock_acquire+0xa09/0x2040
   lock_acquire+0x12e/0x350
   fs_reclaim_acquire.part.102+0x29/0x30
   kmem_cache_alloc+0x3d/0x2c0
   alloc_extent_state+0xa7/0x410
   __clear_extent_bit+0x3ea/0x570
   try_release_extent_mapping+0x21a/0x260
   __btrfs_releasepage+0xb0/0x1c0
   btrfs_releasepage+0x161/0x170
   try_to_release_page+0x162/0x1c0
   shrink_page_list+0x1d5a/0x2fb0
   shrink_inactive_list+0x451/0x940
   shrink_node_memcg.constprop.88+0x4c9/0x5e0
   shrink_node+0x12d/0x260
   try_to_free_pages+0x418/0xaf0
   __alloc_pages_slowpath+0x976/0x1790
   __alloc_pages_nodemask+0x52c/0x5c0
   new_slab+0x374/0x3f0
   ___slab_alloc.constprop.81+0x47e/0x5a0
   __slab_alloc.constprop.80+0x32/0x60
   __kmalloc_track_caller+0x267/0x310
   __kmalloc_reserve.isra.40+0x29/0x80
   __alloc_skb+0xee/0x390
   sk_stream_alloc_skb+0xb8/0x340
   tcp_sendmsg_locked+0x8e6/0x1d30
   tcp_sendmsg+0x27/0x40
   inet_sendmsg+0xd0/0x310
   sock_write_iter+0x17a/0x240
   __vfs_write+0x2ab/0x380
   vfs_write+0xfb/0x260
   SyS_write+0xb6/0x140
   do_syscall_64+0x1e5/0xc05
   entry_SYSCALL64_slow_path+0x25/0x25

Since no fs locks are held, doing GFP_KERNEL allocation should be safe
as long as there is PF_MEMALLOC safeguard (

  /* Avoid recursion of direct reclaim */
  if (p->flags & PF_MEMALLOC)
  goto nopage;

) which prevents infinite recursion.

This warning seems to be caused by commit d92a8cfcb37ecd13
("locking/lockdep: Rework FS_RECLAIM annotation") which moved the
location of

  /* this guy won't enter reclaim */
  if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
  return false;

check added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context
(__GFP_NOFS)"). Since __kmalloc_reserve() from __alloc_skb() adds
__GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, __need_fs_reclaim() is
failing to return false despite PF_MEMALLOC context (and resulted in
lockdep warning).

Since there was no PF_MEMALLOC safeguard as of cf40bd16fdad42c0, checking
__GFP_NOMEMALLOC might make sense. But since this safeguard was added by
commit 341ce06f69abfafa ("page allocator: calculate the alloc_flags for
allocation only once"), checking __GFP_NOMEMALLOC no longer makes sense.
Thus, let's remove __GFP_NOMEMALLOC check and allow __need_fs_reclaim() to
return false.

Reported-by: Dave Jones 
Signed-off-by: Tetsuo Handa 
Cc: Peter Zijlstra 
Cc: Nick Piggin 
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 76c9688..7804b0e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3583,7 +3583,7 @@ static bool __need_fs_reclaim(gfp_t gfp_mask)
return false;
 
/* this guy won't enter reclaim */
-   if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+   if (current->flags & PF_MEMALLOC)
return false;
 
/* We're only interested __GFP_FS allocations for now */
-- 
1.8.3.1
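
The effect of the one-line change can be checked with a small userspace model of the predicate. This is a sketch only: the MODEL_* bit values are hypothetical and do not match the kernel's flag values, the task flags are passed in as a parameter instead of being read from current, and the current_gfp_context() adjustment is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this model. */
#define MODEL_GFP_DIRECT_RECLAIM	0x01u
#define MODEL_GFP_FS			0x02u
#define MODEL_GFP_NOMEMALLOC		0x04u
#define MODEL_GFP_NOLOCKDEP		0x08u
#define MODEL_PF_MEMALLOC		0x01u

/* Checks shared by both variants, applied after the PF_MEMALLOC test. */
static bool model_common_checks(unsigned int gfp_mask)
{
	if (!(gfp_mask & MODEL_GFP_FS))
		return false;
	if (gfp_mask & MODEL_GFP_NOLOCKDEP)
		return false;
	return true;
}

/* Before the patch: PF_MEMALLOC only counts without __GFP_NOMEMALLOC. */
static bool need_fs_reclaim_old(unsigned int gfp_mask, unsigned int task_flags)
{
	if (!(gfp_mask & MODEL_GFP_DIRECT_RECLAIM))
		return false;
	if ((task_flags & MODEL_PF_MEMALLOC) &&
	    !(gfp_mask & MODEL_GFP_NOMEMALLOC))
		return false;
	return model_common_checks(gfp_mask);
}

/* After the patch: PF_MEMALLOC alone means "won't enter reclaim". */
static bool need_fs_reclaim_new(unsigned int gfp_mask, unsigned int task_flags)
{
	if (!(gfp_mask & MODEL_GFP_DIRECT_RECLAIM))
		return false;
	if (task_flags & MODEL_PF_MEMALLOC)
		return false;
	return model_common_checks(gfp_mask);
}
```

For a PF_MEMALLOC task whose mask includes the NOMEMALLOC bit (the __kmalloc_reserve() case from the trace), the old predicate returns true and the new one returns false. For every other combination in this model the two agree.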