Re: [PATCH net] net: bridge: start hello timer only if device is up

2017-06-01 Thread Sebastian Ott
On Thu, 1 Jun 2017, Nikolay Aleksandrov wrote:
> When the transition of NO_STP -> KERNEL_STP was fixed by always calling
> mod_timer in br_stp_start, it introduced a new regression which causes
> the timer to be armed even when the bridge is down, and since we stop
> the timers in its ndo_stop() function, they never get disabled if the
> device is destroyed before it's upped.
> 
> To reproduce:
> $ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on; ip l del br0; done;
> 
> CC: Xin Long <lucien@gmail.com>
> CC: Ivan Vecera <c...@cera.cz>
> CC: Sebastian Ott <seb...@linux.vnet.ibm.com>
> Reported-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
> Fixes: 6d18c732b95c ("bridge: start hello_timer when enabling KERNEL_STP in br_stp_start")
> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> ---
> Sebastian, it'd be great if you could test the patch as well.

I did and can confirm that it fixes the problem.

Thanks,
Sebastian
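The lifecycle problem in the patch description can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_bridge` and every `model_*` function are hypothetical stand-ins for the bridge's up state, `br->hello_timer`, `br_stp_start()`, the open/stop callbacks and bridge deletion. It assumes, as described in the patch, that only the stop path disarms the timer and that deleting a bridge which was never brought up does not go through that path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a bridge; not the kernel's struct net_bridge. */
struct model_bridge {
	bool up;          /* stands in for br->dev->flags & IFF_UP */
	bool kernel_stp;  /* stands in for stp_enabled == BR_KERNEL_STP */
	bool timer_armed; /* stands in for a pending br->hello_timer */
};

/* Behaviour after 6d18c732b95c: arm the timer whenever kernel STP starts. */
static void model_stp_start_unconditional(struct model_bridge *br)
{
	br->kernel_stp = true;
	br->timer_armed = true;
}

/* Behaviour named in the patch subject: only arm the timer if the device is up. */
static void model_stp_start_if_up(struct model_bridge *br)
{
	br->kernel_stp = true;
	if (br->up)
		br->timer_armed = true;
}

/* Opening the device starts the hello timer when kernel STP is active. */
static void model_dev_open(struct model_bridge *br)
{
	br->up = true;
	if (br->kernel_stp)
		br->timer_armed = true;
}

/* Stopping the device is the only place the timer gets disarmed. */
static void model_dev_stop(struct model_bridge *br)
{
	br->up = false;
	br->timer_armed = false;
}

/* Deleting the bridge: an up device is stopped first, a down one is not.
 * Returns true if a timer would still be pending on freed memory. */
static bool model_dev_delete_leaves_timer(struct model_bridge *br)
{
	if (br->up)
		model_dev_stop(br);
	return br->timer_armed;
}
```

The reproducer's add, `stp on`, delete sequence corresponds to taking a `model_bridge` that is down, calling one of the start variants and deleting it: only the unconditional variant leaves a timer pending, while the if-up variant defers arming to the open path.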



Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-06-01 Thread Sebastian Ott
On Thu, 1 Jun 2017, Xin Long wrote:
> On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott
> <seb...@linux.vnet.ibm.com> wrote:
> > [...]
> I couldn't see any bridge-related thing here, and it couldn't be reproduced
> with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there
> is something else in your machine.
> 
> With the latest upstream kernel, can you remove libvirt (virbr0) and boot your
> machine normally, then:
> # brctl addbr br0
> # ip link set br0 up
> # brctl stp br0 on
> 
> to check if it will still hang.

Nope. That doesn't hang.


> If it can't be reproduced this way, pls add this to your kernel:
> 
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br)
>  		br->stp_enabled = BR_KERNEL_STP;
>  		br_debug(br, "using kernel STP\n");
>  
> +		WARN_ON(1);
>  		/* To start timers on any ports left in blocking */
>  		mod_timer(&br->hello_timer, jiffies + br->hello_time);
>  		br_port_state_selection(br);
> +		pr_warn("hello timer start done\n");
>  	}
>  
>  	spin_unlock_bh(&br->lock);
> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
> index 60b6fe2..c98b3e5 100644
> --- a/net/bridge/br_stp_timer.c
> +++ b/net/bridge/br_stp_timer.c
> @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg)
>  	if (br->dev->flags & IFF_UP) {
>  		br_config_bpdu_generation(br);
>  
> -		if (br->stp_enabled == BR_KERNEL_STP)
> +		if (br->stp_enabled != BR_USER_STP)
>  			mod_timer(&br->hello_timer,
>  				  round_jiffies(jiffies + br->hello_time));
> 
> 
> let's see if it hangs when starting the timer. Thanks.

No hang either:

[  134.018104] [ cut here ]
[  134.018144] WARNING: CPU: 1 PID: 1339 at net/bridge/br_stp_if.c:181 br_stp_set_enabled+0x154/0x2b0 [bridge]
[  134.018149] Modules linked in: bridge stp llc rdma_ucm ib_ucm ib_uverbs [...]
[  134.018257] CPU: 1 PID: 1339 Comm: brctl Not tainted 4.12.0-rc3-00011-gf511c0b-dirty #587
[  134.018262] Hardware name: IBM 2827 H66 705 (LPAR)
[  134.018266] task: d141c100 task.stack: d143
[  134.018271] Krnl PSW : 0704f0018000 03ff802bc4c4 (br_stp_set_enabled+0x154/0x2b0 [bridge])
[  134.018286]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  134.018294] Krnl GPRS: c5eae501 05dc 0bb8 0001
[  134.018298]            03ff802bc42c d1433c78 0001 d3ad2d60
[  134.018303]            0002 03ff802c21a8 d3ad2d60 fffe
[  134.018308]            d1671738 26a0 03ff802bc42c d1433c38
[  134.018320] Krnl Code: 03ff802bc4b4: e54ca9180001  mvhi  2328(%r10),1
                          03ff802bc4ba: c004          brcl  0,3ff802bc4ba
                         #03ff802bc4c0: a7f40001      brc   15,3ff802bc4c2
                         >03ff802bc4c4: c418b5aa      lgrl  %r1,3ff802b3018
                          03ff802bc4ca: 4120ac10      la    %r2,3088(%r10)
                          03ff802bc4ce: e3301004      lg    %r3,0(%r1)
                          03ff802bc4d4: e330a8d80008  ag    %r3,2264(%r10)
                          03ff802bc4da: c0e5bc8b      brasl %r14,3ff802b3df0
[  134.018374] Call Trace:
[  134.018384] ([<03ff802bc42c>] br_stp_set_enabled+0xbc/0x2b0 [bridge])
[  134.018393]  [<03ff802c21d2>] set_stp_state+0x2a/0x40 [bridge] 
[  134.018402]  [<03ff802c0f30>] store_bridge_parm+0xa8/0xf8 [bridge] 
[  134.018410]  [<004012f2>] kernfs_fop_write+0x132/0x208 
[  134.018417]  [<0036088e>] __vfs_write+0x36/0x140 
[  134.018422]  [<00361b54>] vfs_write+0xbc/0x1a0 
[  134.018427]  [<0036323e>] SyS_write+0x66/0xc0 
[  134.018434]  [<008ccc80>] system_call+0xc4/0x28c 
[  134.018438] 5 locks held by brctl/1339:
[  134.018443]  #0:  (sb_writers#5){.+.+.+}, at: [<00361b3e>] vfs_write+0xa6/0x1a0
[  134.018462]  #1:  (>mutex){+.+.+.}, at: [<00401372>] kernfs_fop_write+0x1b2/0x208
[  134.018478]  #2:  (s_active#116){.+.+.+}, at: [<0040137e>] kernfs_fop_write+0x1be/0x208
[  134.018496]  #3:  (rtnl_mutex){+.+.+.}, at: [<03ff802c0f08>] store_bridge_parm+0x80/0xf8 [bridge]
[  134.018517]  #4:  (&(>lock)->rlock){+.}, at: [<03ff802bc42c>] br_stp_set_enabled+0xbc/0x2b0 [bridge]
[  134.018537] Last 

Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

2017-05-31 Thread Sebastian Ott
Hi,

A system running v4.12-rc3-11-gf511c0b on s390 hangs after boot with no
messages on the console. The message buffer obtained via a system dump
looked like this:

[...]
[   17.870712] virbr0: port 1(virbr0-nic) entered disabled state
[   19.618523] Unable to handle kernel pointer dereference in virtual kernel address space
[  250.028426] INFO: task jbd2/dasda1-8:100 blocked for more than 120 seconds.
[  250.028427]   Not tainted 4.12.0-rc3-00011-gf511c0b #573
[  250.028428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  250.028429] jbd2/dasda1-8   D12808   100  2 0x
[  250.028437] Stack:
[  250.028437]        e8c4f9b0  00233afe e8c48100
[  250.028441]        e8c4f978 001b1c98 e8c4f978 e8c4f9d8
[  250.028444]        0400efdcce00 e8c48890  efdcce18
[  250.028447]        e8c48100 efdcce00 e8ce8100 e73c6900
[  250.028450]        008da090 008c4f54 e8c4f9d8 e8c4fa60
[  250.028453] Call Trace:
[  250.028458] ([<008c4f54>] __schedule+0xb14/0xc90)
[  250.028459]  [<008c5164>] schedule+0x94/0xc0
[  250.028462]  [<001802ac>] io_schedule+0x34/0x58
[  250.028464]  [<002a44c2>] wait_on_page_bit+0x16a/0x198
[  250.028465]  [<002a4576>] __filemap_fdatawait_range+0x86/0x188
[  250.028467]  [<002a46a6>] filemap_fdatawait_range+0x2e/0x58
[  250.028471]  [<004719d4>] jbd2_journal_commit_transaction+0x10e4/0x2200
[  250.028473]  [<0047890a>] kjournald2+0xda/0x2c0 
[  250.028475]  [<0016da5e>] kthread+0x166/0x178 
[  250.028477]  [<008cce7a>] kernel_thread_starter+0x6/0xc 
[  250.028479]  [<008cce74>] kernel_thread_starter+0x0/0xc 
[  250.028480] INFO: lockdep is turned off.
[...]

The system should have oopsed after
[   19.618523] Unable to handle kernel pointer dereference in virtual kernel address space

Not sure why it didn't. Anyway, I bisected that to:

commit 6d18c732b95c0a9d35e9f978b4438bba15412284
Author: Xin Long 
Date:   Fri May 19 22:20:29 2017 +0800

bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
Reported-by: Haidong Li 
Signed-off-by: Xin Long 
Acked-by: Nikolay Aleksandrov 
Reviewed-by: Ivan Vecera 
Signed-off-by: David S. Miller 
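The interaction described in this commit message can be sketched as a small state model. The enum, struct and functions below are hypothetical and only mirror the names used in the thread (NO_STP, KERNEL_STP, USER_STP, hello_timer); `arm_on_start` is an illustrative switch between the behaviour before and after commit 6d18c732b95c, not a real parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three STP modes named in the thread. */
enum model_stp_mode { MODEL_NO_STP, MODEL_KERNEL_STP, MODEL_USER_STP };

struct model_br {
	bool up;                 /* device administratively up */
	enum model_stp_mode stp; /* current STP mode */
	bool hello_pending;      /* hello timer armed */
};

/* Opening the bridge arms the hello timer only when kernel STP is active
 * (the behaviour since 76b91c32dd86, per the commit message). */
static void model_open(struct model_br *br)
{
	br->up = true;
	br->hello_pending = (br->stp == MODEL_KERNEL_STP);
}

/* Switching to kernel STP later; arm_on_start selects the behaviour
 * before (false) or after (true) commit 6d18c732b95c. */
static void model_stp_start(struct model_br *br, bool arm_on_start)
{
	br->stp = MODEL_KERNEL_STP;
	if (arm_on_start)
		br->hello_pending = true;
}

/* Timer expiry: re-arm only while up and still in kernel STP mode. */
static void model_hello_expired(struct model_br *br)
{
	br->hello_pending = false;
	if (br->up && br->stp == MODEL_KERNEL_STP)
		br->hello_pending = true;
}
```

Opening with NO_STP and then enabling kernel STP leaves no timer in the "before" variant, which is the reported breakage; the "after" variant arms it, and the expiry handler stops re-arming once the mode leaves KERNEL_STP.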

No clue why this broke my system. To be extra sure, I reverted that commit on
top of v4.12-rc3-11-gf511c0b and the system booted normally.

Full dmesg, config, and bisect log are attached.

Regards,
Sebastian

[0.882328] Linux version 4.12.0-rc3-00011-gf511c0b (root@r35lp51) (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1.0.ibm) (GCC) ) #573 SMP PREEMPT Wed May 31 13:07:36 CEST 2017
[0.882339] setup: Linux is running natively in 64-bit mode
[0.882378] setup: The maximum memory size is 4096MB
[0.882385] cma: Reserved 16 MiB at 0xff00
[0.882407] numa: NUMA mode: plain
[0.882434] cpu: 7 configured CPUs, 0 standby CPUs
[0.882450] cpu: The CPU configuration topology of the machine is: 0 0 0 4 6 6 / 3
[0.882693] Write protected kernel read-only data: 11536k
[0.890690] Zone ranges:
[0.890696]   DMA  [mem 0x-0x7fff]
[0.890702]   Normal   [mem 0x8000-0x]
[0.890707] Movable zone start for each node
[0.890710] Early memory node ranges
[0.890713]   node   0: [mem 0x-0x]
[0.890717] Initmem setup node 0 [mem 0x-0x]
[0.890721] On node 0 totalpages: 1048576
[0.890725]   DMA zone: 8192 pages used for memmap
[0.890727]   DMA zone: 0 pages reserved
[0.890730]   DMA zone: 524288 pages, LIFO batch:31
[0.895228]   Normal zone: 8192 pages used for memmap
[0.895231]   Normal zone: 524288 pages, LIFO batch:31
[0.937694] percpu: Embedded 472 pages/cpu @ef2c8000 s1895168 r8192 d29952

mlx5: net_device.addr_list_lock usage before initialization

2016-12-13 Thread Sebastian Ott
Hi,

I ran into the following lockdep complaint:

[7.059561] INFO: trying to register non-static key.
[7.059566] the code is fine but needs lockdep annotation.
[7.059570] turning off the locking correctness validator.
[7.059579] CPU: 6 PID: 6 Comm: kworker/u32:0 Not tainted 4.9.0-02683-g784243e-dirty #77
[7.059582] Hardware name: IBM  2964 N96  704  (LPAR)
[7.061260] Workqueue: mlx5e mlx5e_set_rx_mode_work [mlx5_core]
[7.061268] Stack:
[7.061270]        f95739c0 f9573a50 0003
[7.061278]        f9573af0 f9573a68 f9573a68 0020
[7.061286]         0020 000a 000a
[7.061294]        000c f9573ab8
[7.061301]        008a1038 00112a50 f9573a50 f9573aa8
[7.061314] Call Trace:
[7.061321] ([<0011292a>] show_trace+0x8a/0xe0)
[7.061327]  [<00112a00>] show_stack+0x80/0xd8
[7.061334]  [<005cdce6>] dump_stack+0x96/0xd8
[7.061338]  [<001ae352>] register_lock_class+0x1d2/0x530
[7.061341]  [<001b33f6>] __lock_acquire+0xfe/0x7d8
[7.061345]  [<001b4394>] lock_acquire+0x30c/0x358
[7.061352]  [<0089454c>] _raw_spin_lock_bh+0x64/0xa0
[7.062171]  [<03ff81465858>] mlx5e_set_rx_mode_work+0x248/0x490 [mlx5_core]
[7.062178]  [<00163864>] process_one_work+0x41c/0x830
[7.062181]  [<00163f2c>] worker_thread+0x2b4/0x478
[7.062186]  [<0016c46c>] kthread+0x15c/0x170
[7.062190]  [<00895a52>] kernel_thread_starter+0x6/0xc
[7.062193]  [<00895a4c>] kernel_thread_starter+0x0/0xc
[7.062196] INFO: lockdep is turned off.

The problematic lock is net_device.addr_list_lock, which is used
asynchronously via:

mlx5e_add -> mlx5e_attach -> mlx5e_attach_netdev -> mlx5e_nic_enable
[workq] mlx5e_set_rx_mode_work -> mlx5e_handle_netdev_addr ->
mlx5e_sync_netdev_addr

The lock is initialized via:
mlx5e_add -> register_netdev

...but register_netdev is only called after mlx5e_attach, which is
obviously racy.

Regards,
Sebastian
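The ordering problem reduces to two steps whose relative order depends on when the workqueue runs. The sketch below is a deterministic, hypothetical model: `model_register_netdev()` stands in for the step that initializes `addr_list_lock`, `model_rx_mode_work()` for the queued work that takes it, and `worker_runs_early` selects the interleaving. Registering before the work can be queued is one conceivable ordering; this thread does not establish whether that is viable for mlx5e.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the struct and functions are illustrative only. */
struct model_netdev {
	bool addr_lock_initialized; /* set by the register_netdev() step */
};

/* Stand-in for register_netdev(): this is where the lock gets set up. */
static void model_register_netdev(struct model_netdev *nd)
{
	nd->addr_lock_initialized = true;
}

/* Stand-in for the queued rx-mode work: it takes the address-list lock and
 * is only correct once that lock has been initialized. */
static bool model_rx_mode_work(const struct model_netdev *nd)
{
	return nd->addr_lock_initialized;
}

/* One probe run. The attach step queues the work; worker_runs_early picks
 * the interleaving where the work executes before the probe continues. */
static bool model_probe(bool register_before_attach, bool worker_runs_early)
{
	struct model_netdev nd = { .addr_lock_initialized = false };
	bool work_ok = true;

	if (register_before_attach)
		model_register_netdev(&nd);

	/* attach: enables the NIC profile and queues the rx-mode work */
	if (worker_runs_early)
		work_ok = model_rx_mode_work(&nd);

	if (!register_before_attach)
		model_register_netdev(&nd);

	if (!worker_runs_early)
		work_ok = model_rx_mode_work(&nd);

	return work_ok;
}
```

Only the combination "attach before register" plus "worker runs early" uses the lock uninitialized, which is why the warning depends on timing.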



Re: [PATCH net-next V2 1/7] net/mlx5e: Implement Fragmented Work Queue (WQ)

2016-11-30 Thread Sebastian Ott
Hi,

On Wed, 30 Nov 2016, Saeed Mahameed wrote:
> From: Tariq Toukan 
> 
> Add new type of struct mlx5_frag_buf which is used to allocate fragmented
> buffers rather than contiguous, and make the Completion Queues (CQs) use
> it as they are big (default of 2MB per CQ in Striding RQ).
> 
> This fixes the failures of type:
> "mlx5e_open_locked: mlx5e_open_channels failed, -12"
> due to dma_zalloc_coherent insufficient contiguous coherent memory to
> satisfy the driver's request when the user tries to setup more or larger
> rings.

Thanks for that patch! I can confirm that it fixes the large allocation
issue.

Regards,
Sebastian
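The idea behind a fragmented buffer is to back one large logical buffer with many page-sized allocations and translate offsets on access. The following is a minimal userspace sketch of that technique; `struct model_frag_buf`, the 4 KiB `MODEL_FRAG_SIZE` and all functions are hypothetical and are not the `mlx5_frag_buf` API added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fragment size: one 4 KiB page per fragment. */
#define MODEL_FRAG_SIZE 4096u

/* Hypothetical model of a fragmented buffer; not the mlx5_frag_buf API. */
struct model_frag_buf {
	size_t size;   /* total usable bytes */
	size_t nfrags; /* number of fragments */
	void **frags;  /* one pointer per fragment */
};

static void model_frag_buf_free(struct model_frag_buf *buf)
{
	size_t i;

	for (i = 0; i < buf->nfrags; i++)
		free(buf->frags[i]);
	free(buf->frags);
	buf->frags = NULL;
	buf->nfrags = 0;
	buf->size = 0;
}

/* Allocate size bytes as page-sized pieces instead of one contiguous block. */
static int model_frag_buf_alloc(struct model_frag_buf *buf, size_t size)
{
	size_t i;

	if (size == 0)
		return -1;
	buf->size = size;
	buf->nfrags = (size + MODEL_FRAG_SIZE - 1) / MODEL_FRAG_SIZE;
	buf->frags = calloc(buf->nfrags, sizeof(*buf->frags));
	if (!buf->frags)
		return -1;
	for (i = 0; i < buf->nfrags; i++) {
		buf->frags[i] = calloc(1, MODEL_FRAG_SIZE);
		if (!buf->frags[i]) {
			buf->nfrags = i; /* free only what was allocated */
			model_frag_buf_free(buf);
			return -1;
		}
	}
	return 0;
}

/* Map a byte offset to its location inside the owning fragment. */
static void *model_frag_buf_ptr(const struct model_frag_buf *buf, size_t offset)
{
	assert(offset < buf->size);
	return (char *)buf->frags[offset / MODEL_FRAG_SIZE] + offset % MODEL_FRAG_SIZE;
}
```

Entries only stay contiguous if they never straddle a fragment boundary, which holds when the entry size divides `MODEL_FRAG_SIZE`. A 2 MB buffer, the default CQ size quoted in the patch, becomes 512 separate 4 KiB allocations, none of which needs a high-order page.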



mlx5: ifup failure due to huge allocation

2016-11-02 Thread Sebastian Ott
Hi,

Ifup on an interface provided by CX4 (MLX5 driver) on s390 fails with:

[   22.318553] [ cut here ]
[   22.318564] WARNING: CPU: 1 PID: 399 at mm/page_alloc.c:3421 __alloc_pages_nodemask+0x2ee/0x1298
[   22.318568] Modules linked in: mlx4_ib ib_core mlx5_core mlx4_en mlx4_core [...]
[   22.318610] CPU: 1 PID: 399 Comm: NetworkManager Not tainted 4.8.0 #13
[   22.318614] Hardware name: IBM  2964 N96  704  (LPAR)
[   22.318618] task: dbe1c008 task.stack: dd9e4000
[   22.318622] Krnl PSW : 0704c0018000 002a427e (__alloc_pages_nodemask+0x2ee/0x1298)
[   22.318631]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
   Krnl GPRS:  00ceb4d4 024080c0 0001
[   22.318640]            002a4204 a410 001f 0001
[   22.318644]            024080c0 0009
[   22.318648]            a400 0088ea30 002a4204 dd9e7060
[   22.318660] Krnl Code: 002a4272: a7740592  brc   7,2a4d96
                          002a4276: 92011000  mvi   0(%r1),1
                         #002a427a: a7f40001  brc   15,2a427c
                         >002a427e: a7f4058c  brc   15,2a4d96
                          002a4282: 5830f0b4  l     %r3,180(%r15)
                          002a4286: 5030f0ec  st    %r3,236(%r15)
                          002a428a: 1823      lr    %r2,%r3
                          002a428c: a53e0048  llilh %r3,72
[   22.318695] Call Trace:
[   22.318700] ([<002a4204>] __alloc_pages_nodemask+0x274/0x1298)
[   22.318706] ([<0030dac0>] alloc_pages_current+0x1c0/0x268)
[   22.318712] ([<00135aa6>] s390_dma_alloc+0x6e/0x1e0)
[   22.318733] ([<03ff8015474c>] mlx5_dma_zalloc_coherent_node+0xb4/0xf8 [mlx5_core])
[   22.318748] ([<03ff80154c58>] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core])
[   22.318765] ([<03ff8015fe06>] mlx5_cqwq_create+0xf6/0x180 [mlx5_core])
[   22.318783] ([<03ff8016654c>] mlx5e_open_cq+0xac/0x1e0 [mlx5_core])
[   22.318802] ([<03ff801693e6>] mlx5e_open_channels+0xe66/0xeb8 [mlx5_core])
[   22.318820] ([<03ff8016982e>] mlx5e_open_locked+0x8e/0x1e0 [mlx5_core])
[   22.318837] ([<03ff801699c6>] mlx5e_open+0x46/0x68 [mlx5_core])
[   22.318844] ([<00748338>] __dev_open+0xa8/0x118)
[   22.318848] ([<0074867a>] __dev_change_flags+0xc2/0x190)
[   22.318853] ([<0074877e>] dev_change_flags+0x36/0x78)
[   22.318858] ([<0075bc8a>] do_setlink+0x332/0xb30)
[   22.318862] ([<0075de3a>] rtnl_newlink+0x3e2/0x820)
[   22.318867] ([<0075e46e>] rtnetlink_rcv_msg+0x1f6/0x248)
[   22.318873] ([<00782202>] netlink_rcv_skb+0x92/0x108)
[   22.318878] ([<0075c668>] rtnetlink_rcv+0x48/0x58)
[   22.318882] ([<00781ace>] netlink_unicast+0x14e/0x1f0)
[   22.318887] ([<00781f82>] netlink_sendmsg+0x32a/0x3b0)
[   22.318892] ([<0071d502>] sock_sendmsg+0x5a/0x80)
[   22.318897] ([<0071ed38>] ___sys_sendmsg+0x270/0x2a8)
[   22.318901] ([<0071fe80>] __sys_sendmsg+0x60/0x90)
[   22.318905] ([<007207c6>] SyS_socketcall+0x2be/0x388)
[   22.318912] ([<0086fcae>] system_call+0xd6/0x270)
[   22.318916] 3 locks held by NetworkManager/399:
[   22.318920]  #0:  (rtnl_mutex){+.+.+.}, at: [<0075c658>] rtnetlink_rcv+0x38/0x58
[   22.318935]  #1:  (>state_lock){+.+.+.}, at: [<03ff801699bc>] mlx5e_open+0x3c/0x68 [mlx5_core]
[   22.318962]  #2:  (>alloc_mutex){+.+.+.}, at: [<03ff801546e0>] mlx5_dma_zalloc_coherent_node+0x48/0xf8 [mlx5_core]
[   22.318987] Last Breaking-Event-Address:
[   22.318992]  [<002a427a>] __alloc_pages_nodemask+0x2ea/0x1298
[   22.318996] ---[ end trace d2b54f5a0cd00b89 ]---
[   22.319001] mlx5_core 0001:00:00.0: 0001:00:00.0:mlx5_cqwq_create:121:(pid 399): mlx5_buf_alloc_node() failed, -12
[   22.320548] mlx5_core 0001:00:00.0 enP1s171: mlx5e_open_locked: mlx5e_open_channels failed, -12



This fails because the largest possible allocation on s390 is currently 1MB
(order 8). Would it be possible to add the __GFP_NOWARN flag and try a smaller
allocation if the big one fails? (The latter change would also make the device
usable when it is added via hotplug and free memory is fragmented.)

Regards,
Sebastian
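The numbers in the report work out as follows, assuming 4 KiB pages: an order-n allocation spans 4096 × 2^n bytes, so order 8 is 4096 × 256 = 1,048,576 bytes (1 MB), and a 2 MB buffer, such as the default CQ size mentioned in the mlx5 fragmented-WQ patch, needs order 9. The sketch below computes the order with the same rounding as the kernel's `get_order()` (for non-zero sizes) and then shows one hypothetical way to "try a smaller allocation": cap the order and use several blocks. `model_get_order()` and `model_plan_alloc()` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096u

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Hypothetical fallback: use one block of the natural order if the
 * allocator allows it, otherwise drop to max_order (8 on s390 per the
 * report) and use several blocks. Returns the chosen order and stores
 * the number of blocks in *nblocks. */
static unsigned int model_plan_alloc(size_t size, unsigned int max_order,
				     size_t *nblocks)
{
	unsigned int order = model_get_order(size);
	size_t block;

	if (order > max_order)
		order = max_order; /* "try a smaller allocation" */
	block = (size_t)MODEL_PAGE_SIZE << order;
	*nblocks = (size + block - 1) / block;
	return order;
}
```

Splitting into several blocks only helps if the consumer can address a non-contiguous buffer, which is the same constraint the fragmented-buffer approach addresses.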



mlx4: panic during shutdown

2016-10-19 Thread Sebastian Ott
Hi,

After a userspace update (Fedora 23 -> 24) I reproducibly run into the
following oops during shutdown (on s390):

[   71.054832] Unable to handle kernel pointer dereference in virtual kernel address space
[   71.054835] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
[   71.054838] Fault in home space mode while using kernel ASCE.
[   71.054847] AS:00f70007 R3:0024
[   71.054883] Oops: 0038 ilc:3 [#1] PREEMPT SMP
[   71.054887] Modules linked in: mlx4_ib ib_core mlx4_en ptp pps_core mlx4_core [...]
[   71.054912] CPU: 8 PID: 809 Comm: kworker/8:6 Not tainted 4.8.0-02896-g7137af2-dirty #6
[   71.054913] Hardware name: IBM  2964 N96  704  (LPAR)
[   71.054919] Workqueue: events linkwatch_event
[   71.054921] task: dbea0008 task.stack: dbea4000
[   71.054923] Krnl PSW : 0704e0018000 03ff8007a496 (mlx4_en_get_phys_port_id+0x66/0xb0 [mlx4_en])
[   71.054933]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
   Krnl GPRS: 0080 0268 004e 001c33e0
[   71.054937]            03ff8007a486 00882790 6b6b6b6b6b6b6b6b 0010
[   71.054939]            dbea7b18 6b6b6b6b6b6b6b6b dbea7b18 e72e
[   71.054941]            f15ec900  03ff8007a486 dbea79c8
[   71.054950] Krnl Code: 03ff8007a486: e310b81c0d14  lgf   %r1,55324(%r11)
                          03ff8007a48c: a71b004b      aghi  %r1,75
                         #03ff8007a490: eb110003000d  sllg  %r1,%r1,3
                         >03ff8007a496: e3119002      ltg   %r1,0(%r1,%r9)
                          03ff8007a49c: a7840015      brc   8,3ff8007a4c6
                          03ff8007a4a0: 9208a020      mvi   32(%r10),8
                          03ff8007a4a4: 4130a007      la    %r3,7(%r10)
                          03ff8007a4a8: a7290008      lghi  %r2,8
[   71.054965] Call Trace:
[   71.054969] ([<03ff8007a486>] mlx4_en_get_phys_port_id+0x56/0xb0 [mlx4_en])
[   71.054971] ([<00760b94>] rtnl_fill_ifinfo+0x4ec/0xc90)
[   71.054974] ([<00764fae>] rtmsg_ifinfo_build_skb+0x96/0xe8)
[   71.054976] ([<00765038>] rtmsg_ifinfo+0x38/0x78)
[   71.054978] ([<0074150e>] netdev_state_change+0x5e/0x70)
[   71.054981] ([<00765ca6>] linkwatch_do_dev+0x66/0xc8)
[   71.054983] ([<00765fd6>] __linkwatch_run_queue+0x13e/0x190)
[   71.054985] ([<00766070>] linkwatch_event+0x48/0x58)
[   71.054988] ([<00162a2e>] process_one_work+0x3fe/0x820)
[   71.054990] ([<001630e6>] worker_thread+0x296/0x460)
[   71.054992] ([<0016b41a>] kthread+0x112/0x120)
[   71.054996] ([<008762b2>] kernel_thread_starter+0x6/0xc)
[   71.054998] ([<008762ac>] kernel_thread_starter+0x0/0xc)
[   71.055000] INFO: lockdep is turned off.
[   71.055001] Last Breaking-Event-Address:
[   71.055004]  [<00294480>] printk+0xc8/0xd0
[   71.055006]  
[   71.055008] Kernel panic - not syncing: Fatal exception: panic_on_oops


This was observed with 4.8 but it's also reproducible on 4.9-rc1.
In mlx4_en_get_phys_port_id (which can be called from userspace via sysfs,
and in the trace above is reached from the linkwatch work via
rtnl_fill_ifinfo) the data behind mlx4_en_priv->mdev is already freed.

The problem is probably that mlx4_en_priv->mdev has a shorter lifetime than
struct net_device, while mlx4_en_get_phys_port_id can be called for as long
as the struct net_device exists.

Regards,
Sebastian
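The faulting address 6b6b6b6b6b6b6000 and the 6b6b6b6b6b6b6b6b register values fit this explanation: 0x6b is the slab allocator's `POISON_FREE` byte, written into freed objects when poisoning is enabled. One general way to keep an object alive for as long as any user can reach it is reference counting. The sketch below is a hypothetical, single-threaded userspace model of that technique (`struct model_mdev`, `struct model_en_priv` and all functions are illustrative); it is not the fix that was applied to mlx4, and kernel code would use an atomic counter such as `kref` rather than a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the core device that mlx4_en_priv->mdev points to. */
struct model_mdev {
	unsigned int refs;
	unsigned long long phys_port_id;
};

static struct model_mdev *model_mdev_create(unsigned long long port_id)
{
	struct model_mdev *m = malloc(sizeof(*m));

	if (m) {
		m->refs = 1; /* reference held by the core driver */
		m->phys_port_id = port_id;
	}
	return m;
}

static struct model_mdev *model_mdev_get(struct model_mdev *m)
{
	m->refs++;
	return m;
}

/* Drop a reference; returns true if this was the last one and m was freed. */
static bool model_mdev_put(struct model_mdev *m)
{
	if (--m->refs == 0) {
		free(m);
		return true;
	}
	return false;
}

/* Hypothetical per-netdev private data that keeps its own reference. */
struct model_en_priv {
	struct model_mdev *mdev;
};

static void model_en_priv_init(struct model_en_priv *priv, struct model_mdev *m)
{
	priv->mdev = model_mdev_get(m);
}

/* Safe for as long as priv holds its reference. */
static unsigned long long model_get_phys_port_id(const struct model_en_priv *priv)
{
	return priv->mdev->phys_port_id;
}

/* Called when the netdev itself goes away; returns true if mdev was freed. */
static bool model_en_priv_release(struct model_en_priv *priv)
{
	bool freed = model_mdev_put(priv->mdev);

	priv->mdev = NULL;
	return freed;
}
```

When the core driver drops its reference first, the object survives until the netdev side releases its own, so a late `get_phys_port_id` call still reads valid memory.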



Re: [PATCH] net/mlx4_en: fix off by one in error handling

2016-09-14 Thread Sebastian Ott
On Wed, 14 Sep 2016, Tariq Toukan wrote:
> On 14/09/2016 4:53 PM, Sebastian Ott wrote:
> > On Wed, 14 Sep 2016, Tariq Toukan wrote:
> > > On 14/09/2016 2:09 PM, Sebastian Ott wrote:
> > > > If an error occurs in mlx4_init_eq_table the index used in the
> > > > err_out_unmap label is one too big which results in a panic in
> > > > mlx4_free_eq. This patch fixes the index in the error path.
> > > You are right, but your change below does not cover all cases.
> > > The full solution looks like this:
> > >
> > > @@ -1260,7 +1260,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
> > >   eq);
> > >  }
> > >  if (err)
> > > -   goto err_out_unmap;
> > > +   goto err_out_unmap_excluded;
> > In this case a call to mlx4_create_eq failed. Do you really have to call
> > mlx4_free_eq for this index again?
>
> We agree on this part, that's why here we should goto the _excluded_ label.
> For all other parts, we should not exclude the eq in the highest index, and
> thus we goto the _non_excluded_ label.

But that's exactly what the original patch does. If the failure is within
the for loop at index i, we do the cleanup starting at index i-1. If the
failure is after the for loop, then i == dev->caps.num_comp_vectors + 1
and we do the cleanup starting at index i-1 == dev->caps.num_comp_vectors.

In the latter case your patch would have an out-of-bounds array access.

Regards,
Sebastian
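The index argument can be checked with a small model of the error path. Everything here is hypothetical: `MODEL_NUM_EQS` stands in for `dev->caps.num_comp_vectors + 1`, `model_create_eq()` assumes (as argued in this thread) that a failed create cleans up after itself, and the cleanup loop has the same shape as the patched `while (i > 0) ... eq[--i]` loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table size, standing in for dev->caps.num_comp_vectors + 1. */
#define MODEL_NUM_EQS 5

struct model_eq {
	bool created;
	bool freed;
};

/* Stand-in for mlx4_create_eq(): on failure it leaves nothing behind,
 * so the caller must not free the slot that failed. */
static int model_create_eq(struct model_eq *eq, bool fail)
{
	if (fail)
		return -1;
	eq->created = true;
	return 0;
}

/* fail_at: index whose creation fails, MODEL_NUM_EQS to simulate a failure
 * after the loop, or -1 for success. *nfreed counts cleanup calls and
 * *bad flags any free of a slot that was not live. */
static int model_init_eq_table(struct model_eq *eqs, int fail_at,
			       int *nfreed, bool *bad)
{
	int i;

	*nfreed = 0;
	*bad = false;
	for (i = 0; i < MODEL_NUM_EQS; ++i)
		if (model_create_eq(&eqs[i], i == fail_at))
			goto err_out_unmap;
	if (fail_at == MODEL_NUM_EQS)
		goto err_out_unmap; /* a later step failed; i == MODEL_NUM_EQS */
	return 0;

err_out_unmap:
	/* Same shape as the patched loop: frees i - 1 down to 0, never eqs[i]. */
	while (i > 0) {
		--i;
		if (!eqs[i].created || eqs[i].freed)
			*bad = true;
		eqs[i].freed = true;
		(*nfreed)++;
	}
	return -1;
}
```

With the original `while (i >= 0) ... eq[i--]` loop, the first slot freed would be `eqs[i]`: the slot whose creation just failed in the first case, and `eqs[MODEL_NUM_EQS]`, one past the end, in the second.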



Re: [PATCH] net/mlx4_en: fix off by one in error handling

2016-09-14 Thread Sebastian Ott
Hello Tariq,

On Wed, 14 Sep 2016, Tariq Toukan wrote:
> On 14/09/2016 2:09 PM, Sebastian Ott wrote:
> > If an error occurs in mlx4_init_eq_table the index used in the
> > err_out_unmap label is one too big which results in a panic in
> > mlx4_free_eq. This patch fixes the index in the error path.
> You are right, but your change below does not cover all cases.
> The full solution looks like this:
> 
> @@ -1260,7 +1260,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
>  eq);
> }
> if (err)
> -   goto err_out_unmap;
> +   goto err_out_unmap_excluded;

In this case a call to mlx4_create_eq failed. Do you really have to call
mlx4_free_eq for this index again? As far as I understand this code,
mlx4_create_eq cleans up after itself when it fails, so there is no need
for an additional mlx4_free_eq call.

Regards,
Sebastian



[PATCH] net/mlx4_en: fix off by one in error handling

2016-09-14 Thread Sebastian Ott
If an error occurs in mlx4_init_eq_table the index used in the
err_out_unmap label is one too big which results in a panic in
mlx4_free_eq. This patch fixes the index in the error path.

Signed-off-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index f613977..cf8f8a7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1305,8 +1305,8 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
 	return 0;
 
 err_out_unmap:
-	while (i >= 0)
-		mlx4_free_eq(dev, &priv->eq_table.eq[i--]);
+	while (i > 0)
+		mlx4_free_eq(dev, &priv->eq_table.eq[--i]);
 #ifdef CONFIG_RFS_ACCEL
 	for (i = 1; i <= dev->caps.num_ports; i++) {
 		if (mlx4_priv(dev)->port[i].rmap) {
-- 
2.5.5



ipv6 csum failures on MLX4 VFs

2016-02-12 Thread Sebastian Ott
Hi,

I receive tons of IPv6 csum failure warnings on MLX4 VFs. The patch mentioned here:
https://patchwork.ozlabs.org/patch/512005/

seems to work. Will something like that be integrated upstream?

Sebastian



Re: mlx4: failed to allocate default counter port 1

2015-07-29 Thread Sebastian Ott
On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> > On Tue, 30 Jun 2015, Or Gerlitz wrote:
> > > On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > > > Do you run the VF on the same system/kernel as the PF, or the VF is
> > > > > probed to VM which runs the latest kernel and the PF runs older
> > > > > kernel (which?)
> > > > The latter case. The PF is driven by a much older Kernel running OFED
> > > > 2.3.2.0.0.1
> > >
> > > Can you try running the inbox PF driver that comes with the PF kernel
> > > (what kernel is that?) I'd like to see we're OK there.
> > Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
>
> I didn't want you to build that package, but rather the other way around,
> namely see what happens if uninstalling this package and running with the
> mlx4 inbox PF driver from the kernel provided from your distro of choice or
> an upstream kernel installed by you. Anyway, I hope the below patch would
> provide a quick band-aid and let you continue running upstream VFs over
> that PF config, let me know (I will be OOO till Thu-Sun). Once we see how
> this behaves, will take it from there.

Any updates on this one?

Regards,
Sebastian

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mlx4: failed to allocate default counter port 1

2015-07-02 Thread Sebastian Ott
On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott <seb...@linux.vnet.ibm.com> wrote:
> > OK, using this patch it worked:
>
> yep, I forgot to recap err to zero.
>
> By it worked you mean the VF is live and kicking, all functionality
> you had before the 4.2 merge window is back again?
>

Yes.



Re: mlx4: failed to allocate default counter port 1

2015-07-01 Thread Sebastian Ott
On Wed, 1 Jul 2015, Sebastian Ott wrote:

> On Wed, 1 Jul 2015, Or Gerlitz wrote:
> > On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> > > On Tue, 30 Jun 2015, Or Gerlitz wrote:
> > > > On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > > > > Do you run the VF on the same system/kernel as the PF, or the VF is
> > > > > > probed to VM which runs the latest kernel and the PF runs older
> > > > > > kernel (which?)
> > > > > The latter case. The PF is driven by a much older Kernel running OFED
> > > > > 2.3.2.0.0.1
> > > >
> > > > Can you try running the inbox PF driver that comes with the PF kernel
> > > > (what kernel is that?) I'd like to see we're OK there.
> > > Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
> >
> > I didn't want you to build that package, but rather the other way around,
> > namely see what happens if uninstalling this package and running with the
> > mlx4 inbox PF driver from the kernel provided from your distro of choice or
> > an upstream kernel installed by you. Anyway, I hope the below patch would
> > provide a quick band-aid and let you continue running upstream VFs over
> > that PF config, let me know (I will be OOO till Thu-Sun). Once we see how
> > this behaves, will take it from there.
>
> Thanks for the patch. Unfortunately, that didn't work:
>

OK, using this patch it worked:

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..29c2a01 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,11 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
 		} else if (err == -ENOENT) {
 			err = 0;
 			continue;
+		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
+			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+				  MLX4_SINK_COUNTER_INDEX(dev));
+			err = 0;
 		} else {
 			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
 				 __func__, port + 1, err);

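Why the first version still aborted, even though it logged the fallback index for both ports, comes down to the value left in `err`. The following is a simplified, hypothetical model of the loop (`model_allocate_default_counters()` and its parameters are illustrative, and only the control flow around `err` is modelled): `reset_err` toggles the `err = 0;` line that this second version adds, and `MODEL_SINK_INDEX` uses the value 255 seen in the failing log in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NUM_PORTS 2
#define MODEL_SINK_INDEX 255 /* the index reported in the failing log */

/* Simplified, hypothetical model of the default-counter loop. alloc_err[p]
 * is the simulated result of allocating a counter for port p. */
static int model_allocate_default_counters(const int alloc_err[MODEL_NUM_PORTS],
					   bool is_slave, bool reset_err,
					   int def_counter[MODEL_NUM_PORTS])
{
	int port, err = 0;

	for (port = 0; port < MODEL_NUM_PORTS; port++) {
		err = alloc_err[port];
		if (!err) {
			def_counter[port] = port; /* pretend the PF handed out an index */
		} else if (err == -ENOENT) {
			err = 0;
			continue;
		} else if (is_slave && err == -EINVAL) {
			def_counter[port] = MODEL_SINK_INDEX;
			if (reset_err)
				err = 0; /* the line added in the second version */
		} else {
			return err;
		}
	}
	/* Whatever err holds after the last port is the function's result. */
	return err;
}
```

Without the reset, both ports get the sink index but the function still returns -EINVAL, which matches the log where the probe fails with error -22 right after "default counter index 255 for port 2".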


Re: mlx4: failed to allocate default counter port 1

2015-07-01 Thread Sebastian Ott
On Wed, 1 Jul 2015, Or Gerlitz wrote:
> On 6/30/2015 5:17 PM, Sebastian Ott wrote:
> > On Tue, 30 Jun 2015, Or Gerlitz wrote:
> > > On 6/30/2015 4:24 PM, Sebastian Ott wrote:
> > > > > Do you run the VF on the same system/kernel as the PF, or the VF is
> > > > > probed to VM which runs the latest kernel and the PF runs older
> > > > > kernel (which?)
> > > > The latter case. The PF is driven by a much older Kernel running OFED
> > > > 2.3.2.0.0.1
> > >
> > > Can you try running the inbox PF driver that comes with the PF kernel
> > > (what kernel is that?) I'd like to see we're OK there.
> > Frankly, I don't know. Plus I also don't know how to build an ofed kernel.
>
> I didn't want you to build that package, but rather the other way around,
> namely see what happens if uninstalling this package and running with the
> mlx4 inbox PF driver from the kernel provided from your distro of choice or
> an upstream kernel installed by you. Anyway, I hope the below patch would
> provide a quick band-aid and let you continue running upstream VFs over
> that PF config, let me know (I will be OOO till Thu-Sun). Once we see how
> this behaves, will take it from there.

Thanks for the patch. Unfortunately, that didn't work:

[  170.531076] mlx4_core :00:00.0: NOP command IRQ test passed
[  170.531291] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[  170.531294] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 1
[  170.531531] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[  170.531534] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 2
[  170.531535] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[  170.587306] mlx4_core: probe of :00:00.0 failed with error -22

Regards,
Sebastian

 
> Or.
>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> index 12fbfcb..a66cc6e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
>  		} else if (err == -ENOENT) {
>  			err = 0;
>  			continue;
> +		} else if (mlx4_is_slave(dev) && err == -EINVAL) {
> +			priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
> +			mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
> +				  MLX4_SINK_COUNTER_INDEX(dev));
>  		} else {
>  			mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
>  				 __func__, port + 1, err);



Re: mlx4: failed to allocate default counter port 1

2015-06-30 Thread Sebastian Ott
On Tue, 30 Jun 2015, Or Gerlitz wrote:
> On Tue, Jun 30, 2015 at 1:45 PM, Sebastian Ott
> <seb...@linux.vnet.ibm.com> wrote:
> > after the latest mellanox update the mlx4 driver fails to probe a VF:
> > [   88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
> > [   88.909564] mlx4_core :00:00.0: Failed to allocate default counters, aborting
> > [   88.961735] mlx4_core: probe of :00:00.0 failed with error -22
> >
> > PFs still work. See below for more dmesg output - I also added a line of
> > debug output...maybe this helps.
>
> Can you please send your lspci | grep nox listing? also what

:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

> Firmware version you have there? e.g when you probe the PF with
> mlx4_core debug_level=1 can you send us the lines that follow the PF
> probe, e.g as here + dump of all caps as you did for the VF

I have access to 2 machines and run a guest instance on both machines:
* on one, the guest has access to a PF, but VF enablement is disallowed
* on the other the hypervisor controls the PF and the guests have only
  access to the VFs - so I cannot say much about the PF here

At least I found out the FW version - it's: 2.33.5100

Regards,
Sebastian

 
> [  952.367911] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
> [  952.374606] mlx4_core: Initializing :06:00.0
> [  953.384332] mlx4_core :06:00.0: FW version 2.34.5000 (cmd intf rev 3), max commands 16
> [...]
>
> Also send us the output of dmesg | grep -i counter after such verbose load.
>
> thanks,
>
> Or.
 
 



Re: mlx4: failed to allocate default counter port 1

2015-06-30 Thread Sebastian Ott
On Tue, 30 Jun 2015, Or Gerlitz wrote:
> On 6/30/2015 1:45 PM, Sebastian Ott wrote:
> > [   88.909558] mlx4_slave_cmd op=3840, ret=-22, status=3
> > [   88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
> > [   88.909564] mlx4_core :00:00.0: Failed to allocate default counters, aborting
> > [   88.961735] mlx4_core: probe of :00:00.0 failed with error -22
>
> Do you run the VF on the same system/kernel as the PF, or the VF is probed to
> VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older kernel running OFED
2.3.2.0.0.1

 Can you also hook the PF code that serves this flow to see where we actually
 fail? Basically, we should be going this way: mlx4_ALLOC_RES_wrapper ->
 counter_alloc_res, so I'd like to see which of the branches in
 counter_alloc_res fails...

Nope, I don't have direct access to the PF, sorry.

Sebastian
 
 Or.
 
 
 
 



Re: mlx4: failed to allocate default counter port 1

2015-06-30 Thread Sebastian Ott
On Tue, 30 Jun 2015, Or Gerlitz wrote:
 On 6/30/2015 4:24 PM, Sebastian Ott wrote:
   Do you run the VF on the same system/kernel as the PF, or is the VF probed to a
   VM which runs the latest kernel while the PF runs an older kernel (which?)
  The latter case. The PF is driven by a much older kernel running OFED
  2.3.2.0.0.1
 
 
 Can you try running the inbox PF driver that comes with the PF kernel? Which
 kernel is that? I'd like to see that we're OK there.

Frankly, I don't know. I also don't know how to build an OFED kernel.

Regards,
Sebastian



mlx4: failed to allocate default counter port 1

2015-06-30 Thread Sebastian Ott
Hello,

After the latest Mellanox update, the mlx4 driver fails to probe a VF:
[   88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
[   88.909564] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[   88.961735] mlx4_core: probe of :00:00.0 failed with error -22

PFs still work. See below for more dmesg output - I also added a line of
debug output; maybe this helps.

Regards,
Sebastian

# git diff
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 8204013..e0c41c3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -565,6 +565,9 @@ static int mlx4_slave_cmd(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
 			}
 		}
 		ret = mlx4_status_to_errno(vhcr->status);
+		if (ret)
+			printk(KERN_WARNING "%s op=%d, ret=%d, status=%d\n",
+			       __func__, op, ret, vhcr->status);
 	} else {
 		if (dev->persist->state &
 		    MLX4_DEVICE_STATE_INTERNAL_ERROR)
# git describe
v4.1-11355-g6aaf0da
# dmesg
[   88.518946] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[   88.518967] mlx4_core: Initializing :00:00.0
[   88.519101] mlx4_core :00:00.0: enabling device ( - 0002)
[   88.519661] mlx4_core :00:00.0: enabling bus mastering
[   88.520279] mlx4_core :00:00.0: Detected virtual function - running in slave mode
[   88.520404] mlx4_core :00:00.0: Sending reset
[   88.526726] mlx4_core :00:00.0: Sending vhcr0
[   88.539676] mlx4_core :00:00.0: BlueFlame not available
[   88.539678] mlx4_core :00:00.0: Base MM extensions: flags 31104ec2, rsvd L_Key 8000
[   88.539680] mlx4_core :00:00.0: Max ICM size 4294967296 MB
[   88.539682] mlx4_core :00:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[   88.539683] mlx4_core :00:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[   88.539685] mlx4_core :00:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[   88.539687] mlx4_core :00:00.0: Num sys EQs: 1024, max EQs: 512, reserved EQs: 8, entry size: 128
[   88.539688] mlx4_core :00:00.0: reserved MPTs: 256, reserved MTTs: 64
[   88.539690] mlx4_core :00:00.0: Max PDs: 131072, reserved PDs: 4, reserved UARs: 2
[   88.539691] mlx4_core :00:00.0: Max QP/MCG: 131072, reserved MGMs: 0
[   88.539693] mlx4_core :00:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[   88.539695] mlx4_core :00:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[   88.539696] mlx4_core :00:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[   88.539698] mlx4_core :00:00.0: Max RQ desc size: 512, max RQ S/G: 32
[   88.539699] mlx4_core :00:00.0: Max GSO size: 131072
[   88.539701] mlx4_core :00:00.0: Max counters: 256
[   88.539702] mlx4_core :00:00.0: Max RSS Table size: 256
[   88.539704] mlx4_core :00:00.0: DMFS high rate steer QPn base: 64
[   88.539705] mlx4_core :00:00.0: DMFS high rate steer QPn range: 254
[   88.539707] mlx4_core :00:00.0: QP Rate-Limit: #rates 1024, unit/val max 3/40, min 1/512
[   88.539709] mlx4_core :00:00.0: DEV_CAP flags:
[   88.539710] mlx4_core :00:00.0: RC transport
[   88.539711] mlx4_core :00:00.0: UC transport
[   88.539713] mlx4_core :00:00.0: UD transport
[   88.539714] mlx4_core :00:00.0: XRC transport
[   88.539716] mlx4_core :00:00.0: SRQ support
[   88.539717] mlx4_core :00:00.0: IPoIB checksum offload
[   88.539719] mlx4_core :00:00.0: P_Key violation counter
[   88.539720] mlx4_core :00:00.0: Q_Key violation counter
[   88.539722] mlx4_core :00:00.0: Big LSO headers
[   88.539723] mlx4_core :00:00.0: MW support
[   88.539724] mlx4_core :00:00.0: APM support
[   88.539726] mlx4_core :00:00.0: Atomic ops support
[   88.539727] mlx4_core :00:00.0: Address vector port checking support
[   88.539729] mlx4_core :00:00.0: UD multicast support
[   88.539730] mlx4_core :00:00.0: IBoE support
[   88.539732] mlx4_core :00:00.0: Unicast loopback support
[   88.539733] mlx4_core :00:00.0: FCS header control
[   88.539735] mlx4_core :00:00.0: UDP RSS support
[   88.539736] mlx4_core :00:00.0: Unicast VEP steering support
[   88.539738] mlx4_core :00:00.0: Multicast VEP steering support
[   88.539739] mlx4_core :00:00.0: Counters support
[   88.539741] mlx4_core :00:00.0: RSS IP fragments support
[   88.539742] mlx4_core :00:00.0: Port ETS Scheduler support
[   88.539744] mlx4_core :00:00.0: Port link type sensing support
[   88.539745] mlx4_core