Re: [PATCH net] net: bridge: start hello timer only if device is up
On Thu, 1 Jun 2017, Nikolay Aleksandrov wrote:
> When the transition of NO_STP -> KERNEL_STP was fixed by always calling
> mod_timer in br_stp_start, it introduced a new regression which causes
> the timer to be armed even when the bridge is down, and since we stop
> the timers in its ndo_stop() function, they never get disabled if the
> device is destroyed before it's upped.
>
> To reproduce:
> $ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on;
> ip l del br0; done;
>
> CC: Xin Long <lucien@gmail.com>
> CC: Ivan Vecera <c...@cera.cz>
> CC: Sebastian Ott <seb...@linux.vnet.ibm.com>
> Reported-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
> Fixes: 6d18c732b95c ("bridge: start hello_timer when enabling KERNEL_STP in
> br_stp_start")
> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> ---
> Sebastian it'd be great if you can test the patch as well.

I did and can confirm that it fixes the problem.

Thanks,
Sebastian
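The regression described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct model_bridge and the model_* functions are hypothetical stand-ins invented for illustration, not the kernel's bridge code, and the "timer" is just a flag. It shows why arming the timer only when the device is up avoids leaving a pending timer behind when the bridge is destroyed without ever being opened.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct net_bridge. */
struct model_bridge {
    bool dev_up;       /* stands in for br->dev->flags & IFF_UP */
    bool stp_enabled;  /* stands in for br->stp_enabled == BR_KERNEL_STP */
    bool timer_armed;  /* stands in for a pending hello_timer */
};

/* Enable kernel STP; arm the hello timer only if the device is up. */
static void model_stp_start(struct model_bridge *br)
{
    br->stp_enabled = true;
    if (br->dev_up)
        br->timer_armed = true;
}

/* Open path: bring the device up and start the timer if STP is on. */
static void model_dev_open(struct model_bridge *br)
{
    br->dev_up = true;
    if (br->stp_enabled)
        br->timer_armed = true;
}

/* Stop path: in this model, the only place where the timer is stopped. */
static void model_dev_stop(struct model_bridge *br)
{
    br->dev_up = false;
    br->timer_armed = false;
}

/*
 * Destroy path: the stop path runs only if the device was up.
 * Returns true if a timer would still be pending afterwards.
 */
static bool model_dev_destroy(struct model_bridge *br)
{
    if (br->dev_up)
        model_dev_stop(br);
    return br->timer_armed;
}
```

In this model, calling model_stp_start() on a bridge that was never opened leaves timer_armed false, so model_dev_destroy() reports no leaked timer; the timer is armed later by model_dev_open() if the device is brought up.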
Re: Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
On Thu, 1 Jun 2017, Xin Long wrote:
> On Thu, Jun 1, 2017 at 12:32 AM, Sebastian Ott
> <seb...@linux.vnet.ibm.com> wrote:
> > [...]
> I couldn't see any bridge-related thing here, and it couldn't be reproduced
> with virbr0 (stp=1) on my box (on both s390x and x86_64), I guess there
> is something else in your machine.
>
> With the latest upstream kernel, can you remove libvirt (virbr0) and boot your
> machine normally, then:
> # brctl addbr br0
> # ip link set br0 up
> # brctl stp br0 on
>
> to check if it will still hang.

Nope. That doesn't hang.

> If it can't be reproduced in this way, pls add this on your kernel:
>
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -178,9 +178,11 @@ static void br_stp_start(struct net_bridge *br)
>                 br->stp_enabled = BR_KERNEL_STP;
>                 br_debug(br, "using kernel STP\n");
>
> +               WARN_ON(1);
>                 /* To start timers on any ports left in blocking */
>                 mod_timer(&br->hello_timer, jiffies + br->hello_time);
>                 br_port_state_selection(br);
> +               pr_warn("hello timer start done\n");
>         }
>
>         spin_unlock_bh(&br->lock);
> diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
> index 60b6fe2..c98b3e5 100644
> --- a/net/bridge/br_stp_timer.c
> +++ b/net/bridge/br_stp_timer.c
> @@ -40,7 +40,7 @@ static void br_hello_timer_expired(unsigned long arg)
>         if (br->dev->flags & IFF_UP) {
>                 br_config_bpdu_generation(br);
>
> -               if (br->stp_enabled == BR_KERNEL_STP)
> +               if (br->stp_enabled != BR_USER_STP)
>                         mod_timer(&br->hello_timer,
>                                   round_jiffies(jiffies + br->hello_time));
>
>
> let's see if it hangs when starting the timer. Thanks.

No hang either:

[ 134.018104] ------------[ cut here ]------------
[ 134.018144] WARNING: CPU: 1 PID: 1339 at net/bridge/br_stp_if.c:181 br_stp_set_enabled+0x154/0x2b0 [bridge]
[ 134.018149] Modules linked in: bridge stp llc rdma_ucm ib_ucm ib_uverbs [...]
[ 134.018257] CPU: 1 PID: 1339 Comm: brctl Not tainted 4.12.0-rc3-00011-gf511c0b-dirty #587 [ 134.018262] Hardware name: IBM 2827 H66 705 (LPAR) [ 134.018266] task: d141c100 task.stack: d143 [ 134.018271] Krnl PSW : 0704f0018000 03ff802bc4c4 (br_stp_set_enabled+0x154/0x2b0 [bridge]) [ 134.018286]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 134.018294] Krnl GPRS: c5eae501 05dc 0bb8 0001 [ 134.018298]03ff802bc42c d1433c78 0001 d3ad2d60 [ 134.018303]0002 03ff802c21a8 d3ad2d60 fffe [ 134.018308]d1671738 26a0 03ff802bc42c d1433c38 [ 134.018320] Krnl Code: 03ff802bc4b4: e54ca9180001mvhi 2328(%r10),1 03ff802bc4ba: c004brcl 0,3ff802bc4ba #03ff802bc4c0: a7f40001brc 15,3ff802bc4c2 >03ff802bc4c4: c418b5aalgrl %r1,3ff802b3018 03ff802bc4ca: 4120ac10la %r2,3088(%r10) 03ff802bc4ce: e3301004lg %r3,0(%r1) 03ff802bc4d4: e330a8d80008ag %r3,2264(%r10) 03ff802bc4da: c0e5bc8bbrasl %r14,3ff802b3df0 [ 134.018374] Call Trace: [ 134.018384] ([<03ff802bc42c>] br_stp_set_enabled+0xbc/0x2b0 [bridge]) [ 134.018393] [<03ff802c21d2>] set_stp_state+0x2a/0x40 [bridge] [ 134.018402] [<03ff802c0f30>] store_bridge_parm+0xa8/0xf8 [bridge] [ 134.018410] [<004012f2>] kernfs_fop_write+0x132/0x208 [ 134.018417] [<0036088e>] __vfs_write+0x36/0x140 [ 134.018422] [<00361b54>] vfs_write+0xbc/0x1a0 [ 134.018427] [<0036323e>] SyS_write+0x66/0xc0 [ 134.018434] [<008ccc80>] system_call+0xc4/0x28c [ 134.018438] 5 locks held by brctl/1339: [ 134.018443] #0: (sb_writers#5){.+.+.+}, at: [<00361b3e>] vfs_write+0xa6/0x1a0 [ 134.018462] #1: (>mutex){+.+.+.}, at: [<00401372>] kernfs_fop_write+0x1b2/0x208 [ 134.018478] #2: (s_active#116){.+.+.+}, at: [<0040137e>] kernfs_fop_write+0x1be/0x208 [ 134.018496] #3: (rtnl_mutex){+.+.+.}, at: [<03ff802c0f08>] store_bridge_parm+0x80/0xf8 [bridge] [ 134.018517] #4: (&(>lock)->rlock){+.}, at: [<03ff802bc42c>] br_stp_set_enabled+0xbc/0x2b0 [bridge] [ 134.018537] Last
Oops with commit 6d18c73 bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
Hi, A system running v4.12-rc3-11-gf511c0b on s390 hangs after boot with no messages on the console. The message buffer obtained via a system dump looked like this: [...] [ 17.870712] virbr0: port 1(virbr0-nic) entered disabled state [ 19.618523] Unable to handle kernel pointer dereference in virtual kernel address space [ 250.028426] INFO: task jbd2/dasda1-8:100 blocked for more than 120 seconds. [ 250.028427] Not tainted 4.12.0-rc3-00011-gf511c0b #573 [ 250.028428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 250.028429] jbd2/dasda1-8 D12808 100 2 0x [ 250.028437] Stack: [ 250.028437]e8c4f9b0 00233afe e8c48100 [ 250.028441]e8c4f978 001b1c98 e8c4f978 e8c4f9d8 [ 250.028444]0400efdcce00 e8c48890 efdcce18 [ 250.028447]e8c48100 efdcce00 e8ce8100 e73c6900 [ 250.028450]008da090 008c4f54 e8c4f9d8 e8c4fa60 [ 250.028453] Call Trace: [ 250.028458] ([<008c4f54>] __schedule+0xb14/0xc90) [ 250.028459] [<008c5164>] schedule+0x94/0xc0 [ 250.028462] [<001802ac>] io_schedule+0x34/0x58 [ 250.028464] [<002a44c2>] wait_on_page_bit+0x16a/0x198 [ 250.028465] [<002a4576>] __filemap_fdatawait_range+0x86/0x188 [ 250.028467] [<002a46a6>] filemap_fdatawait_range+0x2e/0x58 [ 250.028471] [<004719d4>] jbd2_journal_commit_transaction+0x10e4/0x2200 [ 250.028473] [<0047890a>] kjournald2+0xda/0x2c0 [ 250.028475] [<0016da5e>] kthread+0x166/0x178 [ 250.028477] [<008cce7a>] kernel_thread_starter+0x6/0xc [ 250.028479] [<008cce74>] kernel_thread_starter+0x0/0xc [ 250.028480] INFO: lockdep is turned off. [...] The system should have oopsed after [ 19.618523] Unable to handle kernel pointer dereference in virtual kernel address space not sure why it didn't. 
Anyway, I bisected that to:

commit 6d18c732b95c0a9d35e9f978b4438bba15412284
Author: Xin Long
Date: Fri May 19 22:20:29 2017 +0800

    bridge: start hello_timer when enabling KERNEL_STP in br_stp_start

    Since commit 76b91c32dd86 ("bridge: stp: when using userspace stp stop
    kernel hello and hold timers"), bridge would not start hello_timer if
    stp_enabled is not KERNEL_STP when br_dev_open.

    The problem is even if users set stp_enabled with KERNEL_STP later,
    the timer will still not be started. It causes that KERNEL_STP can
    not really work. Users have to re-ifup the bridge to avoid this.

    This patch is to fix it by starting br->hello_timer when enabling
    KERNEL_STP in br_stp_start.

    As an improvement, it's also to start hello_timer again only when
    br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
    no reason to start the timer again when it's NO_STP.

    Fixes: 76b91c32dd86 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
    Reported-by: Haidong Li
    Signed-off-by: Xin Long
    Acked-by: Nikolay Aleksandrov
    Reviewed-by: Ivan Vecera
    Signed-off-by: David S. Miller

No clue why this broke my system. I reverted that commit on top of
v4.12-rc3-11-gf511c0b to be extra sure and it booted normally.

Full dmesg, config, and bisect log are attached.
Regards, Sebastian[0.882328] Linux version 4.12.0-rc3-00011-gf511c0b (root@r35lp51) (gcc version 6.3.1 20161221 (Red Hat 6.3.1-1.0.ibm) (GCC) ) #573 SMP PREEMPT Wed May 31 13:07:36 CEST 2017 [0.882339] setup: Linux is running natively in 64-bit mode [0.882378] setup: The maximum memory size is 4096MB [0.882385] cma: Reserved 16 MiB at 0xff00 [0.882407] numa: NUMA mode: plain [0.882434] cpu: 7 configured CPUs, 0 standby CPUs [0.882450] cpu: The CPU configuration topology of the machine is: 0 0 0 4 6 6 / 3 [0.882693] Write protected kernel read-only data: 11536k [0.890690] Zone ranges: [0.890696] DMA [mem 0x-0x7fff] [0.890702] Normal [mem 0x8000-0x] [0.890707] Movable zone start for each node [0.890710] Early memory node ranges [0.890713] node 0: [mem 0x-0x] [0.890717] Initmem setup node 0 [mem 0x-0x] [0.890721] On node 0 totalpages: 1048576 [0.890725] DMA zone: 8192 pages used for memmap [0.890727] DMA zone: 0 pages reserved [0.890730] DMA zone: 524288 pages, LIFO batch:31 [0.895228] Normal zone: 8192 pages used for memmap [0.895231] Normal zone: 524288 pages, LIFO batch:31 [0.937694] percpu: Embedded 472 pages/cpu @ef2c8000 s1895168 r8192 d29952
mlx5: net_device.addr_list_lock usage before initialization
Hi,

I ran into the following lockdep complaint:

[7.059561] INFO: trying to register non-static key.
[7.059566] the code is fine but needs lockdep annotation.
[7.059570] turning off the locking correctness validator.
[7.059579] CPU: 6 PID: 6 Comm: kworker/u32:0 Not tainted 4.9.0-02683-g784243e-dirty #77
[7.059582] Hardware name: IBM 2964 N96 704 (LPAR)
[7.061260] Workqueue: mlx5e mlx5e_set_rx_mode_work [mlx5_core]
[7.061268] Stack:
[7.061270] f95739c0 f9573a50 0003
[7.061278] f9573af0 f9573a68 f9573a68 0020
[7.061286] 0020 000a 000a
[7.061294] 000c f9573ab8
[7.061301] 008a1038 00112a50 f9573a50 f9573aa8
[7.061314] Call Trace:
[7.061321] ([<0011292a>] show_trace+0x8a/0xe0)
[7.061327] [<00112a00>] show_stack+0x80/0xd8
[7.061334] [<005cdce6>] dump_stack+0x96/0xd8
[7.061338] [<001ae352>] register_lock_class+0x1d2/0x530
[7.061341] [<001b33f6>] __lock_acquire+0xfe/0x7d8
[7.061345] [<001b4394>] lock_acquire+0x30c/0x358
[7.061352] [<0089454c>] _raw_spin_lock_bh+0x64/0xa0
[7.062171] [<03ff81465858>] mlx5e_set_rx_mode_work+0x248/0x490 [mlx5_core]
[7.062178] [<00163864>] process_one_work+0x41c/0x830
[7.062181] [<00163f2c>] worker_thread+0x2b4/0x478
[7.062186] [<0016c46c>] kthread+0x15c/0x170
[7.062190] [<00895a52>] kernel_thread_starter+0x6/0xc
[7.062193] [<00895a4c>] kernel_thread_starter+0x0/0xc
[7.062196] INFO: lockdep is turned off.

The problematic lock is net_device.addr_list_lock whose usage is
asynchronously triggered by:

mlx5e_add -> mlx5e_attach -> mlx5e_attach_netdev -> mlx5e_nic_enable
[workq] mlx5e_set_rx_mode_work -> mlx5e_handle_netdev_addr -> mlx5e_sync_netdev_addr

Initialization of this lock is triggered by:

mlx5e_add -> register_netdev

...after the call to mlx5e_attach which is obviously racy.

Regards,
Sebastian
Re: [PATCH net-next V2 1/7] net/mlx5e: Implement Fragmented Work Queue (WQ)
Hi,

On Wed, 30 Nov 2016, Saeed Mahameed wrote:
> From: Tariq Toukan
>
> Add new type of struct mlx5_frag_buf which is used to allocate fragmented
> buffers rather than contiguous, and make the Completion Queues (CQs) use
> it as they are big (default of 2MB per CQ in Striding RQ).
>
> This fixes the failures of type:
> "mlx5e_open_locked: mlx5e_open_channels failed, -12"
> due to dma_zalloc_coherent insufficient contiguous coherent memory to
> satisfy the driver's request when the user tries to setup more or larger
> rings.

Thanks for that patch! I can confirm that this fixes the large allocation issue.

Regards,
Sebastian
mlx5: ifup failure due to huge allocation
Hi, Ifup on an interface provided by CX4 (MLX5 driver) on s390 fails with: [ 22.318553] [ cut here ] [ 22.318564] WARNING: CPU: 1 PID: 399 at mm/page_alloc.c:3421 __alloc_pages_nodemask+0x2ee/0x1298 [ 22.318568] Modules linked in: mlx4_ib ib_core mlx5_core mlx4_en mlx4_core [...] [ 22.318610] CPU: 1 PID: 399 Comm: NetworkManager Not tainted 4.8.0 #13 [ 22.318614] Hardware name: IBM 2964 N96 704 (LPAR) [ 22.318618] task: dbe1c008 task.stack: dd9e4000 [ 22.318622] Krnl PSW : 0704c0018000 002a427e (__alloc_pages_nodemask+0x2ee/0x1298) [ 22.318631]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 00ceb4d4 024080c0 0001 [ 22.318640]002a4204 a410 001f 0001 [ 22.318644]024080c0 0009 [ 22.318648]a400 0088ea30 002a4204 dd9e7060 [ 22.318660] Krnl Code: 002a4272: a7740592brc 7,2a4d96 002a4276: 92011000mvi 0(%r1),1 #002a427a: a7f40001brc 15,2a427c >002a427e: a7f4058cbrc 15,2a4d96 002a4282: 5830f0b4l %r3,180(%r15) 002a4286: 5030f0ecst %r3,236(%r15) 002a428a: 1823lr %r2,%r3 002a428c: a53e0048llilh %r3,72 [ 22.318695] Call Trace: [ 22.318700] ([<002a4204>] __alloc_pages_nodemask+0x274/0x1298) [ 22.318706] ([<0030dac0>] alloc_pages_current+0x1c0/0x268) [ 22.318712] ([<00135aa6>] s390_dma_alloc+0x6e/0x1e0) [ 22.318733] ([<03ff8015474c>] mlx5_dma_zalloc_coherent_node+0xb4/0xf8 [mlx5_core]) [ 22.318748] ([<03ff80154c58>] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core]) [ 22.318765] ([<03ff8015fe06>] mlx5_cqwq_create+0xf6/0x180 [mlx5_core]) [ 22.318783] ([<03ff8016654c>] mlx5e_open_cq+0xac/0x1e0 [mlx5_core]) [ 22.318802] ([<03ff801693e6>] mlx5e_open_channels+0xe66/0xeb8 [mlx5_core]) [ 22.318820] ([<03ff8016982e>] mlx5e_open_locked+0x8e/0x1e0 [mlx5_core]) [ 22.318837] ([<03ff801699c6>] mlx5e_open+0x46/0x68 [mlx5_core]) [ 22.318844] ([<00748338>] __dev_open+0xa8/0x118) [ 22.318848] ([<0074867a>] __dev_change_flags+0xc2/0x190) [ 22.318853] ([<0074877e>] dev_change_flags+0x36/0x78) [ 22.318858] ([<0075bc8a>] do_setlink+0x332/0xb30) [ 22.318862] ([<0075de3a>] 
rtnl_newlink+0x3e2/0x820) [ 22.318867] ([<0075e46e>] rtnetlink_rcv_msg+0x1f6/0x248) [ 22.318873] ([<00782202>] netlink_rcv_skb+0x92/0x108) [ 22.318878] ([<0075c668>] rtnetlink_rcv+0x48/0x58) [ 22.318882] ([<00781ace>] netlink_unicast+0x14e/0x1f0) [ 22.318887] ([<00781f82>] netlink_sendmsg+0x32a/0x3b0) [ 22.318892] ([<0071d502>] sock_sendmsg+0x5a/0x80) [ 22.318897] ([<0071ed38>] ___sys_sendmsg+0x270/0x2a8) [ 22.318901] ([<0071fe80>] __sys_sendmsg+0x60/0x90) [ 22.318905] ([<007207c6>] SyS_socketcall+0x2be/0x388) [ 22.318912] ([<0086fcae>] system_call+0xd6/0x270) [ 22.318916] 3 locks held by NetworkManager/399: [ 22.318920] #0: (rtnl_mutex){+.+.+.}, at: [<0075c658>] rtnetlink_rcv+0x38/0x58 [ 22.318935] #1: (>state_lock){+.+.+.}, at: [<03ff801699bc>] mlx5e_open+0x3c/0x68 [mlx5_core] [ 22.318962] #2: (>alloc_mutex){+.+.+.}, at: [<03ff801546e0>] mlx5_dma_zalloc_coherent_node+0x48/0xf8 [mlx5_core] [ 22.318987] Last Breaking-Event-Address: [ 22.318992] [<002a427a>] __alloc_pages_nodemask+0x2ea/0x1298 [ 22.318996] ---[ end trace d2b54f5a0cd00b89 ]--- [ 22.319001] mlx5_core 0001:00:00.0: 0001:00:00.0:mlx5_cqwq_create:121:(pid 399): mlx5_buf_alloc_node() failed, -12 [ 22.320548] mlx5_core 0001:00:00.0 enP1s171: mlx5e_open_locked: mlx5e_open_channels failed, -12 This fails because the largest possible allocation on s390 is currently 1MB (order 8). Would it be possible to add the __GFP_NOWARN flag and try a smaller allocation if the big one failed? (The latter change also would make the device usable when it is added via hotplug and free memory is scattered). Regards, Sebastian
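The request above (do not warn, and retry with a smaller size when the big allocation fails) can be sketched in userspace C. This is a minimal model under stated assumptions: alloc_with_fallback() and limited_alloc() are hypothetical names invented here, malloc() stands in for the kernel allocator, and the 1MB cap only imitates the limit mentioned in the report. It is not the mlx5 driver's code, and the driver may well solve the problem differently.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook: returns NULL when the request cannot be met. */
typedef void *(*alloc_fn)(size_t bytes);

/*
 * Try to allocate 'want' bytes; on failure, halve the request until it
 * succeeds or drops below 'min'. The size actually obtained is stored in
 * *got (0 on failure). In kernel code the large attempts would typically
 * pass __GFP_NOWARN so that an expected failure does not trigger a warning.
 */
static void *alloc_with_fallback(alloc_fn try_alloc, size_t want,
                                 size_t min, size_t *got)
{
    size_t size;

    for (size = want; size > 0 && size >= min; size /= 2) {
        void *p = try_alloc(size);

        if (p) {
            *got = size;
            return p;
        }
    }
    *got = 0;
    return NULL;
}

/* Model allocator that refuses anything above 1MB. */
static void *limited_alloc(size_t bytes)
{
    if (bytes > ((size_t)1 << 20))
        return NULL;
    return malloc(bytes);
}
```

With these assumptions, a 2MB request with a 4KB minimum comes back as a 1MB buffer, and the caller has to size its ring from *got rather than from the size it originally asked for.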
mlx4: panic during shutdown
Hi, After a userspace update (fedora 23->24) I reproducibly run into the following oops during shutdown (on s390): [ 71.054832] Unable to handle kernel pointer dereference in virtual kernel address space [ 71.054835] Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 [ 71.054838] Fault in home space mode while using kernel ASCE. [ 71.054847] AS:00f70007 R3:0024 [ 71.054883] Oops: 0038 ilc:3 [#1] PREEMPT SMP [ 71.054887] Modules linked in: mlx4_ib ib_core mlx4_en ptp pps_core mlx4_core [...] [ 71.054912] CPU: 8 PID: 809 Comm: kworker/8:6 Not tainted 4.8.0-02896-g7137af2-dirty #6 [ 71.054913] Hardware name: IBM 2964 N96 704 (LPAR) [ 71.054919] Workqueue: events linkwatch_event [ 71.054921] task: dbea0008 task.stack: dbea4000 [ 71.054923] Krnl PSW : 0704e0018000 03ff8007a496 (mlx4_en_get_phys_port_id+0x66/0xb0 [mlx4_en]) [ 71.054933]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0080 0268 004e 001c33e0 [ 71.054937]03ff8007a486 00882790 6b6b6b6b6b6b6b6b 0010 [ 71.054939]dbea7b18 6b6b6b6b6b6b6b6b dbea7b18 e72e [ 71.054941]f15ec900 03ff8007a486 dbea79c8 [ 71.054950] Krnl Code: 03ff8007a486: e310b81c0d14lgf %r1,55324(%r11) 03ff8007a48c: a71b004baghi%r1,75 #03ff8007a490: eb110003000dsllg %r1,%r1,3 >03ff8007a496: e3119002ltg %r1,0(%r1,%r9) 03ff8007a49c: a7840015brc 8,3ff8007a4c6 03ff8007a4a0: 9208a020mvi 32(%r10),8 03ff8007a4a4: 4130a007la %r3,7(%r10) 03ff8007a4a8: a7290008lghi%r2,8 [ 71.054965] Call Trace: [ 71.054969] ([<03ff8007a486>] mlx4_en_get_phys_port_id+0x56/0xb0 [mlx4_en]) [ 71.054971] ([<00760b94>] rtnl_fill_ifinfo+0x4ec/0xc90) [ 71.054974] ([<00764fae>] rtmsg_ifinfo_build_skb+0x96/0xe8) [ 71.054976] ([<00765038>] rtmsg_ifinfo+0x38/0x78) [ 71.054978] ([<0074150e>] netdev_state_change+0x5e/0x70) [ 71.054981] ([<00765ca6>] linkwatch_do_dev+0x66/0xc8) [ 71.054983] ([<00765fd6>] __linkwatch_run_queue+0x13e/0x190) [ 71.054985] ([<00766070>] linkwatch_event+0x48/0x58) [ 71.054988] ([<00162a2e>] process_one_work+0x3fe/0x820) [ 71.054990] 
([<001630e6>] worker_thread+0x296/0x460) [ 71.054992] ([<0016b41a>] kthread+0x112/0x120) [ 71.054996] ([<008762b2>] kernel_thread_starter+0x6/0xc) [ 71.054998] ([<008762ac>] kernel_thread_starter+0x0/0xc) [ 71.055000] INFO: lockdep is turned off. [ 71.055001] Last Breaking-Event-Address: [ 71.055004] [<00294480>] printk+0xc8/0xd0 [ 71.055006] [ 71.055008] Kernel panic - not syncing: Fatal exception: panic_on_oops This was observed with 4.8 but it's also reproducible on 4.9-rc1. In mlx4_en_get_phys_port_id (which looks like it's called from userspace via sysfs) the data behind mlx4_en_priv->mdev is already freed. The problem probably is that the lifetime of mlx4_en_priv->mdev seems to be shorter than that of struct net_device (and mlx4_en_get_phys_port_id can be called as long as struct net_device exists). Regards, Sebastian
Re: [PATCH] net/mlx4_en: fix off by one in error handling
On Wed, 14 Sep 2016, Tariq Toukan wrote:
> On 14/09/2016 4:53 PM, Sebastian Ott wrote:
> > On Wed, 14 Sep 2016, Tariq Toukan wrote:
> > > On 14/09/2016 2:09 PM, Sebastian Ott wrote:
> > > > If an error occurs in mlx4_init_eq_table the index used in the
> > > > err_out_unmap label is one too big which results in a panic in
> > > > mlx4_free_eq. This patch fixes the index in the error path.
> > > You are right, but your change below does not cover all cases.
> > > The full solution looks like this:
> > >
> > > @@ -1260,7 +1260,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
> > > eq);
> > > }
> > > if (err)
> > > - goto err_out_unmap;
> > > + goto err_out_unmap_excluded;
> > In this case a call to mlx4_create_eq failed. Do you really have to call
> > mlx4_free_eq for this index again?
>
> We agree on this part, that's why here we should goto the _excluded_ label.
> For all other parts, we should not exclude the eq in the highest index, and
> thus we goto the _non_excluded_ label.

But that's exactly what the original patch does. If the failure is within the
for loop at index i, we do the cleanup starting at index i-1. If the failure
is after the for loop then i == dev->caps.num_comp_vectors + 1 and we do the
cleanup starting at index i == dev->caps.num_comp_vectors.

In the latter case your patch would have an out of bounds array access.

Regards,
Sebastian
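The index argument above can be checked with a small userspace model of the unwind loop. This is a sketch under stated assumptions: MODEL_NUM_EQS, struct model_eq_table and the model_* functions are hypothetical names made up for illustration, and "creating" an entry just sets a flag. Only the shape of the cleanup loop (while (i > 0) free(--i)) is taken from the patch under discussion.

```c
#include <stdbool.h>

#define MODEL_NUM_EQS 4  /* stands in for dev->caps.num_comp_vectors + 1 */

struct model_eq_table {
    bool created[MODEL_NUM_EQS];
    int bad_frees;  /* frees of out-of-range or never-created entries */
};

/* Free one entry, recording any free that the real code must not do. */
static void model_free_eq(struct model_eq_table *t, int idx)
{
    if (idx < 0 || idx >= MODEL_NUM_EQS || !t->created[idx]) {
        t->bad_frees++;
        return;
    }
    t->created[idx] = false;
}

/*
 * fail_at selects where the failure happens:
 *   0 .. MODEL_NUM_EQS-1  creation of that entry fails inside the loop
 *   MODEL_NUM_EQS         a later step fails after the loop completed
 *   anything else         no failure
 */
static int model_init_eq_table(struct model_eq_table *t, int fail_at)
{
    int i;

    for (i = 0; i < MODEL_NUM_EQS; i++) {
        if (i == fail_at)
            goto err_out_unmap;  /* entry i was never created */
        t->created[i] = true;
    }
    if (fail_at == MODEL_NUM_EQS)
        goto err_out_unmap;      /* here i == MODEL_NUM_EQS */
    return 0;

err_out_unmap:
    /* Pre-decrement: start at i - 1, the last entry that was created. */
    while (i > 0)
        model_free_eq(t, --i);
    return -1;
}
```

Both failure positions end with every created entry freed exactly once and bad_frees still zero. Starting the loop at index i instead would touch an entry that was never created in the first case and index MODEL_NUM_EQS, one past the end of the array, in the second.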
Re: [PATCH] net/mlx4_en: fix off by one in error handling
Hello Tariq,

On Wed, 14 Sep 2016, Tariq Toukan wrote:
> On 14/09/2016 2:09 PM, Sebastian Ott wrote:
> > If an error occurs in mlx4_init_eq_table the index used in the
> > err_out_unmap label is one too big which results in a panic in
> > mlx4_free_eq. This patch fixes the index in the error path.
> You are right, but your change below does not cover all cases.
> The full solution looks like this:
>
> @@ -1260,7 +1260,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
> eq);
> }
> if (err)
> - goto err_out_unmap;
> + goto err_out_unmap_excluded;

In this case a call to mlx4_create_eq failed. Do you really have to call
mlx4_free_eq for this index again? As far as I understood this code
mlx4_create_eq cleans up when it fails and thus there is no need for an
additional mlx4_free_eq call.

Regards,
Sebastian
[PATCH] net/mlx4_en: fix off by one in error handling
If an error occurs in mlx4_init_eq_table the index used in the
err_out_unmap label is one too big which results in a panic in
mlx4_free_eq. This patch fixes the index in the error path.

Signed-off-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index f613977..cf8f8a7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -1305,8 +1305,8 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
         return 0;

 err_out_unmap:
-        while (i >= 0)
-                mlx4_free_eq(dev, &priv->eq_table.eq[i--]);
+        while (i > 0)
+                mlx4_free_eq(dev, &priv->eq_table.eq[--i]);
 #ifdef CONFIG_RFS_ACCEL
         for (i = 1; i <= dev->caps.num_ports; i++) {
                 if (mlx4_priv(dev)->port[i].rmap) {
--
2.5.5
ipv6 csum failures on MLX4 VFs
Hi, I receive tons of these csum failure warnings. The patch mentioned here: https://patchwork.ozlabs.org/patch/512005/ seems to work. Will something like that be integrated upstream? Sebastian
Re: mlx4: failed to allocate default counter port 1
On Wed, 1 Jul 2015, Or Gerlitz wrote:
On 6/30/2015 5:17 PM, Sebastian Ott wrote:
On Tue, 30 Jun 2015, Or Gerlitz wrote:
On 6/30/2015 4:24 PM, Sebastian Ott wrote:

Do you run the VF on the same system/kernel as the PF, or the VF is probed to VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED 2.3.2.0.0.1

Can you try running the inbox PF driver that comes with the PF kernel (what kernel is that?) I'd like to see we're OK there.

Frankly, I don't know. Plus I also don't know how to build an ofed kernel.

I didn't want you to build that package, but rather the other way around, namely see what happens if uninstalling this package and running with the mlx4 inbox PF driver from the kernel provided from your distro of choice or an upstream kernel installed by you.

Anyway, I hope the below patch would provide a quick band-aid and let you to continue running upstream VFs over that PF config, let me know (I will be OOO till Thu-Sun). Once we see how this behaves, will take it from there.

Any updates on this one?

Regards,
Sebastian

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mlx4: failed to allocate default counter port 1
On Wed, 1 Jul 2015, Or Gerlitz wrote:
On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott seb...@linux.vnet.ibm.com wrote:

OK, using this patch it worked:

yep, I forgot to recap err to zero. By it worked you mean the VF is live and kicking, all functionality you had before the 4.2 merge window is back again?

Yes.
Re: mlx4: failed to allocate default counter port 1
On Wed, 1 Jul 2015, Sebastian Ott wrote:
On Wed, 1 Jul 2015, Or Gerlitz wrote:
On 6/30/2015 5:17 PM, Sebastian Ott wrote:
On Tue, 30 Jun 2015, Or Gerlitz wrote:
On 6/30/2015 4:24 PM, Sebastian Ott wrote:

Do you run the VF on the same system/kernel as the PF, or the VF is probed to VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED 2.3.2.0.0.1

Can you try running the inbox PF driver that comes with the PF kernel (what kernel is that?) I'd like to see we're OK there.

Frankly, I don't know. Plus I also don't know how to build an ofed kernel.

I didn't want you to build that package, but rather the other way around, namely see what happens if uninstalling this package and running with the mlx4 inbox PF driver from the kernel provided from your distro of choice or an upstream kernel installed by you.

Anyway, I hope the below patch would provide a quick band-aid and let you to continue running upstream VFs over that PF config, let me know (I will be OOO till Thu-Sun). Once we see how this behaves, will take it from there.

Thanks for the patch. Unfortunately, that didn't work:

OK, using this patch it worked:

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..29c2a01 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,11 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
                 } else if (err == -ENOENT) {
                         err = 0;
                         continue;
+                } else if (mlx4_is_slave(dev) && err == -EINVAL) {
+                        priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+                        mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+                                  MLX4_SINK_COUNTER_INDEX(dev));
+                        err = 0;
                 } else {
                         mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
                                  __func__, port + 1, err);
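Why the extra err = 0 matters can be shown with a userspace model of a per-port loop that returns its last error value. This is a sketch under stated assumptions: model_allocate_default_counters(), its parameters and the MODEL_* constants are hypothetical and only imitate the control flow visible in the hunk; the sink index 255 is taken from the log output quoted elsewhere in this thread. It is not the driver's actual function.

```c
#include <errno.h>

#define MODEL_PORTS 2
#define MODEL_SINK_INDEX 255  /* "using index 255" in the quoted log */

/*
 * alloc_result[port] is the (simulated) result of the counter allocation
 * for that port. With reset_err == 0 the handled -EINVAL stays in 'err'
 * and is returned after the loop; with reset_err != 0 it is cleared once
 * the sink counter has been assigned.
 */
static int model_allocate_default_counters(const int alloc_result[MODEL_PORTS],
                                           int counter[MODEL_PORTS],
                                           int is_slave, int reset_err)
{
    int port, err = 0;

    for (port = 0; port < MODEL_PORTS; port++) {
        err = alloc_result[port];
        if (!err) {
            counter[port] = port;  /* placeholder counter index */
        } else if (is_slave && err == -EINVAL) {
            /* Old PF driver: fall back to the sink counter. */
            counter[port] = MODEL_SINK_INDEX;
            if (reset_err)
                err = 0;
        } else {
            return err;  /* unhandled failure aborts the probe */
        }
    }
    return err;
}
```

Without the reset, both ports receive the sink index but the function still returns -EINVAL (-22), which matches a probe that logs the fallback for each port and then aborts. With the reset, the same inputs return 0.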
Re: mlx4: failed to allocate default counter port 1
On Wed, 1 Jul 2015, Or Gerlitz wrote:
On 6/30/2015 5:17 PM, Sebastian Ott wrote:
On Tue, 30 Jun 2015, Or Gerlitz wrote:
On 6/30/2015 4:24 PM, Sebastian Ott wrote:

Do you run the VF on the same system/kernel as the PF, or the VF is probed to VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED 2.3.2.0.0.1

Can you try running the inbox PF driver that comes with the PF kernel (what kernel is that?) I'd like to see we're OK there.

Frankly, I don't know. Plus I also don't know how to build an ofed kernel.

I didn't want you to build that package, but rather the other way around, namely see what happens if uninstalling this package and running with the mlx4 inbox PF driver from the kernel provided from your distro of choice or an upstream kernel installed by you.

Anyway, I hope the below patch would provide a quick band-aid and let you to continue running upstream VFs over that PF config, let me know (I will be OOO till Thu-Sun). Once we see how this behaves, will take it from there.

Thanks for the patch. Unfortunately, that didn't work:

[ 170.531076] mlx4_core :00:00.0: NOP command IRQ test passed
[ 170.531291] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[ 170.531294] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 1
[ 170.531531] mlx4_core :00:00.0: can't allocate counter from old PF driver, using index 255
[ 170.531534] mlx4_core :00:00.0: mlx4_allocate_default_counters: default counter index 255 for port 2
[ 170.531535] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[ 170.587306] mlx4_core: probe of :00:00.0 failed with error -22

Regards,
Sebastian

Or.

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 12fbfcb..a66cc6e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2273,6 +2273,10 @@ static int mlx4_allocate_default_counters(struct mlx4_dev *dev)
                 } else if (err == -ENOENT) {
                         err = 0;
                         continue;
+                } else if (mlx4_is_slave(dev) && err == -EINVAL) {
+                        priv->def_counter[port] = MLX4_SINK_COUNTER_INDEX(dev);
+                        mlx4_warn(dev, "can't allocate counter from old PF driver, using index %d\n",
+                                  MLX4_SINK_COUNTER_INDEX(dev));
                 } else {
                         mlx4_err(dev, "%s: failed to allocate default counter port %d err %d\n",
                                  __func__, port + 1, err);
Re: mlx4: failed to allocate default counter port 1
On Tue, 30 Jun 2015, Or Gerlitz wrote:
On Tue, Jun 30, 2015 at 1:45 PM, Sebastian Ott seb...@linux.vnet.ibm.com wrote:

after the latest mellanox update the mlx4 driver fails to probe a VF:

[ 88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
[ 88.909564] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[ 88.961735] mlx4_core: probe of :00:00.0 failed with error -22

PFs still work. See below for more dmesg output - I also added a line of debug output...maybe this helps.

Can you please send your lspci | grep nox listing? also what

:00:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]

Firmware version you have there? e.g when you probe the PF with mlx4_core debug_level=1 can you send us the lines that follow the PF probe, e.g as here + dump of all caps as you did for the VF

I have access to 2 machines and run a guest instance on both machines:
* on one the guest has access to a PF, but VF enablement is disallowed
* on the other the hypervisor controls the PF and the guests have only access to the VFs - so I cannot say much about the PF here

At least I found out the FW version - it's: 2.33.5100

Regards,
Sebastian

[ 952.367911] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 952.374606] mlx4_core: Initializing :06:00.0
[ 953.384332] mlx4_core :06:00.0: FW version 2.34.5000 (cmd intf rev 3), max commands 16
[...]

Also send us the output of dmesg | grep -i counter after such verbose load.

thanks,
Or.
Re: mlx4: failed to allocate default counter port 1
On Tue, 30 Jun 2015, Or Gerlitz wrote:
On 6/30/2015 1:45 PM, Sebastian Ott wrote:

[ 88.909558] mlx4_slave_cmd op=3840, ret=-22, status=3
[ 88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
[ 88.909564] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[ 88.961735] mlx4_core: probe of :00:00.0 failed with error -22

Do you run the VF on the same system/kernel as the PF, or the VF is probed to VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED 2.3.2.0.0.1

Can you also hook the PF code that serves this flow to see where we actually fail? basically, we should be going this way mlx4_ALLOC_RES_wrapper --> counter_alloc_res --> so I'd like to see which of the branches in counter_alloc_res fails...

Nope, I don't have direct access to the PF, sorry.

Sebastian

Or.
Re: mlx4: failed to allocate default counter port 1
On Tue, 30 Jun 2015, Or Gerlitz wrote:
On 6/30/2015 4:24 PM, Sebastian Ott wrote:

Do you run the VF on the same system/kernel as the PF, or the VF is probed to VM which runs the latest kernel and the PF runs older kernel (which?)

The latter case. The PF is driven by a much older Kernel running OFED 2.3.2.0.0.1

Can you try running the inbox PF driver that comes with the PF kernel (what kernel is that?) I'd like to see we're OK there.

Frankly, I don't know. Plus I also don't know how to build an ofed kernel.

Regards,
Sebastian
mlx4: failed to allocate default counter port 1
Hello,

after the latest mellanox update the mlx4 driver fails to probe a VF:

[ 88.909562] mlx4_core :00:00.0: mlx4_allocate_default_counters: failed to allocate default counter port 1 err -22
[ 88.909564] mlx4_core :00:00.0: Failed to allocate default counters, aborting
[ 88.961735] mlx4_core: probe of :00:00.0 failed with error -22

PFs still work. See below for more dmesg output - I also added a line of
debug output...maybe this helps.

Regards,
Sebastian

# git diff
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 8204013..e0c41c3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -565,6 +565,9 @@ static int mlx4_slave_cmd(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
 				}
 			}
 			ret = mlx4_status_to_errno(vhcr->status);
+			if (ret)
+				printk(KERN_WARNING "%s op=%d, ret=%d, status=%d\n",
+				       __func__, op, ret, vhcr->status);
 		} else {
 			if (dev->persist->state & MLX4_DEVICE_STATE_INTERNAL_ERROR)

# git describe
v4.1-11355-g6aaf0da

# dmesg
[ 88.518946] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 88.518967] mlx4_core: Initializing :00:00.0
[ 88.519101] mlx4_core :00:00.0: enabling device ( - 0002)
[ 88.519661] mlx4_core :00:00.0: enabling bus mastering
[ 88.520279] mlx4_core :00:00.0: Detected virtual function - running in slave mode
[ 88.520404] mlx4_core :00:00.0: Sending reset
[ 88.526726] mlx4_core :00:00.0: Sending vhcr0
[ 88.539676] mlx4_core :00:00.0: BlueFlame not available
[ 88.539678] mlx4_core :00:00.0: Base MM extensions: flags 31104ec2, rsvd L_Key 8000
[ 88.539680] mlx4_core :00:00.0: Max ICM size 4294967296 MB
[ 88.539682] mlx4_core :00:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[ 88.539683] mlx4_core :00:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[ 88.539685] mlx4_core :00:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[ 88.539687] mlx4_core :00:00.0: Num sys EQs: 1024, max EQs: 512, reserved EQs: 8, entry size: 128
[ 88.539688] mlx4_core :00:00.0: reserved MPTs: 256, reserved MTTs: 64
[ 88.539690] mlx4_core :00:00.0: Max PDs: 131072, reserved PDs: 4, reserved UARs: 2
[ 88.539691] mlx4_core :00:00.0: Max QP/MCG: 131072, reserved MGMs: 0
[ 88.539693] mlx4_core :00:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[ 88.539695] mlx4_core :00:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[ 88.539696] mlx4_core :00:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[ 88.539698] mlx4_core :00:00.0: Max RQ desc size: 512, max RQ S/G: 32
[ 88.539699] mlx4_core :00:00.0: Max GSO size: 131072
[ 88.539701] mlx4_core :00:00.0: Max counters: 256
[ 88.539702] mlx4_core :00:00.0: Max RSS Table size: 256
[ 88.539704] mlx4_core :00:00.0: DMFS high rate steer QPn base: 64
[ 88.539705] mlx4_core :00:00.0: DMFS high rate steer QPn range: 254
[ 88.539707] mlx4_core :00:00.0: QP Rate-Limit: #rates 1024, unit/val max 3/40, min 1/512
[ 88.539709] mlx4_core :00:00.0: DEV_CAP flags:
[ 88.539710] mlx4_core :00:00.0: RC transport
[ 88.539711] mlx4_core :00:00.0: UC transport
[ 88.539713] mlx4_core :00:00.0: UD transport
[ 88.539714] mlx4_core :00:00.0: XRC transport
[ 88.539716] mlx4_core :00:00.0: SRQ support
[ 88.539717] mlx4_core :00:00.0: IPoIB checksum offload
[ 88.539719] mlx4_core :00:00.0: P_Key violation counter
[ 88.539720] mlx4_core :00:00.0: Q_Key violation counter
[ 88.539722] mlx4_core :00:00.0: Big LSO headers
[ 88.539723] mlx4_core :00:00.0: MW support
[ 88.539724] mlx4_core :00:00.0: APM support
[ 88.539726] mlx4_core :00:00.0: Atomic ops support
[ 88.539727] mlx4_core :00:00.0: Address vector port checking support
[ 88.539729] mlx4_core :00:00.0: UD multicast support
[ 88.539730] mlx4_core :00:00.0: IBoE support
[ 88.539732] mlx4_core :00:00.0: Unicast loopback support
[ 88.539733] mlx4_core :00:00.0: FCS header control
[ 88.539735] mlx4_core :00:00.0: UDP RSS support
[ 88.539736] mlx4_core :00:00.0: Unicast VEP steering support
[ 88.539738] mlx4_core :00:00.0: Multicast VEP steering support
[ 88.539739] mlx4_core :00:00.0: Counters support
[ 88.539741] mlx4_core :00:00.0: RSS IP fragments support
[ 88.539742] mlx4_core :00:00.0: Port ETS Scheduler support
[ 88.539744] mlx4_core :00:00.0: Port link type sensing support
[ 88.539745] mlx4_core