Re: netconsole deadlock with virtnet

2020-11-24 Thread Jason Wang
On 2020/11/25 12:20 AM, Jakub Kicinski wrote: On Tue, 24 Nov 2020 11:22:03 +0800 Jason Wang wrote: Perhaps you need the trylock in virtnet_poll_tx()? That could work. Best if we used normal lock if !!budget, and trylock when budget is 0. But maybe that's too hairy. If we use trylock, we
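
[ Editor's sketch, not part of the thread: the "normal lock if !!budget, trylock when budget is 0" idea can be illustrated in userspace with a pthread mutex. Everything named here (struct txq_sketch, poll_tx_sketch) is hypothetical and stands in for the driver's tx queue lock; it is not virtio_net code. The assumption is that a zero budget marks a poll from a context that must not spin on the lock, so that caller backs off instead of waiting. ]

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-queue tx lock; not the kernel's netdev_queue. */
struct txq_sketch {
    pthread_mutex_t lock;
    int cleaned;            /* number of completed cleanup passes */
};

/*
 * Clean the tx queue under its lock.
 * budget != 0: regular poll, block until the lock is available.
 * budget == 0: assumed to be a context that must not wait, so only try
 *              the lock and give up if someone already holds it.
 * Returns true if a cleanup pass ran, false if the caller backed off.
 */
static bool poll_tx_sketch(struct txq_sketch *q, int budget)
{
    if (budget == 0) {
        if (pthread_mutex_trylock(&q->lock) != 0)
            return false;   /* lock busy: skip rather than deadlock */
    } else {
        pthread_mutex_lock(&q->lock);
    }

    q->cleaned++;           /* placeholder for freeing completed tx buffers */
    pthread_mutex_unlock(&q->lock);
    return true;
}
```

[ With the lock already held, poll_tx_sketch(&q, 0) returns false immediately, while a non-zero budget would wait for the holder. Whether skipping the cleanup is acceptable for the real driver is exactly the open question in the message above. ]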

Re: netconsole deadlock with virtnet

2020-11-24 Thread Jason Wang
On 2020/11/24 10:31 PM, Steven Rostedt wrote: On Tue, 24 Nov 2020 11:22:03 +0800 Jason Wang wrote: Btw, have a quick search, there are several other drivers that uses tx lock in the tx NAPI. tx NAPI is not the issue. The issue is that write_msg() (in netconsole.c) calls this polling logic

Re: netconsole deadlock with virtnet

2020-11-24 Thread Steven Rostedt
On Tue, 24 Nov 2020 11:22:03 +0800 Jason Wang wrote: > Btw, have a quick search, there are several other drivers that uses tx > lock in the tx NAPI. tx NAPI is not the issue. The issue is that write_msg() (in netconsole.c) calls this polling logic with the target_list_lock held. Are those

Re: netconsole deadlock with virtnet

2020-11-24 Thread Leon Romanovsky
On Tue, Nov 24, 2020 at 04:57:23PM +0800, Jason Wang wrote: > > On 2020/11/24 4:01 PM, Leon Romanovsky wrote: > > On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote: > > > On 2020/11/24 3:21 AM, Jakub Kicinski wrote: > > > > On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote: > > > > >

Re: netconsole deadlock with virtnet

2020-11-24 Thread Jason Wang
On 2020/11/24 4:01 PM, Leon Romanovsky wrote: On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote: On 2020/11/24 3:21 AM, Jakub Kicinski wrote: On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote: On Mon, 23 Nov 2020 10:52:52 -0800 Jakub Kicinski wrote: On Mon, 23 Nov 2020

Re: netconsole deadlock with virtnet

2020-11-24 Thread Leon Romanovsky
On Tue, Nov 24, 2020 at 11:22:03AM +0800, Jason Wang wrote: > > On 2020/11/24 3:21 AM, Jakub Kicinski wrote: > > On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote: > > > On Mon, 23 Nov 2020 10:52:52 -0800 > > > Jakub Kicinski wrote: > > > > > > > On Mon, 23 Nov 2020 09:31:28 -0500 Steven

Re: netconsole deadlock with virtnet

2020-11-23 Thread Jason Wang
On 2020/11/24 3:21 AM, Jakub Kicinski wrote: On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote: On Mon, 23 Nov 2020 10:52:52 -0800 Jakub Kicinski wrote: On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote: On Mon, 23 Nov 2020 13:08:55 +0200 Leon Romanovsky wrote: [

Re: netconsole deadlock with virtnet

2020-11-23 Thread Steven Rostedt
On Mon, 23 Nov 2020 10:52:52 -0800 Jakub Kicinski wrote: > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote: > > On Mon, 23 Nov 2020 13:08:55 +0200 > > Leon Romanovsky wrote: > > > > > > > [ 10.028024] Chain exists of: > > > [ 10.028025] console_owner --> target_list_lock -->

Re: netconsole deadlock with virtnet

2020-11-23 Thread Steven Rostedt
On Mon, 23 Nov 2020 13:08:55 +0200 Leon Romanovsky wrote: > [ 10.028024] Chain exists of: > [ 10.028025] console_owner --> target_list_lock --> _xmit_ETHER#2 Note, the problem is that we have a location that grabs the xmit_lock while holding target_list_lock (and possibly

Re: netconsole deadlock with virtnet

2020-11-23 Thread Leon Romanovsky
On Wed, Nov 18, 2020 at 09:12:57AM -0500, Steven Rostedt wrote: > > [ Adding netdev as perhaps someone there knows ] > > On Wed, 18 Nov 2020 12:09:59 +0800 > Jason Wang wrote: > > > > This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is > > > basically > > > soft IRQ ->

Re: netconsole deadlock with virtnet

2020-11-22 Thread Leon Romanovsky
On Thu, Nov 19, 2020 at 01:55:53PM +0100, Petr Mladek wrote: > On Tue 2020-11-17 09:33:25, Steven Rostedt wrote: > > On Tue, 17 Nov 2020 12:23:41 +0200 > > Leon Romanovsky wrote: > > > > > Hi, > > > > > > Approximately two weeks ago, our regression team started to experience > > > those > > >

Re: netconsole deadlock with virtnet

2020-11-19 Thread Petr Mladek via Virtualization
On Tue 2020-11-17 09:33:25, Steven Rostedt wrote: > On Tue, 17 Nov 2020 12:23:41 +0200 > Leon Romanovsky wrote: > > > Hi, > > > > Approximately two weeks ago, our regression team started to experience those > > netconsole splats. The tested code is Linus's master (-rc4) + netdev > > net-next >

Re: netconsole deadlock with virtnet

2020-11-18 Thread Steven Rostedt
[ Adding netdev as perhaps someone there knows ] On Wed, 18 Nov 2020 12:09:59 +0800 Jason Wang wrote: > > This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is > > basically > > soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk() > > > > Then CPU1 spins on xmit, which is

Re: netconsole deadlock with virtnet

2020-11-17 Thread Jason Wang
On 2020/11/18 11:15 AM, Sergey Senozhatsky wrote: On (20/11/18 11:46), Sergey Senozhatsky wrote: [..] Because I'm not sure where the xmit_lock is taken while holding the target_list_lock. I don't see where does this happen. It seems to me that the report is not about broken locking order, but

Re: netconsole deadlock with virtnet

2020-11-17 Thread Sergey Senozhatsky
On (20/11/18 11:46), Sergey Senozhatsky wrote: [..] > > Because I'm not sure where the xmit_lock is taken while holding the > > target_list_lock. > > I don't see where does this happen. It seems to me that the report > is not about broken locking order, but more about: > - soft-irq can be

Re: netconsole deadlock with virtnet

2020-11-17 Thread Sergey Senozhatsky
On (20/11/17 09:33), Steven Rostedt wrote: > > [ 21.149601] IN-HARDIRQ-W at: > > [ 21.149602] __lock_acquire+0xa78/0x1a94 > > [ 21.149603] lock_acquire.part.0+0x170/0x360 > > [ 21.149604] lock_acquire+0x68/0x8c

Re: netconsole deadlock with virtnet

2020-11-17 Thread Leon Romanovsky
On Tue, Nov 17, 2020 at 09:33:25AM -0500, Steven Rostedt wrote: > On Tue, 17 Nov 2020 12:23:41 +0200 > Leon Romanovsky wrote: > > > Hi, > > > > Approximately two weeks ago, our regression team started to experience those > > netconsole splats. The tested code is Linus's master (-rc4) + netdev >

Re: netconsole deadlock with virtnet

2020-11-17 Thread Steven Rostedt
On Tue, 17 Nov 2020 12:23:41 +0200 Leon Romanovsky wrote: > Hi, > > Approximately two weeks ago, our regression team started to experience those > netconsole splats. The tested code is Linus's master (-rc4) + netdev net-next > + netdev net-rc. > > Such splats are random and we can't bisect