RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-15 Thread Dexuan Cui
> From: h...@lst.de [mailto:h...@lst.de] > Sent: Wednesday, February 15, 2017 00:35 > > I tested today's linux-next (next-20170214) + the 2 patches just now and > got > > a weird result: > > sometimes the VM still hung with a new calltrace (BUG: spinlock bad > > magic), but sometimes the VM did b

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
> I tested today's linux-next (next-20170214) + the 2 patches just now and got > a weird result: > sometimes the VM still hung with a new calltrace (BUG: spinlock bad > magic), but sometimes the VM did boot up despite the new calltrace! > > Attached is the log of a "good" boot. > > It looks we

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread Dexuan Cui
ernel.org; Nick Meier > ; Alex Ng (LIS) ; Long Li > ; Adrian Suhov (Cloudbase Solutions SRL) ads...@microsoft.com>; Chris Valean (Cloudbase Solutions SRL) chv...@microsoft.com> > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > when scheduling workq

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
On Tue, Feb 14, 2017 at 02:46:41PM +, Dexuan Cui wrote: > > From: h...@lst.de [mailto:h...@lst.de] > > Sent: Tuesday, February 14, 2017 22:29 > > To: Dexuan Cui > > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > >

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread Dexuan Cui
> From: h...@lst.de [mailto:h...@lst.de] > Sent: Tuesday, February 14, 2017 22:29 > To: Dexuan Cui > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > when scheduling workqueue elements") > > Ok, thanks for testing. Can you try the p

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
Ok, thanks for testing. Can you try the patch below? It fixes a clear problem which was partially papered over before the commit you bisected to, although it can't explain why blk-mq still works. From e4a66856fa2d92c0298000de658365f31bea60cd Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Dat

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread Dexuan Cui
> From: h...@lst.de [mailto:h...@lst.de] > > Hi Dexuan, > > can you try the hack below for now? I disable the TUR call from > sd_check_events, which I think your VM is hanging on. The checks > it does on the sense data look a bit fishy, but so far I've not > identified a possible root cause. >

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-14 Thread h...@lst.de
Hi Dexuan, can you try the hack below for now? I disable the TUR call from sd_check_events, which I think your VM is hanging on. The checks it does on the sense data look a bit fishy, but so far I've not identified a possible root cause. diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-09 Thread h...@lst.de
Hi Dexuan, I've spent some time with the logs and looking over the code and couldn't find any smoking gun. I start to wonder if it might just be a timing issue? Can you try one or two things for me: 1) run with the blk-mq I/O path for scsi by either enabling it at boot / module load time wi

RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread Dexuan Cui
nel.org > Subject: Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock > when scheduling workqueue elements") > > On Wed, Feb 08, 2017 at 10:43:59AM -0700, Jens Axboe wrote: > > I've changed the subject line, this issue has nothing to do with the >

Re: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread h...@lst.de
On Wed, Feb 08, 2017 at 10:43:59AM -0700, Jens Axboe wrote: > I've changed the subject line, this issue has nothing to do with the > issue that Hannes was attempting to fix. Nothing really useful in the thread. Dexuan, can you throw in some prints to see which command times out?

Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")

2017-02-08 Thread Jens Axboe
kernel.org; >> j...@kernel.org >> Subject: Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue >> elements >> >> On 02/06/2017 11:29 PM, Dexuan Cui wrote: >>>> From: linux-block-ow...@vger.kernel.org [mailto:linux-block- >>>

RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-02-08 Thread Dexuan Cui
> From: Jens Axboe [mailto:ax...@kernel.dk] > Sent: Wednesday, February 8, 2017 00:09 > To: Dexuan Cui ; Bart Van Assche > ; h...@suse.com; h...@suse.de > Cc: h...@lst.de; linux-ker...@vger.kernel.org; linux-block@vger.kernel.org; > j...@kernel.org > Subject: Re: [PATCH] gen

Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-02-07 Thread Jens Axboe
On 02/06/2017 11:29 PM, Dexuan Cui wrote: >> From: linux-block-ow...@vger.kernel.org [mailto:linux-block- >> ow...@vger.kernel.org] On Behalf Of Dexuan Cui >> with the linux-next kernel. >> >> I can boot the guest with linux-next's next-20170130 without any issue, >> but since next-20170131 I haven

RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-02-06 Thread Dexuan Cui
> From: linux-block-ow...@vger.kernel.org [mailto:linux-block- > ow...@vger.kernel.org] On Behalf Of Dexuan Cui > with the linux-next kernel. > > I can boot the guest with linux-next's next-20170130 without any issue, > but since next-20170131 I haven't succeeded in booting the guest. > > With ne

Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-02-06 Thread Bart Van Assche
On Tue, 2017-02-07 at 02:23 +, Dexuan Cui wrote: > Any news on this thread? > > The issue is still blocking Linux from booting up normally in my test. :-( > > Have we identified the faulty patch? > If so, at least I can try to revert it to boot up. It's interesting that you have a reproducib

RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-02-06 Thread Dexuan Cui
l.org; linux-block@vger.kernel.org; > j...@kernel.org > Subject: RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue > elements > > > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > > ow...@vger.kernel.org] On Behalf Of Hannes Reinecke > > Sent: We

RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-02-03 Thread Dexuan Cui
er.kernel.org; > j...@kernel.org > Subject: Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue > elements > > On 01/31/2017 01:31 AM, Bart Van Assche wrote: > > On Wed, 2017-01-18 at 10:48 +0100, Hannes Reinecke wrote: > >> @@ -1488,26 +1487,13 @@

Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-01-31 Thread Hannes Reinecke
On 01/31/2017 01:31 AM, Bart Van Assche wrote: > On Wed, 2017-01-18 at 10:48 +0100, Hannes Reinecke wrote: >> @@ -1488,26 +1487,13 @@ static unsigned long disk_events_poll_jiffies(struct >> gendisk *disk) >> void disk_block_events(struct gendisk *disk) >> { >> struct disk_events *ev = di

Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-01-30 Thread Bart Van Assche
On Wed, 2017-01-18 at 10:48 +0100, Hannes Reinecke wrote: > @@ -1488,26 +1487,13 @@ static unsigned long disk_events_poll_jiffies(struct > gendisk *disk) >  void disk_block_events(struct gendisk *disk) >  { > struct disk_events *ev = disk->ev; > -   unsigned long flags; > -   bool

[PATCH] genhd: Do not hold event lock when scheduling workqueue elements

2017-01-18 Thread Hannes Reinecke
When scheduling workqueue elements the callback function might be called directly, so holding the event lock is potentially dangerous as it might lead to a deadlock: [ 989.542827] INFO: task systemd-udevd:459 blocked for more than 480 seconds. [ 989.609721] Not tainted 4.10.0-rc4+ #546 [