Hi Bart,
On Mon, May 14, 2018 at 11:46:33AM -0700, Bart Van Assche wrote:
[...]
> diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt b/Documentation/features/locking/cmpxchg64/arch-support.txt
> new file mode 100644
> index 000000000000..65b3290ce5d5
> --- /dev/null
> +++ b/Documentation/features/locking/cmpxchg64/arch-support.txt
On Mon, May 14, 2018 at 11:00:08AM -0500, Goldwyn Rodrigues wrote:
> > +	if (iop || i_blocksize(inode) == PAGE_SIZE)
> > +		return iop;
>
> Why is this an equal comparison operator? Shouldn't this be >= to
> include filesystem blocksize greater than PAGE_SIZE?
Which filesystems would
On Tue, 2018-05-15 at 12:54 +1000, Michael Ellerman wrote:
> Bart Van Assche writes:
> >
> > +    -----------------------
> > +    |         arch |status|
> > +    -----------------------
> > +    |       alpha: |  ok  |
> > +    |         arc: | TODO |
> > +    |
On 05/15/2018 02:26 AM, Christoph Hellwig wrote:
> On Mon, May 14, 2018 at 11:00:08AM -0500, Goldwyn Rodrigues wrote:
>>> +	if (iop || i_blocksize(inode) == PAGE_SIZE)
>>> +		return iop;
>>
>> Why is this an equal comparison operator? Shouldn't this be >= to
>> include filesystem
On Tue, May 15, 2018 at 8:33 AM, Keith Busch
wrote:
> On Tue, May 15, 2018 at 07:47:07AM +0800, Ming Lei wrote:
>> > > > [ 760.727485] nvme nvme1: EH 0: after recovery -19
>> > > > [ 760.727488] nvme nvme1: EH: fail controller
>> > >
>> > > The above issue (hang in
On Mon, 14 May 2018, Bart Van Assche wrote:
> Recently the blk-mq timeout handling code was reworked. See also Tejun
> Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018
> (https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html).
> This patch reworks the blk-mq
Hi Ming,
On 05/15/2018 08:33 AM, Ming Lei wrote:
> We still have to quiesce the admin queue before canceling requests, so it
> looks like the following patch is better. Please ignore the above patch and
> try the following one and see if your hang can be addressed:
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
On Tue, May 15, 2018 at 06:02:13PM +0800, jianchao.wang wrote:
> Hi ming
>
> On 05/11/2018 08:29 PM, Ming Lei wrote:
> > +static void nvme_eh_done(struct nvme_eh_work *eh_work, int result)
> > +{
> > +	struct nvme_dev *dev = eh_work->dev;
> > +	bool top_eh;
> > +
> > +	spin_lock(&dev->eh_lock);
Hi Ming,
On 05/11/2018 08:29 PM, Ming Lei wrote:
> +static void nvme_eh_done(struct nvme_eh_work *eh_work, int result)
> +{
> +	struct nvme_dev *dev = eh_work->dev;
> +	bool top_eh;
> +
> +	spin_lock(&dev->eh_lock);
> +	top_eh = list_is_last(&eh_work->list, &dev->eh_head);
> +
I managed to obtain SysRq-t when khungtaskd fires using debug printk()
( https://groups.google.com/forum/#!topic/syzkaller-bugs/OTuOsVebAiE ).
Only the 4 threads shown below seem to be relevant to this problem.
[ 246.929688] task                        PC stack   pid father
[ 249.888937]
On Tue, May 15, 2018 at 05:56:14PM +0800, jianchao.wang wrote:
> Hi ming
>
> On 05/15/2018 08:33 AM, Ming Lei wrote:
> > We still have to quiesce the admin queue before canceling requests, so it
> > looks like the following patch is better. Please ignore the above patch and
> > try the following one and
On 5/11/18 12:29 AM, Christoph Hellwig wrote:
> On Thu, May 10, 2018 at 03:49:53PM -0600, Andreas Dilger wrote:
>> Would it make sense to change the bio_add_page() and bio_add_pc_page()
>> to use the more common convention instead of continuing the spread of
>> this non-standard calling
From: Jens Axboe
Date: Tue, 15 May 2018 12:51:20 -0600
> On 5/15/18 12:32 PM, Christoph Hellwig wrote:
>> No bio in this whole function, use req->bio instead.
>
> Applied.
>
>> Looks like no one except for Guenter's build bot cared. I wonder if we
>> should just get rid of the
No bio in this whole function, use req->bio instead.
Fixes: 37a5b5c6 ("jsflash: handle highmem pages")
Reported-by: Guenter Roeck
Signed-off-by: Christoph Hellwig
---
Looks like no one except for Guenter's build bot cared. I wonder if we
should just get rid of
On 5/15/18 12:32 PM, Christoph Hellwig wrote:
> No bio in this whole function, use req->bio instead.
Applied.
> Looks like no one except for Guenter's build bot cared. I wonder if we
> should just get rid of the driver, given that it doesn't look to be in
> good shape at all based on his build logs
On 5/15/18 12:58 PM, David Miller wrote:
> From: Jens Axboe
> Date: Tue, 15 May 2018 12:51:20 -0600
>
>> On 5/15/18 12:32 PM, Christoph Hellwig wrote:
>>> No bio in this whole function, use req->bio instead.
>>
>> Applied.
>>
>>> Looks like no one except for Guenters build bot
From: Jens Axboe
Date: Tue, 15 May 2018 13:00:36 -0600
> On 5/15/18 12:58 PM, David Miller wrote:
>> From: Jens Axboe
>> Date: Tue, 15 May 2018 12:51:20 -0600
>>
>>> On 5/15/18 12:32 PM, Christoph Hellwig wrote:
No bio in this whole function, use req->bio
On 5/15/18 1:51 PM, David Miller wrote:
> From: Jens Axboe
> Date: Tue, 15 May 2018 13:00:36 -0600
>
>> On 5/15/18 12:58 PM, David Miller wrote:
>>> From: Jens Axboe
>>> Date: Tue, 15 May 2018 12:51:20 -0600
>>>
On 5/15/18 12:32 PM, Christoph Hellwig
On Tue, 17 Apr 2018 11:15:43 +0200, Michael Tretter wrote:
> Hi Josef,
>
> On Sat, 14 Apr 2018 01:10:27 +, Josef Bacik wrote:
> > Yeah sorry I screwed that up again. I’m wondering if we can just
> > drop this altogether and leave the zero setting in the config put
> > that we already have.
Author: huhai
Date: Tue May 15 15:15:06 2018 +0800
blk-mq: remove unnecessary judgement from blk_mq_make_request
Whether q->elevator is set or not, we can use blk_mq_sched_insert_request
to complete the work.
Signed-off-by: huhai
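As a rough sketch of the simplification (the surrounding context in
blk_mq_make_request() is reconstructed here as an assumption, not taken from
the patch): blk_mq_sched_insert_request() already falls back to the plain
ctx insert path when q->elevator is NULL, so the caller need not branch on it.

	} else {
		blk_mq_put_ctx(data.ctx);
		blk_mq_bio_to_request(rq, bio);
		/* correct with or without an elevator attached */
		blk_mq_sched_insert_request(rq, false, true, true);
	}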
Hi Ming
On 05/15/2018 08:56 PM, Ming Lei wrote:
> Looks like a nice fix for nvme_create_queue(), but it seems the change to
> adapter_alloc_cq() is missing in the above patch.
>
> Could you prepare a formal one so that I may integrate it to V6?
Please refer to
Thanks
Jianchao
From
Author: huhai
Date: Wed May 16 10:34:22 2018 +0800
blk-mq: for the sync case, whether it is an mq or sq make_request instance, we
should send the request directly
For sq make_request instances, we should issue sync requests directly too,
otherwise it will break
Either admin or normal IO in the reset context may time out because a
controller error happens. When this timeout happens, we may have to
start controller recovery again.
This patch introduces a 'reset_lock' and holds it while running the reset,
so that we may support nested reset in the
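A minimal sketch of the idea, assuming a new 'reset_lock' mutex in
struct nvme_dev (illustrative only, not the actual patch):

	static void nvme_reset_dev(struct nvme_dev *dev)
	{
		/* a nested reset triggered from timeout handling waits
		 * here until the in-flight reset has finished */
		mutex_lock(&dev->reset_lock);
		/* shutdown, re-enable and restart the controller */
		mutex_unlock(&dev->reset_lock);
	}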
From: "jianchao.wang"
Currently nvmeq->cq_vector is set before allocating the cq/sq. If the
alloc cq/sq command times out, nvme_suspend_queue() will invoke free_irq()
for the nvmeq because the cq_vector is valid; this causes the warning
'Trying to free already-free IRQ xxx'. Set
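The shape of the fix might look like the following sketch (the ordering is
the point; passing the vector to adapter_alloc_cq() explicitly is an
assumption based on the review comments in this thread):

	static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
	{
		struct nvme_dev *dev = nvmeq->dev;
		int vector = dev->num_vecs == 1 ? 0 : qid;	/* assumed mapping */
		int result;

		result = adapter_alloc_cq(dev, qid, nvmeq, vector);
		if (result)
			return result;

		result = adapter_alloc_sq(dev, qid, nvmeq);
		if (result)
			goto release_cq;

		/* only now is there an IRQ that suspend may legitimately free */
		nvmeq->cq_vector = vector;
		nvme_init_queue(nvmeq, qid);
		return 0;

	release_cq:
		adapter_delete_cq(dev, qid);
		return result;
	}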
When admin commands are used in EH for recovering the controller, we have
to cover their timeout ourselves and can't depend on the block layer's
timeout handling, since a deadlock may be caused when these commands are
timed out by the block layer again.
Cc: James Smart
Cc: Jianchao Wang
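A sketch of the pattern (illustrative only, not the posted patch): drive the
EH admin command with a private completion and timeout so a stuck command
never recurses into blk-mq timeout handling.

	static void nvme_eh_end_io(struct request *rq, blk_status_t status)
	{
		complete(rq->end_io_data);
	}

	static int nvme_eh_exec_rq(struct request *rq, unsigned long timeout)
	{
		DECLARE_COMPLETION_ONSTACK(done);

		rq->end_io_data = &done;
		blk_execute_rq_nowait(rq->q, NULL, rq, true, nvme_eh_end_io);
		if (!wait_for_completion_timeout(&done, timeout))
			return -ETIMEDOUT;	/* caller recovers the controller */
		return 0;
	}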
It turns out the current way can't drain timeouts completely, because
mod_timer() can be triggered in the work function, which can run just
after the timeout work has been synced:
	del_timer_sync(&q->timeout);
	cancel_work_sync(&q->timeout_work);
This patch introduces a 'timeout_off' flag for
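A sketch of what such a flag could look like (the field name is taken from
the fragment above, the rest is assumed):

	void blk_quiesce_timeout(struct request_queue *q)
	{
		WRITE_ONCE(q->timeout_off, true);	/* assumed new field */
		del_timer_sync(&q->timeout);
		cancel_work_sync(&q->timeout_work);
		del_timer_sync(&q->timeout);	/* in case the work re-armed it */
	}

	/* ...and in blk_add_timer(), before mod_timer(): */
	if (READ_ONCE(q->timeout_off))
		return;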
Hi,
The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout()
for NVMe, and meanwhile fixes blk_sync_queue().
The 2nd and 3rd patches fix a race between nvme_dev_disable() and
controller reset, and avoid double IRQ freeing and IO hang after
queues are killed.
The 4th patch covers
When the controller is being reset, timeouts may still be triggered.
To handle this situation, the controller state has to be changed to
NVME_CTRL_RESETTING first, so introduce nvme_force_change_ctrl_state()
for this purpose.
Cc: James Smart
Cc: Jianchao Wang
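A hypothetical sketch of the helper named above (the real patch may differ):

	static void nvme_force_change_ctrl_state(struct nvme_ctrl *ctrl,
						 enum nvme_ctrl_state new_state)
	{
		unsigned long flags;

		spin_lock_irqsave(&ctrl->lock, flags);
		ctrl->state = new_state;	/* bypass the transition table */
		spin_unlock_irqrestore(&ctrl->lock, flags);
	}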
When one request is timed out, nvme_timeout() currently handles it in the
following way:
	nvme_dev_disable(dev, false);
	nvme_reset_ctrl(&dev->ctrl);
	return BLK_EH_HANDLED;
There are several issues with the above approach:
1) IO may fail during resetting
Admin IO timeout may be
If updating the controller state to LIVE or ADMIN_ONLY fails, the
controller will be removed, so it is not necessary to unfreeze the queues
any more.
This also makes it easier for the following patch to avoid leaking the
freeze counter.
Cc: James Smart
Cc: Jianchao Wang
In nvme_dev_disable(), called while shutting down the controller,
nvme_wait_freeze_timeout() may be run on a controller that has not been
frozen yet, so add a check to avoid that case.
Cc: James Smart
Cc: Jianchao Wang
Cc: Christoph Hellwig
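A sketch of the kind of check being described (the exact condition is an
assumption):

	/* in nvme_dev_disable(): only wait for the freeze if the queues
	 * could actually have been frozen before we got here */
	if (!dead && shutdown &&
	    (dev->ctrl.state == NVME_CTRL_LIVE ||
	     dev->ctrl.state == NVME_CTRL_RESETTING))
		nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);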
Once nested EH is introduced, we may not need to handle errors in the
inner EH, so move error handling out of nvme_reset_dev().
Meanwhile, return the reset result to the caller.
Cc: James Smart
Cc: Jianchao Wang
Cc: Christoph Hellwig
Given that timeout events can come during reset, nvme_dev_disable()
shouldn't keep the admin queue quiesced after the controller is shut down.
Otherwise it may block admin IO during reset and cause the reset to hang
forever.
This patch fixes the issue by unquiescing the admin queue at the end
of nvme_dev_disable().
Cc:
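The change itself is small; a sketch (placement at the tail of
nvme_dev_disable() per the description above):

	/* last step of nvme_dev_disable(): don't leave the admin queue
	 * quiesced, or admin IO issued by the reset path will hang */
	blk_mq_unquiesce_queue(dev->ctrl.admin_q);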
When nvme_dev_disable() is used for error recovery, we should always
freeze queues before shutting down the controller:
- the reset handler assumes queues are frozen, and will wait_freeze &
unfreeze them explicitly; if queues aren't frozen during nvme_dev_disable(),
the reset handler may wait forever even though
Bart Van Assche writes:
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c32a181a7cbb..901365d12dcd 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -149,6 +149,7 @@ config PPC
> select ARCH_HAS_UACCESS_FLUSHCACHE if PPC64
On Wed, May 16, 2018 at 10:04:20AM +0800, Ming Lei wrote:
> On Tue, May 15, 2018 at 05:56:14PM +0800, jianchao.wang wrote:
> > Hi ming
> >
> > On 05/15/2018 08:33 AM, Ming Lei wrote:
> > > We still have to quiesce the admin queue before canceling requests, so it
> > > looks like the following patch is better,
Hi Ming,
On 05/16/2018 10:09 AM, Ming Lei wrote:
> So could you check if only the patch ("unquiesce admin queue after shutdown
> controller") can fix your IO hang issue?
I indeed tested this before fixing the warning.
It does fix the IO hang issue. :)
Thanks
Jianchao
On Tue, May 15, 2018 at 05:56:14PM +0800, jianchao.wang wrote:
> Hi ming
>
> On 05/15/2018 08:33 AM, Ming Lei wrote:
> > We still have to quiesce the admin queue before canceling requests, so it
> > looks like the following patch is better. Please ignore the above patch and
> > try the following one and
Recently the blk-mq timeout handling code was reworked. See also Tejun
Heo, "[PATCHSET v4] blk-mq: reimplement timeout handling", 08 Jan 2018
(https://www.mail-archive.com/linux-block@vger.kernel.org/msg16985.html).
This patch reworks the blk-mq timeout handling code again. The timeout
handling
The next patch in this series introduces a call to cmpxchg64()
in the block layer core for those architectures on which this
functionality is available. Make it possible to test whether
cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64.
Signed-off-by: Bart Van Assche
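A sketch of the kind of guarded use the new symbol enables (the call site is
illustrative, not the actual block layer patch):

	#ifdef CONFIG_ARCH_HAVE_CMPXCHG64
	/* atomically publish a 64-bit generation/deadline word */
	static bool try_publish(u64 *p, u64 old, u64 new)
	{
		return cmpxchg64(p, old, new) == old;
	}
	#endif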
Hello Jens,
This is the ninth incarnation of the blk-mq timeout handling rework. All
previously posted comments have been addressed. Please consider this patch
series for inclusion in the upstream kernel.
Bart.
Changes compared to v8:
- Split into two patches.
- Moved the spin_lock_init() call
The next patch in this series introduces a call to cmpxchg64()
in the block layer core for those architectures on which this
functionality is available. Make it possible to test whether
cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64.
Signed-off-by: Bart Van Assche
Hello Jens,
This is the tenth incarnation of the blk-mq timeout handling rework. All
previously posted comments should have been addressed. Please consider this
patch series for inclusion in the upstream kernel.
Bart.
Changes compared to v9:
- Addressed multiple comments related to patch 1/2:
On Mon, May 14, 2018 at 06:33:35PM -0600, Keith Busch wrote:
> On Tue, May 15, 2018 at 07:47:07AM +0800, Ming Lei wrote:
> > > > > [ 760.727485] nvme nvme1: EH 0: after recovery -19
> > > > > [ 760.727488] nvme nvme1: EH: fail controller
> > > >
> > > > > The above issue (hang in nvme_remove()) is
On 5/9/2018 1:17 PM, Christoph Hellwig wrote:
For the upcoming removal of buffer heads in XFS we need to keep track of
the number of outstanding writeback requests per page. For this we need
to know if bio_add_page merged a region with the previous bvec or not.
Instead of adding additional
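A sketch of the shape such an interface might take (the merge-reporting
helper below is an assumption based on the description, not the posted
patch):

	/* returns true if (page, off, len) was merged into the previous
	 * bvec, so callers can track writeback requests per page */
	static bool bio_try_merge_page(struct bio *bio, struct page *page,
				       unsigned int len, unsigned int off)
	{
		struct bio_vec *bv;

		if (!bio->bi_vcnt)
			return false;
		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
		if (page != bv->bv_page || off != bv->bv_offset + bv->bv_len)
			return false;
		bv->bv_len += len;
		bio->bi_iter.bi_size += len;
		return true;
	}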
On Tue, May 15, 2018 at 08:47:25AM -0500, Goldwyn Rodrigues wrote:
> On 05/15/2018 02:26 AM, Christoph Hellwig wrote:
> > On Mon, May 14, 2018 at 11:00:08AM -0500, Goldwyn Rodrigues wrote:
> >>> + if (iop || i_blocksize(inode) == PAGE_SIZE)
> >>> + return iop;
> >>
> >> Why is this an