We've triggered a WARNING in blk_throtl_bio when throttling writeback
I/O: it complains that blkg->refcnt is already 0 when calling blkg_get, and
then the kernel crashes with an invalid page request.
After investigating this issue, we found there is a race between
blkcg_bio_issue_check and cgroup_rmdir. T
> On 6 Feb 2018, at 19:35, Oleksandr Natalenko wrote:
>
> Hi.
>
> 06.02.2018 15:50, Paolo Valente wrote:
>> Could you please do a
>> gdb /block/bfq-iosched.o # or vmlinux.o if bfq is builtin
>> list *(bfq_finish_requeue_request+0x54)
>> list *(bfq_put_queue+0x10b)
>> for
On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
>
> The first piece of information I need is whether this failure happens
> even without "BFQ hierarchical scheduling support".
I presume you mean BFQ_GROUP_IOSCHED, which I do not have enabled.
-Mike
On Wed, 2018-02-07 at 10:45 +0100, Paolo Valente wrote:
>
> > On 7 Feb 2018, at 10:23, Mike Galbraith wrote:
> >
> > On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
> >>
> >> The first piece of information I need is whether this failure happens
> >> even without
> On 7 Feb 2018, at 11:15, Mike Galbraith wrote:
>
> On Wed, 2018-02-07 at 10:45 +0100, Paolo Valente wrote:
>>
>>> On 7 Feb 2018, at 10:23, Mike Galbraith wrote:
>>>
>>> On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote:
The firs
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
>
> 1. Could you paste a stack trace for this OOPS, just to understand how we
> get there?
[ 442.421058] kernel BUG at block/bfq-iosched.c:4742!
[ 442.421762] invalid opcode: [#1] SMP PTI
[ 442.422436] Dumping ftrace buffer:
[ 442.4
On Tue, Feb 6, 2018 at 5:10 PM, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2018 at 01:01:23PM +0100, Roman Penyaev wrote:
>
>> >> +static int ibtrs_ib_dev_init(struct ibtrs_ib_dev *d, struct ib_device
>> >> *dev)
>> >> +{
>> >> + int err;
>> >> +
>> >> + d->pd = ib_alloc_pd(dev, IB_PD_UN
On 30/01/2018 10:33, John Garry wrote:
On 30/01/2018 01:24, Ming Lei wrote:
On Mon, Jan 29, 2018 at 12:56:30PM -0800, James Bottomley wrote:
On Mon, 2018-01-29 at 23:46 +0800, Ming Lei wrote:
[...]
2. When to enable SCSI_MQ at default again?
I'm not sure there's much to discuss ... I think t
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
>
> 2. Could you please turn that BUG_ON into:
> if (!(rq->rq_flags & RQF_ELVPRIV))
> return;
> and see what happens?
That seems to make it forget how to make boom.
-Mike
> On 7 Feb 2018, at 12:03, Mike Galbraith wrote:
>
> On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote:
>>
>> 2. Could you please turn that BUG_ON into:
>> if (!(rq->rq_flags & RQF_ELVPRIV))
>> return;
>> and see what happens?
>
> That seems to make it forget
On Wed, 2018-02-07 at 12:12 +0100, Paolo Valente wrote:
> Just to be certain, before submitting a new patch: you changed *only*
> the BUG_ON at line 4742, on top of my instrumentation patch.
Nah, I completely rewrote it with only a little help from an ouija
board to compensate for missing (all) k
On Mon 05-02-18 17:58:12, Bart Van Assche wrote:
> On Sat, 2018-02-03 at 10:51 +0800, Joseph Qi wrote:
> > Hi Bart,
> >
> > On 18/2/3 00:21, Bart Van Assche wrote:
> > > On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> > > > We triggered this race when using single queue. I'm not sure if it
>
On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke wrote:
> Hi all,
>
> [ .. ]
> >>
> >> Could you share us your patch for enabling global_tags/MQ on
> > megaraid_sas
> >> so that I can reproduce your test?
> >>
> >>> See below perf top data. "bt_iter" is consuming 4 times more CPU.
> >>
>
On Tue, Feb 6, 2018 at 5:01 PM, Bart Van Assche wrote:
> On Tue, 2018-02-06 at 14:12 +0100, Roman Penyaev wrote:
>> On Mon, Feb 5, 2018 at 1:16 PM, Sagi Grimberg wrote:
>> > [ ... ]
>> > - srp/scst comparison is really not fair having it in legacy request
>> > mode. Can you please repeat it and
Hi Sagi and all,
On Mon, Feb 5, 2018 at 1:30 PM, Sagi Grimberg wrote:
> Hi Roman and the team (again), replying to my own email :)
>
> I forgot to mention that first of all thank you for upstreaming
> your work! I fully support your goal to have your production driver
> upstream to minimize your
> -----Original Message-----
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Wednesday, February 7, 2018 5:53 PM
> To: Hannes Reinecke
> Cc: Kashyap Desai; Jens Axboe; linux-block@vger.kernel.org; Christoph
> Hellwig; Mike Snitzer; linux-s...@vger.kernel.org; Arun Easi; Omar
Sandoval;
> Marti
Hi.
07.02.2018 12:27, Paolo Valente wrote:
Hi Oleksandr, Holger,
before I prepare a V2 candidate patch, could you please test my
instrumentation patch too, with the above change made. For your
convenience, I have attached a compressed archive with both the
instrumentation patch and a patch maki
On 02/06/18 15:18, Jens Axboe wrote:
GLOBAL implies that it's, strangely enough, global. That isn't really the
case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
welcome better names, but global doesn't seem to be a great choice.
BLK_MQ_F_SET_TAGS?
I like the name BLK_MQ_F_
On Wed, 2018-02-07 at 13:57 +0100, Roman Penyaev wrote:
> On Tue, Feb 6, 2018 at 5:01 PM, Bart Van Assche
> wrote:
> > On Tue, 2018-02-06 at 14:12 +0100, Roman Penyaev wrote:
> > Something else I would like to understand better is how much of the latency
> > gap between NVMeOF/SRP and IBNBD can b
On Mon, 5 Feb 2018, Bart Van Assche wrote:
> That approach may work well for your employer but sorry I don't think this is
> sufficient for an upstream driver. I think that most users who configure a
> network storage target expect full control over which storage devices are
> exported
> and also
On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx
> *hctx)
Hello, Bart.
On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote:
> The following race can occur between the code that resets the timer
> and completion handling:
> - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.
> - A completion occurs and blk_mq_complete_request() c
On Wed, Feb 7, 2018 at 5:35 PM, Christopher Lameter wrote:
> On Mon, 5 Feb 2018, Bart Van Assche wrote:
>
>> That approach may work well for your employer but sorry I don't think this is
>> sufficient for an upstream driver. I think that most users who configure a
>> network storage target expect
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote:
> On Tue, Feb 06, 2018 at 05:11:33PM -0800, Bart Van Assche wrote:
> > The following race can occur between the code that resets the timer
> > and completion handling:
> > - The code that handles BLK_EH_RESET_TIMER resets aborted_gstate.
> > - A c
LGTM-- in my staging branch.
Reviewed-by: Michael Lyle
On 02/05/2018 08:20 PM, Coly Li wrote:
> dc->writeback_rate_update_seconds can be set via sysfs and its value can
> be set to [1, ULONG_MAX]. It does not make sense to set such a large
> value; 60 seconds is long enough, considering th
On Wed, 2018-02-07 at 18:18 +0100, Roman Penyaev wrote:
> So the question is: are there real life setups where
> some of the local IB network members can be untrusted?
Hello Roman,
You may want to read more about the latest evolutions with regard to network
security. An article that I can recomme
> On 7 Feb 2018, at 16:50, Oleksandr Natalenko wrote:
>
> Hi.
>
> 07.02.2018 12:27, Paolo Valente wrote:
>> Hi Oleksandr, Holger,
>> before I prepare a V2 candidate patch, could you please test my
>> instrumentation patch too, with the above change made. For your
>> conv
Reviewed-by: Michael Lyle
On 02/05/2018 08:20 PM, Coly Li wrote:
> In patch "bcache: fix cached_dev->count usage for bch_cache_set_error()",
> cached_dev_get() is called when creating dc->writeback_thread, and
> cached_dev_put() is called when exiting dc->writeback_thread. This
> modification wor
Hello, Bart.
On Wed, Feb 07, 2018 at 05:27:10PM +0000, Bart Van Assche wrote:
> Even with the above change I think that there is still a race between the
> code that handles timer resets and the completion handler. Anyway, the test
Can you elaborate the scenario a bit further? If you're referrin
I agree about the typos-- after they're fixed,
Reviewed-by: Michael Lyle
On 02/05/2018 08:20 PM, Coly Li wrote:
> When there are too many I/O errors on cache device, current bcache code
> will retire the whole cache set, and detach all bcache devices. But the
> detached bcache devices are not st
detatched should be detached, otherwise lgtm.
Reviewed-by: Michael Lyle
On 02/05/2018 08:20 PM, Coly Li wrote:
> From: Tang Junhui
>
> When we run IO in a detached device, and run iostat to shows IO status,
> normally it will show like below (some fields omitted):
> Device: ... avgrq-sz avgq
Hi,
fixed version, after bug reports and fixes in [1].
Thanks,
Paolo
[1] https://lkml.org/lkml/2018/2/5/599
Paolo Valente (1):
block, bfq: add requeue-request hook
block/bfq-iosched.c | 109
1 file changed, 84 insertions(+), 25 deletions(
Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, w
On 2/7/18 11:00 AM, Paolo Valente wrote:
> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
> be re-inserted into the active I/O scheduler for that device. As a
> consequence, I/O schedulers may get the s
On Wed, 2018-02-07 at 09:35 -0800, t...@kernel.org wrote:
> On Wed, Feb 07, 2018 at 05:27:10PM +0000, Bart Van Assche wrote:
> > Even with the above change I think that there is still a race between the
> > code that handles timer resets and the completion handler.
>
> Can you elaborate the scenar
On Wed, 2018-02-07 at 09:06 -0800, Tejun Heo wrote:
> Can you see whether by any chance the following patch fixes the issue?
> If not, can you share the repro case?
>
> Thanks.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102..651d18c 100644
> --- a/block/blk-mq.c
> +++ b/block/bl
Hi Jens---
Here are a few more changes for 4.16 from Coly Li and Tang Junhui.
Probably the most "functional" is 2/8, which makes a change to fix a
performance bug under sustained small writes. Still, it's a small, safe
change that seems good to take. The rest are small bugfixes.
All have s
From: Tang Junhui
Sometimes the journal takes up a lot of CPU; we need statistics
to know what the journal is doing. So this patch provides
some journal statistics:
1) reclaim: how many times the journal tries to reclaim resources,
usually when the journal bucket and/or the pin are exhausted.
2) flush_w
From: Coly Li
dc->writeback_rate_update_seconds can be set via sysfs and its value can
be set to [1, ULONG_MAX]. It does not make sense to set such a large
value; 60 seconds is long enough, considering that the default of 5 seconds
has worked well for a long time.
Because dc->writeback_rate_update is a s
From: Tang Junhui
After running small-write I/O for a long time, we found that CPU occupancy
is very high and I/O performance has been reduced by about half:
[root@ceph151 internal]# top
top - 15:51:05 up 1 day, 2:43, 4 users, load average: 16.89, 15.15, 16.53
Tasks: 2063 total, 4 running, 2059
From: Coly Li
Struct cache uses io_errors for two purposes:
- Error decay: when the cache set's error_decay is set, io_errors is used to
generate a small delay when an I/O error happens.
- I/O errors counter: in order to generate a big enough value for error
decay, the I/O errors counter value is s
From: Coly Li
Kernel thread routine bch_writeback_thread() has the following code block,
447 down_write(&dc->writeback_lock);
448~450 if (check conditions) {
451 up_write(&dc->writeback_lock);
452 set_current_state(TASK_INTERRUPTIBLE);
453
454
From: Tang Junhui
Back-end device sdm has already been attached to a cache set with ID
f67ebe1f-f8bc-4d73-bfe5-9dc88607f119; trying then to attach it to
another cache set returns an error:
[root]# cd /sys/block/sdm/bcache
[root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
-bash: echo: wr
From: Tang Junhui
I attach a back-end device to a cache set while the cache set is not
registered yet; the back-end device does not attach successfully, but no
error is returned:
[root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach
[root]#
In sysfs_attach(), the return value "
From: Tang Junhui
After a long time of running random small-I/O writes,
I rebooted the machine, and after the machine powered on,
I found bcache got stuck; the stack is:
[root@ceph153 ~]# cat /proc/2510/task/*/stack
[] closure_sync+0x25/0x90 [bcache]
[] bch_journal+0x118/0x2b0 [bcache]
[] bch_journal_m
Hello, Bart.
On Wed, Feb 07, 2018 at 06:14:13PM +0000, Bart Van Assche wrote:
> When I wrote my comment I was not sure whether or not non-reentrancy is
> guaranteed for work queue items. However, according to what I found in the
> workqueue implementation I think that is guaranteed. So it shouldn'
Hello,
On Wed, Feb 07, 2018 at 07:03:56PM +0000, Bart Van Assche wrote:
> I tried the above patch but already during the first iteration of the test I
> noticed that the test hung, probably due to the following request that got
> stuck:
>
> $ (cd /sys/kernel/debug/block && grep -aH . */*/*/rq_li
On 2/7/18 12:41 PM, Michael Lyle wrote:
> Hi Jens---
>
> Here are a few more changes for 4.16 from Coly Li and Tang Junhui.
>
> Probably the most "functional" is 2/8, which makes a change to fix a
> performance bug under sustained small writes. Still, it's a small, safe
> change that seems g
> On 7 Feb 2018, at 19:06, Jens Axboe wrote:
>
> On 2/7/18 11:00 AM, Paolo Valente wrote:
>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>> be re-inserted into the active I
On Wed, 2018-02-07 at 12:09 -0800, t...@kernel.org wrote:
> Hello,
>
> On Wed, Feb 07, 2018 at 07:03:56PM +0000, Bart Van Assche wrote:
> > I tried the above patch but already during the first iteration of the test I
> > noticed that the test hung, probably due to the following request that got
>
Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, w
Hello, Joseph.
On Wed, Feb 07, 2018 at 04:40:02PM +0800, Joseph Qi wrote:
> writeback kworker
>   blkcg_bio_issue_check
>     rcu_read_lock
>     blkg_lookup
>     <<< *race window*
>     blk_throtl_bio
>       spin_lock_irq(q->queue_lock)
>       spin_unlock_irq(q->queue_lock)
>     rcu_read_unlo
Hello,
On Wed, Feb 07, 2018 at 09:02:21PM +0000, Bart Van Assche wrote:
> The patch that I used in my test had an smp_wmb() call (see also below).
> Anyway,
> I will see whether I can extract more state information through debugfs.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ef4f6df
On 2/7/18 2:19 PM, Paolo Valente wrote:
> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
> be re-inserted into the active I/O scheduler for that device. As a
> consequence, I/O schedulers may get the sa
On Wed, 2018-02-07 at 12:07 -0800, t...@kernel.org wrote:
> Ah, you're right. u64_stat_sync doesn't imply barriers, so we want
> something like the following.
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102..d6edf3b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -593,7
On Wed, 2018-02-07 at 23:48 +, Bart Van Assche wrote:
> With this patch applied I see requests for which it seems like the timeout
> handler
> did not get invoked: [ ... ]
I just noticed the following in the system log, which is probably the reason
why some
requests got stuck:
Feb 7 15:16:
Hi Kashyap,
On Wed, Feb 07, 2018 at 07:44:04PM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Ming Lei [mailto:ming@redhat.com]
> > Sent: Wednesday, February 7, 2018 5:53 PM
> > To: Hannes Reinecke
> > Cc: Kashyap Desai; Jens Axboe; linux-block@vger.kernel.org; Christoph
On Tue, 2018-02-06 at 00:03 +0200, Sagi Grimberg wrote:
> +static int irq_am_register_debugfs(struct irq_am *am)
> +{
> + char name[20];
> +
> + snprintf(name, sizeof(name), "am%u", am->id);
> + am->debugfs_dir = debugfs_create_dir(name,
> + irq_am_debugfs_ro
On Tue, 2018-02-06 at 00:03 +0200, Sagi Grimberg wrote:
> +void irq_poll_init_am(struct irq_poll *iop, unsigned int nr_events,
> +unsigned short nr_levels, unsigned short start_level, irq_poll_am_fn
> *amfn)
> +{
> + iop->amfn = amfn;
> + irq_am_init(&iop->am, nr_events, nr_levels,
Hi Tejun,
Thanks very much for reviewing this patch.
On 18/2/8 05:38, Tejun Heo wrote:
> Hello, Joseph.
>
> On Wed, Feb 07, 2018 at 04:40:02PM +0800, Joseph Qi wrote:
>> writeback kworker
>>   blkcg_bio_issue_check
>>     rcu_read_lock
>>     blkg_lookup
>>     <<< *race window*
>>     blk_throtl
Hi Tejun,
Could you please kindly review this patch or give some advice?
Thanks.
Jiufei Xue
On 2018/1/23 10:08 AM, xuejiufei wrote:
> Cgroup writeback has been supported since v4.2. But there exists a problem
> in the following case.
>
> A cgroup may send both buffered and direct/sync I/Os. The foregroun
On 02/07/2018 03:14 PM, Kashyap Desai wrote:
>> -----Original Message-----
>> From: Ming Lei [mailto:ming@redhat.com]
>> Sent: Wednesday, February 7, 2018 5:53 PM
>> To: Hannes Reinecke
>> Cc: Kashyap Desai; Jens Axboe; linux-block@vger.kernel.org; Christoph
>> Hellwig; Mike Snitzer; linux-s...
> On 7 Feb 2018, at 23:18, Jens Axboe wrote:
>
> On 2/7/18 2:19 PM, Paolo Valente wrote:
>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>> be re-inserted into the active I/