Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-23 Thread Christoph Hellwig
Ok, it helps to make sure we're actually doing I/O from the new CPU;
I've reproduced it now.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-23 Thread Christoph Hellwig
I can't reproduce it in my VM by adding a new CPU.  Do you have
any interesting blk-mq setup, like actually using multiple queues?
I'll give that a spin next.


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-22 Thread Jens Axboe
On 11/22/2017 12:28 AM, Christoph Hellwig wrote:
> Jens, please don't just revert the commit in your for-linus tree.
> 
> On its own this will totally mess up the interrupt assignments.  Give
> me a bit of time to sort this out properly.

I wasn't going to push it until I heard otherwise. I'll just pop it
off, for-linus isn't a stable branch.

-- 
Jens Axboe



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christoph Hellwig
Jens, please don't just revert the commit in your for-linus tree.

On its own this will totally mess up the interrupt assignments.  Give
me a bit of time to sort this out properly.


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 01:31 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 09:21 PM, Jens Axboe wrote:
>> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>>
>>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
 On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
 On 11/21/2017 11:27 AM, Jens Axboe wrote:
> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
 On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
>
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig 
> Date:   Mon Jun 26 12:20:57 2017 +0200
>
> blk-mq: Create hctx for each present CPU
> 
> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
> 
> Currently we only create hctx for online CPUs, which can lead 
> to a lot
> of churn due to frequent soft offline / online operations.  
> Instead
> allocate one for each present CPU to avoid this and 
> dramatically simplify
> the code.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Jens Axboe 
> Cc: Keith Busch 
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-n...@lists.infradead.org
> Link: 
> http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
> Signed-off-by: Thomas Gleixner 
> Cc: Oleksandr Natalenko 
> Cc: Mike Galbraith 
> Signed-off-by: Greg Kroah-Hartman 

 I wonder if we're simply not getting the masks updated correctly. 
 I'll
 take a look.
>>>
>>> Can't make it trigger here. We do init for each present CPU, which 
>>> means
>>> that if I offline a few CPUs here and register a queue, those still 
>>> show
>>> up as present (just offline) and get mapped accordingly.
>>>
>>> From the looks of it, your setup is different. If the CPU doesn't 
>>> show
>>> up as present and it gets hotplugged, then I can see how this 
>>> condition
>>> would trigger. What environment are you running this in? We might 
>>> have
>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>> for a dead cpu and handle that.
>>
>> I am not doing a hot unplug and the replug, I use KVM and add a 
>> previously
>> not available CPU.
>>
>> in libvirt/virsh speak:
>>   4
>
> So that's why we run into problems. It's not present when we load the 
> device,
> but becomes present and online afterwards.
>
> Christoph, we used to handle this just fine, your patch broke it.
>
> I'll see if I can come up with an appropriate fix.

 Can you try the below?
>>>
>>>
>>> It does prevent the crash but it seems that the new CPU is not "used " 
>>> after the hotplug for mq:
>>>
>>>
>>> output with 2 cpus:
>>> /sys/kernel/debug/block/vda
>>> /sys/kernel/debug/block/vda/hctx0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>> /sys/kernel/debug/block/vda/hctx0/active
>>> /sys/kernel/debug/block/vda/hctx0/run
>>> /sys/kernel/debug/block/vda/hctx0/queued
>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/tags
>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>> /sys/kernel/debug/block/vda/hctx0/busy
>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>> /sys/kernel/debug/block/vda/hctx0/flags
>>> /sys/kernel/debug/block/vda/hctx0/state
>>> /sys/kernel/debug/block/vda/sched
>>> /sys/kernel/debug/block/vda/sched/dispatch
>>> 
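[Editor's note: the scenario Christian describes (a vCPU that is neither online nor *present* at boot, then hot-added from the host) corresponds to a libvirt domain XML along these lines. The counts below are illustrative, not taken from his configuration, and the archive stripped the actual XML from his mail:]

```xml
<!-- Guest may grow to 4 vCPUs, but only 2 exist ("present") at boot;
     CPUs 2-3 are invisible to the guest until hot-added. -->
<vcpu placement='static' current='2'>4</vcpu>
```

Hot-adding a vCPU at runtime (e.g. `virsh setvcpus <domain> 3 --live`) then makes a previously not-present CPU present and online only after virtio-blk has already set up its queue mappings.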

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christian Borntraeger


On 11/21/2017 09:21 PM, Jens Axboe wrote:
> On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
>>
>> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:


 On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
 On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
 Bisect points to

 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
 commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
 Author: Christoph Hellwig 
 Date:   Mon Jun 26 12:20:57 2017 +0200

 blk-mq: Create hctx for each present CPU
 
 commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
 
 Currently we only create hctx for online CPUs, which can lead 
 to a lot
 of churn due to frequent soft offline / online operations.  
 Instead
 allocate one for each present CPU to avoid this and 
 dramatically simplify
 the code.
 
 Signed-off-by: Christoph Hellwig 
 Reviewed-by: Jens Axboe 
 Cc: Keith Busch 
 Cc: linux-bl...@vger.kernel.org
 Cc: linux-n...@lists.infradead.org
 Link: 
 http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
 Signed-off-by: Thomas Gleixner 
 Cc: Oleksandr Natalenko 
 Cc: Mike Galbraith 
 Signed-off-by: Greg Kroah-Hartman 
>>>
>>> I wonder if we're simply not getting the masks updated correctly. 
>>> I'll
>>> take a look.
>>
>> Can't make it trigger here. We do init for each present CPU, which 
>> means
>> that if I offline a few CPUs here and register a queue, those still 
>> show
>> up as present (just offline) and get mapped accordingly.
>>
>> From the looks of it, your setup is different. If the CPU doesn't 
>> show
>> up as present and it gets hotplugged, then I can see how this 
>> condition
>> would trigger. What environment are you running this in? We might 
>> have
>> to re-introduce the cpu hotplug notifier, right now we just monitor
>> for a dead cpu and handle that.
>
> I am not doing a hot unplug and the replug, I use KVM and add a 
> previously
> not available CPU.
>
> in libvirt/virsh speak:
>   4

 So that's why we run into problems. It's not present when we load the 
 device,
 but becomes present and online afterwards.

 Christoph, we used to handle this just fine, your patch broke it.

 I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash but it seems that the new CPU is not "used " 
>> after the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> 

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 01:19 PM, Christian Borntraeger wrote:
> 
> On 11/21/2017 09:14 PM, Jens Axboe wrote:
>> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
 On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:


 On 11/21/2017 07:09 PM, Jens Axboe wrote:
> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>> Bisect points to
>>>
>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>> Author: Christoph Hellwig 
>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>
>>> blk-mq: Create hctx for each present CPU
>>> 
>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>> 
>>> Currently we only create hctx for online CPUs, which can lead 
>>> to a lot
>>> of churn due to frequent soft offline / online operations.  
>>> Instead
>>> allocate one for each present CPU to avoid this and 
>>> dramatically simplify
>>> the code.
>>> 
>>> Signed-off-by: Christoph Hellwig 
>>> Reviewed-by: Jens Axboe 
>>> Cc: Keith Busch 
>>> Cc: linux-bl...@vger.kernel.org
>>> Cc: linux-n...@lists.infradead.org
>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
>>> Signed-off-by: Thomas Gleixner 
>>> Cc: Oleksandr Natalenko 
>>> Cc: Mike Galbraith 
>>> Signed-off-by: Greg Kroah-Hartman 
>>
>> I wonder if we're simply not getting the masks updated correctly. 
>> I'll
>> take a look.
>
> Can't make it trigger here. We do init for each present CPU, which 
> means
> that if I offline a few CPUs here and register a queue, those still 
> show
> up as present (just offline) and get mapped accordingly.
>
> From the looks of it, your setup is different. If the CPU doesn't show
> up as present and it gets hotplugged, then I can see how this 
> condition
> would trigger. What environment are you running this in? We might have
> to re-introduce the cpu hotplug notifier, right now we just monitor
> for a dead cpu and handle that.

 I am not doing a hot unplug and the replug, I use KVM and add a 
 previously
 not available CPU.

 in libvirt/virsh speak:
   4
>>>
>>> So that's why we run into problems. It's not present when we load the 
>>> device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
>
>
> It does prevent the crash but it seems that the new CPU is not "used " 
> after the hotplug for mq:
>
>
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> 
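[Editor's note: the listing above still shows only hctx0/cpu0 after the hotplug, which matches a cpu-to-hctx map built solely from the CPUs known at queue-init time. A toy model (not kernel code; names are invented for illustration) of why a CPU that was not present at setup ends up unmapped:]

```python
def build_ctx_map(present_cpus, nr_hctx):
    """Toy model: assign each *present* CPU round-robin to a hardware
    queue, the way blk-mq builds its cpu -> hctx map at queue init."""
    return {cpu: cpu % nr_hctx for cpu in present_cpus}

# Queue initialized while only CPU 0 is present (one hctx):
ctx_map = build_ctx_map(present_cpus=[0], nr_hctx=1)

# CPU 1 is hot-added later; without rebuilding the map it has no
# entry, so I/O submitted from it runs on an hctx whose cpumask
# excludes it -- the condition the warning at blk-mq.c:1144 flags.
print(1 in ctx_map)   # False: the hot-added CPU is unmapped
```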

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christian Borntraeger

On 11/21/2017 09:14 PM, Jens Axboe wrote:
> On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:


 On 11/21/2017 07:39 PM, Jens Axboe wrote:
> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
 On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig 
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>> blk-mq: Create hctx for each present CPU
>> 
>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>> 
>> Currently we only create hctx for online CPUs, which can lead to 
>> a lot
>> of churn due to frequent soft offline / online operations.  
>> Instead
>> allocate one for each present CPU to avoid this and dramatically 
>> simplify
>> the code.
>> 
>> Signed-off-by: Christoph Hellwig 
>> Reviewed-by: Jens Axboe 
>> Cc: Keith Busch 
>> Cc: linux-bl...@vger.kernel.org
>> Cc: linux-n...@lists.infradead.org
>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
>> Signed-off-by: Thomas Gleixner 
>> Cc: Oleksandr Natalenko 
>> Cc: Mike Galbraith 
>> Signed-off-by: Greg Kroah-Hartman 
>
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

 Can't make it trigger here. We do init for each present CPU, which 
 means
 that if I offline a few CPUs here and register a queue, those still 
 show
 up as present (just offline) and get mapped accordingly.

 From the looks of it, your setup is different. If the CPU doesn't show
 up as present and it gets hotplugged, then I can see how this condition
 would trigger. What environment are you running this in? We might have
 to re-introduce the cpu hotplug notifier, right now we just monitor
 for a dead cpu and handle that.
>>>
>>> I am not doing a hot unplug and the replug, I use KVM and add a 
>>> previously
>>> not available CPU.
>>>
>>> in libvirt/virsh speak:
>>>   4
>>
>> So that's why we run into problems. It's not present when we load the 
>> device,
>> but becomes present and online afterwards.
>>
>> Christoph, we used to handle this just fine, your patch broke it.
>>
>> I'll see if I can come up with an appropriate fix.
>
> Can you try the below?


 It does prevent the crash but it seems that the new CPU is not "used " 
 after the hotplug for mq:


 output with 2 cpus:
 /sys/kernel/debug/block/vda
 /sys/kernel/debug/block/vda/hctx0
 /sys/kernel/debug/block/vda/hctx0/cpu0
 /sys/kernel/debug/block/vda/hctx0/cpu0/completed
 /sys/kernel/debug/block/vda/hctx0/cpu0/merged
 /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
 /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
 /sys/kernel/debug/block/vda/hctx0/active
 /sys/kernel/debug/block/vda/hctx0/run
 /sys/kernel/debug/block/vda/hctx0/queued
 /sys/kernel/debug/block/vda/hctx0/dispatched
 /sys/kernel/debug/block/vda/hctx0/io_poll
 /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
 /sys/kernel/debug/block/vda/hctx0/sched_tags
 /sys/kernel/debug/block/vda/hctx0/tags_bitmap
 /sys/kernel/debug/block/vda/hctx0/tags
 /sys/kernel/debug/block/vda/hctx0/ctx_map
 /sys/kernel/debug/block/vda/hctx0/busy
 /sys/kernel/debug/block/vda/hctx0/dispatch
 /sys/kernel/debug/block/vda/hctx0/flags
 /sys/kernel/debug/block/vda/hctx0/state
 /sys/kernel/debug/block/vda/sched
 /sys/kernel/debug/block/vda/sched/dispatch
 /sys/kernel/debug/block/vda/sched/starved
 /sys/kernel/debug/block/vda/sched/batching
 /sys/kernel/debug/block/vda/sched/write_next_rq
 /sys/kernel/debug/block/vda/sched/write_fifo_list
 /sys/kernel/debug/block/vda/sched/read_next_rq
 /sys/kernel/debug/block/vda/sched/read_fifo_list
 /sys/kernel/debug/block/vda/write_hints
 /sys/kernel/debug/block/vda/state
 /sys/kernel/debug/block/vda/requeue_list
 /sys/kernel/debug/block/vda/poll_stat
>>>
>>> Try this, basically just a revert.
>>
>> Yes, seems to work.
>>
>> Tested-by: 

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 01:12 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 08:30 PM, Jens Axboe wrote:
>> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
 On 11/21/2017 11:27 AM, Jens Axboe wrote:
> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
 On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
>
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig 
> Date:   Mon Jun 26 12:20:57 2017 +0200
>
> blk-mq: Create hctx for each present CPU
> 
> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
> 
> Currently we only create hctx for online CPUs, which can lead to 
> a lot
> of churn due to frequent soft offline / online operations.  
> Instead
> allocate one for each present CPU to avoid this and dramatically 
> simplify
> the code.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Jens Axboe 
> Cc: Keith Busch 
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-n...@lists.infradead.org
> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
> Signed-off-by: Thomas Gleixner 
> Cc: Oleksandr Natalenko 
> Cc: Mike Galbraith 
> Signed-off-by: Greg Kroah-Hartman 

 I wonder if we're simply not getting the masks updated correctly. I'll
 take a look.
>>>
>>> Can't make it trigger here. We do init for each present CPU, which means
>>> that if I offline a few CPUs here and register a queue, those still show
>>> up as present (just offline) and get mapped accordingly.
>>>
>>> From the looks of it, your setup is different. If the CPU doesn't show
>>> up as present and it gets hotplugged, then I can see how this condition
>>> would trigger. What environment are you running this in? We might have
>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>> for a dead cpu and handle that.
>>
>> I am not doing a hot unplug and the replug, I use KVM and add a 
>> previously
>> not available CPU.
>>
>> in libvirt/virsh speak:
>>   4
>
> So that's why we run into problems. It's not present when we load the 
> device,
> but becomes present and online afterwards.
>
> Christoph, we used to handle this just fine, your patch broke it.
>
> I'll see if I can come up with an appropriate fix.

 Can you try the below?
>>>
>>>
>>> It does prevent the crash but it seems that the new CPU is not "used " 
>>> after the hotplug for mq:
>>>
>>>
>>> output with 2 cpus:
>>> /sys/kernel/debug/block/vda
>>> /sys/kernel/debug/block/vda/hctx0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>>> /sys/kernel/debug/block/vda/hctx0/active
>>> /sys/kernel/debug/block/vda/hctx0/run
>>> /sys/kernel/debug/block/vda/hctx0/queued
>>> /sys/kernel/debug/block/vda/hctx0/dispatched
>>> /sys/kernel/debug/block/vda/hctx0/io_poll
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>>> /sys/kernel/debug/block/vda/hctx0/tags
>>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>>> /sys/kernel/debug/block/vda/hctx0/busy
>>> /sys/kernel/debug/block/vda/hctx0/dispatch
>>> /sys/kernel/debug/block/vda/hctx0/flags
>>> /sys/kernel/debug/block/vda/hctx0/state
>>> /sys/kernel/debug/block/vda/sched
>>> /sys/kernel/debug/block/vda/sched/dispatch
>>> /sys/kernel/debug/block/vda/sched/starved
>>> /sys/kernel/debug/block/vda/sched/batching
>>> /sys/kernel/debug/block/vda/sched/write_next_rq
>>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>>> /sys/kernel/debug/block/vda/sched/read_next_rq
>>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>>> /sys/kernel/debug/block/vda/write_hints
>>> /sys/kernel/debug/block/vda/state
>>> /sys/kernel/debug/block/vda/requeue_list
>>> /sys/kernel/debug/block/vda/poll_stat
>>
>> Try this, basically just a revert.
> 
> Yes, seems to work.
> 
> Tested-by: Christian Borntraeger 

Great, thanks for testing.

> Do you know why the original commit made it into 4.12 stable? After all
> it has no Fixes tag and no cc 

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christian Borntraeger


On 11/21/2017 08:30 PM, Jens Axboe wrote:
> On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
 On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>
>
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
 Bisect points to

 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
 commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
 Author: Christoph Hellwig 
 Date:   Mon Jun 26 12:20:57 2017 +0200

 blk-mq: Create hctx for each present CPU
 
 commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
 
 Currently we only create hctx for online CPUs, which can lead to a 
 lot
 of churn due to frequent soft offline / online operations.  Instead
 allocate one for each present CPU to avoid this and dramatically 
 simplify
 the code.
 
 Signed-off-by: Christoph Hellwig 
 Reviewed-by: Jens Axboe 
 Cc: Keith Busch 
 Cc: linux-bl...@vger.kernel.org
 Cc: linux-n...@lists.infradead.org
 Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
 Signed-off-by: Thomas Gleixner 
 Cc: Oleksandr Natalenko 
 Cc: Mike Galbraith 
 Signed-off-by: Greg Kroah-Hartman 
>>>
>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>> take a look.
>>
>> Can't make it trigger here. We do init for each present CPU, which means
>> that if I offline a few CPUs here and register a queue, those still show
>> up as present (just offline) and get mapped accordingly.
>>
>> From the looks of it, your setup is different. If the CPU doesn't show
>> up as present and it gets hotplugged, then I can see how this condition
>> would trigger. What environment are you running this in? We might have
>> to re-introduce the cpu hotplug notifier, right now we just monitor
>> for a dead cpu and handle that.
>
> I am not doing a hot unplug and the replug, I use KVM and add a previously
> not available CPU.
>
> in libvirt/virsh speak:
>   4

 So that's why we run into problems. It's not present when we load the 
 device,
 but becomes present and online afterwards.

 Christoph, we used to handle this just fine, your patch broke it.

 I'll see if I can come up with an appropriate fix.
>>>
>>> Can you try the below?
>>
>>
>> It does prevent the crash but it seems that the new CPU is not "used " after 
>> the hotplug for mq:
>>
>>
>> output with 2 cpus:
>> /sys/kernel/debug/block/vda
>> /sys/kernel/debug/block/vda/hctx0
>> /sys/kernel/debug/block/vda/hctx0/cpu0
>> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
>> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
>> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
>> /sys/kernel/debug/block/vda/hctx0/active
>> /sys/kernel/debug/block/vda/hctx0/run
>> /sys/kernel/debug/block/vda/hctx0/queued
>> /sys/kernel/debug/block/vda/hctx0/dispatched
>> /sys/kernel/debug/block/vda/hctx0/io_poll
>> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/sched_tags
>> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
>> /sys/kernel/debug/block/vda/hctx0/tags
>> /sys/kernel/debug/block/vda/hctx0/ctx_map
>> /sys/kernel/debug/block/vda/hctx0/busy
>> /sys/kernel/debug/block/vda/hctx0/dispatch
>> /sys/kernel/debug/block/vda/hctx0/flags
>> /sys/kernel/debug/block/vda/hctx0/state
>> /sys/kernel/debug/block/vda/sched
>> /sys/kernel/debug/block/vda/sched/dispatch
>> /sys/kernel/debug/block/vda/sched/starved
>> /sys/kernel/debug/block/vda/sched/batching
>> /sys/kernel/debug/block/vda/sched/write_next_rq
>> /sys/kernel/debug/block/vda/sched/write_fifo_list
>> /sys/kernel/debug/block/vda/sched/read_next_rq
>> /sys/kernel/debug/block/vda/sched/read_fifo_list
>> /sys/kernel/debug/block/vda/write_hints
>> /sys/kernel/debug/block/vda/state
>> /sys/kernel/debug/block/vda/requeue_list
>> /sys/kernel/debug/block/vda/poll_stat
> 
> Try this, basically just a revert.

Yes, seems to work.

Tested-by: Christian Borntraeger 

Do you know why the original commit made it into 4.12 stable? After all
it has no Fixes tag and no cc stable-


> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 11097477eeab..bc1950fa9ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -37,6 +37,9 @@
>  #include "blk-wbt.h"
>  #include "blk-mq-sched.h"
> 
> 

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 12:15 PM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:39 PM, Jens Axboe wrote:
>> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:


 On 11/21/2017 07:09 PM, Jens Axboe wrote:
> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>> Bisect points to
>>>
>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>> Author: Christoph Hellwig 
>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>
>>> blk-mq: Create hctx for each present CPU
>>> 
>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>> 
>>> Currently we only create hctx for online CPUs, which can lead to a 
>>> lot
>>> of churn due to frequent soft offline / online operations.  Instead
>>> allocate one for each present CPU to avoid this and dramatically 
>>> simplify
>>> the code.
>>> 
>>> Signed-off-by: Christoph Hellwig 
>>> Reviewed-by: Jens Axboe 
>>> Cc: Keith Busch 
>>> Cc: linux-bl...@vger.kernel.org
>>> Cc: linux-n...@lists.infradead.org
>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
>>> Signed-off-by: Thomas Gleixner 
>>> Cc: Oleksandr Natalenko 
>>> Cc: Mike Galbraith 
>>> Signed-off-by: Greg Kroah-Hartman 
>>
>> I wonder if we're simply not getting the masks updated correctly. I'll
>> take a look.
>
> Can't make it trigger here. We do init for each present CPU, which means
> that if I offline a few CPUs here and register a queue, those still show
> up as present (just offline) and get mapped accordingly.
>
> From the looks of it, your setup is different. If the CPU doesn't show
> up as present and it gets hotplugged, then I can see how this condition
> would trigger. What environment are you running this in? We might have
> to re-introduce the cpu hotplug notifier, right now we just monitor
> for a dead cpu and handle that.

 I am not doing a hot unplug and the replug, I use KVM and add a previously
 not available CPU.

 in libvirt/virsh speak:
   4
>>>
>>> So that's why we run into problems. It's not present when we load the 
>>> device,
>>> but becomes present and online afterwards.
>>>
>>> Christoph, we used to handle this just fine, your patch broke it.
>>>
>>> I'll see if I can come up with an appropriate fix.
>>
>> Can you try the below?
> 
> 
> It does prevent the crash but it seems that the new CPU is not "used " after 
> the hotplug for mq:
> 
> 
> output with 2 cpus:
> /sys/kernel/debug/block/vda
> /sys/kernel/debug/block/vda/hctx0
> /sys/kernel/debug/block/vda/hctx0/cpu0
> /sys/kernel/debug/block/vda/hctx0/cpu0/completed
> /sys/kernel/debug/block/vda/hctx0/cpu0/merged
> /sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
> /sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
> /sys/kernel/debug/block/vda/hctx0/active
> /sys/kernel/debug/block/vda/hctx0/run
> /sys/kernel/debug/block/vda/hctx0/queued
> /sys/kernel/debug/block/vda/hctx0/dispatched
> /sys/kernel/debug/block/vda/hctx0/io_poll
> /sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/sched_tags
> /sys/kernel/debug/block/vda/hctx0/tags_bitmap
> /sys/kernel/debug/block/vda/hctx0/tags
> /sys/kernel/debug/block/vda/hctx0/ctx_map
> /sys/kernel/debug/block/vda/hctx0/busy
> /sys/kernel/debug/block/vda/hctx0/dispatch
> /sys/kernel/debug/block/vda/hctx0/flags
> /sys/kernel/debug/block/vda/hctx0/state
> /sys/kernel/debug/block/vda/sched
> /sys/kernel/debug/block/vda/sched/dispatch
> /sys/kernel/debug/block/vda/sched/starved
> /sys/kernel/debug/block/vda/sched/batching
> /sys/kernel/debug/block/vda/sched/write_next_rq
> /sys/kernel/debug/block/vda/sched/write_fifo_list
> /sys/kernel/debug/block/vda/sched/read_next_rq
> /sys/kernel/debug/block/vda/sched/read_fifo_list
> /sys/kernel/debug/block/vda/write_hints
> /sys/kernel/debug/block/vda/state
> /sys/kernel/debug/block/vda/requeue_list
> /sys/kernel/debug/block/vda/poll_stat

Try this, basically just a revert.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11097477eeab..bc1950fa9ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,9 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static DEFINE_MUTEX(all_q_mutex);
+static LIST_HEAD(all_q_list);
+
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
@@ -2114,8 +2117,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christian Borntraeger


On 11/21/2017 07:39 PM, Jens Axboe wrote:
> On 11/21/2017 11:27 AM, Jens Axboe wrote:
>> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
 On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig 
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>> blk-mq: Create hctx for each present CPU
>> 
>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>> 
>> Currently we only create hctx for online CPUs, which can lead to a lot
>> of churn due to frequent soft offline / online operations.  Instead
>> allocate one for each present CPU to avoid this and dramatically simplify
>> the code.
>> 
>> Signed-off-by: Christoph Hellwig 
>> Reviewed-by: Jens Axboe 
>> Cc: Keith Busch 
>> Cc: linux-bl...@vger.kernel.org
>> Cc: linux-n...@lists.infradead.org
>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
>> Signed-off-by: Thomas Gleixner 
>> Cc: Oleksandr Natalenko 
>> Cc: Mike Galbraith 
>> Signed-off-by: Greg Kroah-Hartman 
>
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

 Can't make it trigger here. We do init for each present CPU, which means
 that if I offline a few CPUs here and register a queue, those still show
 up as present (just offline) and get mapped accordingly.

 From the looks of it, your setup is different. If the CPU doesn't show
 up as present and it gets hotplugged, then I can see how this condition
 would trigger. What environment are you running this in? We might have
 to re-introduce the cpu hotplug notifier, right now we just monitor
 for a dead cpu and handle that.
>>>
>>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>>> not available CPU.
>>>
>>> in libvirt/virsh speak:
>>>   4
>>
>> So that's why we run into problems. It's not present when we load the device,
>> but becomes present and online afterwards.
>>
>> Christoph, we used to handle this just fine, your patch broke it.
>>
>> I'll see if I can come up with an appropriate fix.
> 
> Can you try the below?


It does prevent the crash but it seems that the new CPU is not "used" after
the hotplug for mq:


output with 2 cpus:
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat
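
One way to read listings like the one above at a glance is to collapse them into an hctx -> cpu map. A small shell sketch (sample paths are inlined here, since the real ones require debugfs mounted and a virtio-blk device):

```shell
# Collapse a `find /sys/kernel/debug/block/<dev>` listing to "hctxN cpuM"
# pairs. The inlined sample data stands in for the real debugfs tree.
listing='/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx1/cpu1/rq_list'

mapping=$(printf '%s\n' "$listing" |
    sed -n 's|.*/\(hctx[0-9]*\)/\(cpu[0-9]*\)/.*|\1 \2|p' | sort -u)
printf '%s\n' "$mapping"
```

After hot-adding a CPU, a healthy mapping would grow a new cpuN entry under some hctx; in the before/after listings in this thread it never does, which matches the "new CPU is not used" observation.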

> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b600463791ec..ab3a66e7bd03 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -40,6 +40,7 @@
>  static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
>  static void blk_mq_poll_stats_start(struct request_queue *q);
>  static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
> +static void blk_mq_map_swqueue(struct request_queue *q);
> 
>  static int blk_mq_poll_stats_bkt(const struct request *rq)
>  {
> @@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>   return -ENOMEM;
>  }
> 
> +static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
> +{
> + struct blk_mq_hw_ctx *hctx;
> +
> + hctx = hlist_entry_safe(node, struct 

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 11:27 AM, Jens Axboe wrote:
> On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
 On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
>
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig 
> Date:   Mon Jun 26 12:20:57 2017 +0200
>
> blk-mq: Create hctx for each present CPU
> 
> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
> 
> Currently we only create hctx for online CPUs, which can lead to a lot
> of churn due to frequent soft offline / online operations.  Instead
>> allocate one for each present CPU to avoid this and dramatically simplify
>> the code.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Jens Axboe 
> Cc: Keith Busch 
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-n...@lists.infradead.org
> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
> Signed-off-by: Thomas Gleixner 
> Cc: Oleksandr Natalenko 
> Cc: Mike Galbraith 
> Signed-off-by: Greg Kroah-Hartman 

 I wonder if we're simply not getting the masks updated correctly. I'll
 take a look.
>>>
>>> Can't make it trigger here. We do init for each present CPU, which means
>>> that if I offline a few CPUs here and register a queue, those still show
>>> up as present (just offline) and get mapped accordingly.
>>>
>>> From the looks of it, your setup is different. If the CPU doesn't show
>>> up as present and it gets hotplugged, then I can see how this condition
>>> would trigger. What environment are you running this in? We might have
>>> to re-introduce the cpu hotplug notifier, right now we just monitor
>>> for a dead cpu and handle that.
>>
>> I am not doing a hot unplug and the replug, I use KVM and add a previously
>> not available CPU.
>>
>> in libvirt/virsh speak:
>>   4
> 
> So that's why we run into problems. It's not present when we load the device,
> but becomes present and online afterwards.
> 
> Christoph, we used to handle this just fine, your patch broke it.
> 
> I'll see if I can come up with an appropriate fix.

Can you try the below?


diff --git a/block/blk-mq.c b/block/blk-mq.c
index b600463791ec..ab3a66e7bd03 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -40,6 +40,7 @@
 static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
+static void blk_mq_map_swqueue(struct request_queue *q);
 
 static int blk_mq_poll_stats_bkt(const struct request *rq)
 {
@@ -1947,6 +1950,15 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
+{
+   struct blk_mq_hw_ctx *hctx;
+
+   hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
+   blk_mq_map_swqueue(hctx->queue);
+   return 0;
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -1958,7 +1970,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
struct blk_mq_ctx *ctx;
LIST_HEAD(tmp);
 
-   hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+   hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp);
ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 
	spin_lock(&ctx->lock);
@@ -1981,8 +1993,7 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 
 static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
 {
-   cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_DEAD,
-   &hctx->cpuhp_dead);
+   cpuhp_state_remove_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 }
 
 /* hctx->ctxs will be freed in queue's release handler */
@@ -2039,7 +2050,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
hctx->queue = q;
hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
 
-   cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_DEAD, &hctx->cpuhp_dead);
+   cpuhp_state_add_instance_nocalls(CPUHP_BLK_MQ_PREPARE, &hctx->cpuhp);
 
hctx->tags = set->tags[hctx_idx];
 
@@ -2974,7 +2987,8 @@ static int __init blk_mq_init(void)
BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=
(REQ_ATOM_COMPLETE / BITS_PER_BYTE));
 
-   cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
+   

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 11:12 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 07:09 PM, Jens Axboe wrote:
>> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
 Bisect points to

 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
 commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
 Author: Christoph Hellwig 
 Date:   Mon Jun 26 12:20:57 2017 +0200

 blk-mq: Create hctx for each present CPU
 
 commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
 
 Currently we only create hctx for online CPUs, which can lead to a lot
 of churn due to frequent soft offline / online operations.  Instead
 allocate one for each present CPU to avoid this and dramatically simplify
 the code.
 
 Signed-off-by: Christoph Hellwig 
 Reviewed-by: Jens Axboe 
 Cc: Keith Busch 
 Cc: linux-bl...@vger.kernel.org
 Cc: linux-n...@lists.infradead.org
 Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
 Signed-off-by: Thomas Gleixner 
 Cc: Oleksandr Natalenko 
 Cc: Mike Galbraith 
 Signed-off-by: Greg Kroah-Hartman 
>>>
>>> I wonder if we're simply not getting the masks updated correctly. I'll
>>> take a look.
>>
>> Can't make it trigger here. We do init for each present CPU, which means
>> that if I offline a few CPUs here and register a queue, those still show
>> up as present (just offline) and get mapped accordingly.
>>
>> From the looks of it, your setup is different. If the CPU doesn't show
>> up as present and it gets hotplugged, then I can see how this condition
>> would trigger. What environment are you running this in? We might have
>> to re-introduce the cpu hotplug notifier, right now we just monitor
>> for a dead cpu and handle that.
> 
> I am not doing a hot unplug and the replug, I use KVM and add a previously
> not available CPU.
> 
> in libvirt/virsh speak:
>   4

So that's why we run into problems. It's not present when we load the device,
but becomes present and online afterwards.

Christoph, we used to handle this just fine, your patch broke it.

I'll see if I can come up with an appropriate fix.

-- 
Jens Axboe

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
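
The distinction Jens draws here (present at queue-init time vs. hot-added later) is the crux: after 4b855ad37194 the queue map is sized from the present mask once, at init, and never rebuilt. A toy model of why a later-added CPU ends up unmapped (plain sets stand in for kernel cpumasks; this is a sketch of the failure mode, not the kernel's actual map layout):

```python
def build_queue_map(present_cpus, nr_hw_queues):
    """Map each present CPU round-robin to a hardware queue. A sketch of
    what blk-mq does once at queue init, not its exact layout algorithm."""
    return {cpu: cpu % nr_hw_queues for cpu in sorted(present_cpus)}

# Guest boots with only CPU 0 present; the map is built then and not rebuilt.
queue_map = build_queue_map({0}, nr_hw_queues=1)

# Later, KVM hot-adds CPU 1 (it becomes present *and* online)...
online = {0, 1}
# ...but the stale map has no entry for it, so work submitted from CPU 1
# runs on an hctx whose cpumask does not contain CPU 1 -> the WARN_ON fires.
print(1 in queue_map)  # -> False
```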


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christian Borntraeger


On 11/21/2017 07:09 PM, Jens Axboe wrote:
> On 11/21/2017 10:27 AM, Jens Axboe wrote:
>> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>>> Bisect points to
>>>
>>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>>> Author: Christoph Hellwig 
>>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>>
>>> blk-mq: Create hctx for each present CPU
>>> 
>>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>>> 
>>> Currently we only create hctx for online CPUs, which can lead to a lot
>>> of churn due to frequent soft offline / online operations.  Instead
>>> allocate one for each present CPU to avoid this and dramatically simplify
>>> the code.
>>> 
>>> Signed-off-by: Christoph Hellwig 
>>> Reviewed-by: Jens Axboe 
>>> Cc: Keith Busch 
>>> Cc: linux-bl...@vger.kernel.org
>>> Cc: linux-n...@lists.infradead.org
>>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
>>> Signed-off-by: Thomas Gleixner 
>>> Cc: Oleksandr Natalenko 
>>> Cc: Mike Galbraith 
>>> Signed-off-by: Greg Kroah-Hartman 
>>
>> I wonder if we're simply not getting the masks updated correctly. I'll
>> take a look.
> 
> Can't make it trigger here. We do init for each present CPU, which means
> that if I offline a few CPUs here and register a queue, those still show
> up as present (just offline) and get mapped accordingly.
> 
> From the looks of it, your setup is different. If the CPU doesn't show
> up as present and it gets hotplugged, then I can see how this condition
> would trigger. What environment are you running this in? We might have
> to re-introduce the cpu hotplug notifier, right now we just monitor
> for a dead cpu and handle that.

I am not doing a hot unplug and the replug, I use KVM and add a previously
not available CPU.

in libvirt/virsh speak:
  4

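
The stray "4" above is the remnant of an XML element the archive stripped from the domain definition. In libvirt domain XML, a guest defined with more possible vCPUs than are online at boot looks roughly like this (attribute values are an assumption for illustration, not Christian's actual config):

```xml
<!-- 4 vCPUs defined, 1 online at boot; "virsh setvcpus <domain> 2"
     then hot-adds a second vCPU at runtime. -->
<vcpu placement='static' current='1'>4</vcpu>
```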


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 10:27 AM, Jens Axboe wrote:
> On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
>> Bisect points to
>>
>> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
>> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
>> Author: Christoph Hellwig 
>> Date:   Mon Jun 26 12:20:57 2017 +0200
>>
>> blk-mq: Create hctx for each present CPU
>> 
>> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
>> 
>> Currently we only create hctx for online CPUs, which can lead to a lot
>> of churn due to frequent soft offline / online operations.  Instead
>> allocate one for each present CPU to avoid this and dramatically simplify
>> the code.
>> 
>> Signed-off-by: Christoph Hellwig 
>> Reviewed-by: Jens Axboe 
>> Cc: Keith Busch 
>> Cc: linux-bl...@vger.kernel.org
>> Cc: linux-n...@lists.infradead.org
>> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
>> Signed-off-by: Thomas Gleixner 
>> Cc: Oleksandr Natalenko 
>> Cc: Mike Galbraith 
>> Signed-off-by: Greg Kroah-Hartman 
> 
> I wonder if we're simply not getting the masks updated correctly. I'll
> take a look.

Can't make it trigger here. We do init for each present CPU, which means
that if I offline a few CPUs here and register a queue, those still show
up as present (just offline) and get mapped accordingly.

From the looks of it, your setup is different. If the CPU doesn't show
up as present and it gets hotplugged, then I can see how this condition
would trigger. What environment are you running this in? We might have
to re-introduce the cpu hotplug notifier, right now we just monitor
for a dead cpu and handle that.

-- 
Jens Axboe



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Jens Axboe
On 11/21/2017 03:14 AM, Christian Borntraeger wrote:
> Bisect points to
> 
> 1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
> commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
> Author: Christoph Hellwig 
> Date:   Mon Jun 26 12:20:57 2017 +0200
> 
> blk-mq: Create hctx for each present CPU
> 
> commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.
> 
> Currently we only create hctx for online CPUs, which can lead to a lot
> of churn due to frequent soft offline / online operations.  Instead
> allocate one for each present CPU to avoid this and dramatically simplify
> the code.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Jens Axboe 
> Cc: Keith Busch 
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-n...@lists.infradead.org
> Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
> Signed-off-by: Thomas Gleixner 
> Cc: Oleksandr Natalenko 
> Cc: Mike Galbraith 
> Signed-off-by: Greg Kroah-Hartman 

I wonder if we're simply not getting the masks updated correctly. I'll
take a look.

-- 
Jens Axboe



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

2017-11-21 Thread Christian Borntraeger


On 11/21/2017 10:50 AM, Christian Borntraeger wrote:
> 
> 
> On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:


 On 11/20/2017 08:42 PM, Jens Axboe wrote:
> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
 This is 

b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1141)	 * are mapped to it.
b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1142)	 */
6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1143)	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1144)		cpu_online(hctx->next_cpu));
6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1145)
b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1146)	/*
>>>
>>> Did you really try to figure out when the code that reported the warning
>>> was introduced? I think that warning was introduced through the 
>>> following
>>> commit:
>>
>> This was more a cut'n'paste to show which warning triggered since line 
>> numbers are somewhat volatile.
>>
>>>
>>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>>
>>> blk-mq: don't use preempt_count() to check for right CPU
>>>  
>>> UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>>> want to check is whether or not we are on the right CPU.
>>> So don't make PREEMPT part of this, just test the CPU in
>>> the mask directly.
>>>
>>> Anyway, I think that warning is appropriate and useful. So the next step
>>> is to figure out what work item was involved and why that work item got
>>> executed on the wrong CPU.
>>
>> It seems to be related to virtio-blk (is triggered by fio on such 
>> disks). Your comment basically
>> says: "no this is not a known issue" then :-)
>> I will try to take a dump to find out the work item
>
> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
> and we reconfigure the mappings. So I don't think the above is unexpected,
> if you are doing CPU hot unplug while running a fio job.

 I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>>
>>> OK, that's different, we should not be triggering a warning for that.
>>> What does your machine/virtblk topology look like in terms of CPUS,
>>> nr of queues for virtblk, etc?
>>
>> FWIW, 4.11 does work, 4.12 and later is broken.
> 
> In fact: 4.12 is fine, 4.12.14 is broken.


Bisect points to

1b5a7455d345b223d3a4658a9e5fce985b7998c1 is the first bad commit
commit 1b5a7455d345b223d3a4658a9e5fce985b7998c1
Author: Christoph Hellwig 
Date:   Mon Jun 26 12:20:57 2017 +0200

blk-mq: Create hctx for each present CPU

commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08 upstream.

Currently we only create hctx for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations.  Instead
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Jens Axboe 
Cc: Keith Busch 
Cc: linux-bl...@vger.kernel.org
Cc: linux-n...@lists.infradead.org
Link: http://lkml.kernel.org/r/20170626102058.10200-3-...@lst.de
Signed-off-by: Thomas Gleixner 
Cc: Oleksandr Natalenko 
Cc: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman 

:04 04 a61cb023014a7b7a6b9f24ea04fe8ab22299e706 059ba6dc3290c74e0468937348e580cd53f963e7 M  block
:04 04 432e719d7e738ffcddfb8fc964544d3b3e0a68f7 f4572aa21b249a851a1b604c148eea109e93b30d M  include
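
The bisect above converges on the same commit as the earlier run; the procedure itself is plain binary search over an ordered commit list. A toy model of how `git bisect` narrows the range (the commit list and badness predicate are made up for illustration):

```python
def bisect(commits, is_bad):
    """Return the first bad commit; assumes all good commits precede all
    bad ones, and that the last commit in the range is bad."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # mid is good, so look later
    return commits[lo]

# Hypothetical 8-commit range where commit "f" introduced the regression:
history = list("abcdefgh")
print(bisect(history, lambda c: c >= "f"))  # -> f, after 3 test boots
```

With v4.12 marked good and v4.12.14 bad, the same narrowing lands on 1b5a7455d345 here after only a handful of test boots.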





adding Christoph FWIW, your patch triggers the following on 4.14 when doing a
cpu hotplug (adding a CPU) and then accessing a virtio-blk device.


[  747.652408] [ cut here ]
[  747.652410] WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 __blk_mq_run_hw_queue+0xd4/0x100
[  747.652410] Modules linked in: dm_multipath
[  747.652412] CPU: 4 PID: 2895 Comm: kworker/4:1H Tainted: GW   4.14.0+ #191
[  747.652412] Hardware name: IBM 2964 NC9 704 (KVM/Linux)
[  747.652414] Workqueue: kblockd blk_mq_run_work_fn
[  747.652414] task: 6068 task.stack: 5ea3
[  747.652415] Krnl PSW : 0704f0018000 00505864 (__blk_mq_run_hw_queue+0xd4/0x100)
[  

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-21 Thread Christian Borntraeger


On 11/21/2017 09:35 AM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 09:52 PM, Jens Axboe wrote:
>> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
 On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>
>
> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>> This is 
>>>
>>> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1141)	 * are mapped to it.
>>> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1142)	 */
>>> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1143)	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1144)		cpu_online(hctx->next_cpu));
>>> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1145)
>>> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1146)	/*
>>
>> Did you really try to figure out when the code that reported the warning
>> was introduced? I think that warning was introduced through the following
>> commit:
>
> This was more a cut'n'paste to show which warning triggered since line 
> numbers are somewhat volatile.
>
>>
>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>
>> blk-mq: don't use preempt_count() to check for right CPU
>>  
>> UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>> want to check is whether or not we are on the right CPU.
>> So don't make PREEMPT part of this, just test the CPU in
>> the mask directly.
>>
>> Anyway, I think that warning is appropriate and useful. So the next step
>> is to figure out what work item was involved and why that work item got
>> executed on the wrong CPU.
>
> It seems to be related to virtio-blk (is triggered by fio on such disks). 
> Your comment basically
> says: "no this is not a known issue" then :-)
> I will try to take a dump to find out the work item

 blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
 and we reconfigure the mappings. So I don't think the above is unexpected,
 if you are doing CPU hot unplug while running a fio job.
>>>
>>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
>>
>> OK, that's different, we should not be triggering a warning for that.
>> What does your machine/virtblk topology look like in terms of CPUS,
>> nr of queues for virtblk, etc?
> 
> FWIW, 4.11 does work, 4.12 and later is broken.

In fact: 4.12 is fine, 4.12.14 is broken.



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-21 Thread Christian Borntraeger


On 11/20/2017 09:52 PM, Jens Axboe wrote:
> On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
>>
>>
>> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:


 On 11/20/2017 08:20 PM, Bart Van Assche wrote:
> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>> This is 
>>
>> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1141)	 * are mapped to it.
>> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1142)	 */
>> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1143)	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1144)		cpu_online(hctx->next_cpu));
>> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1145)
>> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1146)	/*
>
> Did you really try to figure out when the code that reported the warning
> was introduced? I think that warning was introduced through the following
> commit:

 This was more a cut'n'paste to show which warning triggered since line 
 numbers are somewhat volatile.

>
> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
> Date:   Wed Apr 16 09:23:48 2014 -0600
>
> blk-mq: don't use preempt_count() to check for right CPU
>  
> UP or CONFIG_PREEMPT_NONE will return 0, and what we really
> want to check is whether or not we are on the right CPU.
> So don't make PREEMPT part of this, just test the CPU in
> the mask directly.
>
> Anyway, I think that warning is appropriate and useful. So the next step
> is to figure out what work item was involved and why that work item got
> executed on the wrong CPU.

 It seems to be related to virtio-blk (is triggered by fio on such disks). 
 Your comment basically
 says: "no this is not a known issue" then :-)
 I will try to take a dump to find out the work item
>>>
>>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>>> and we reconfigure the mappings. So I don't think the above is unexpected,
>>> if you are doing CPU hot unplug while running a fio job.
>>
>> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.
> 
> OK, that's different, we should not be triggering a warning for that.
> What does your machine/virtblk topology look like in terms of CPUS,
> nr of queues for virtblk, etc?

FWIW, 4.11 does work, 4.12 and later is broken.

> 
> You can probably get this info the easiest by just doing a:
> 
> # find /sys/kernel/debug/block/virtX
> 
> replace virtX with your virtblk device name. Generate this info both
> before and after the hotplug event.

It happens in all variants (1 cpu to 2 or 16 to 17 and independent of the
number of disks).

What I can see is that the block layer does not yet see the new CPU:

[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active
/sys/kernel/debug/block/vda/hctx0/run
/sys/kernel/debug/block/vda/hctx0/queued
/sys/kernel/debug/block/vda/hctx0/dispatched
/sys/kernel/debug/block/vda/hctx0/io_poll
/sys/kernel/debug/block/vda/hctx0/sched_tags_bitmap
/sys/kernel/debug/block/vda/hctx0/sched_tags
/sys/kernel/debug/block/vda/hctx0/tags_bitmap
/sys/kernel/debug/block/vda/hctx0/tags
/sys/kernel/debug/block/vda/hctx0/ctx_map
/sys/kernel/debug/block/vda/hctx0/busy
/sys/kernel/debug/block/vda/hctx0/dispatch
/sys/kernel/debug/block/vda/hctx0/flags
/sys/kernel/debug/block/vda/hctx0/state
/sys/kernel/debug/block/vda/sched
/sys/kernel/debug/block/vda/sched/dispatch
/sys/kernel/debug/block/vda/sched/starved
/sys/kernel/debug/block/vda/sched/batching
/sys/kernel/debug/block/vda/sched/write_next_rq
/sys/kernel/debug/block/vda/sched/write_fifo_list
/sys/kernel/debug/block/vda/sched/read_next_rq
/sys/kernel/debug/block/vda/sched/read_fifo_list
/sys/kernel/debug/block/vda/write_hints
/sys/kernel/debug/block/vda/state
/sys/kernel/debug/block/vda/requeue_list
/sys/kernel/debug/block/vda/poll_stat

--> in host virsh setvcpu zhyp137 2

[root@zhyp137 ~]# chcpu -e 1
CPU 1 enabled
[root@zhyp137 ~]# find /sys/kernel/debug/block/vd* 
/sys/kernel/debug/block/vda
/sys/kernel/debug/block/vda/hctx0
/sys/kernel/debug/block/vda/hctx0/cpu0
/sys/kernel/debug/block/vda/hctx0/cpu0/completed
/sys/kernel/debug/block/vda/hctx0/cpu0/merged
/sys/kernel/debug/block/vda/hctx0/cpu0/dispatched
/sys/kernel/debug/block/vda/hctx0/cpu0/rq_list
/sys/kernel/debug/block/vda/hctx0/active

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-20 Thread Jens Axboe
On 11/20/2017 01:49 PM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 08:42 PM, Jens Axboe wrote:
>> On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
>>>
>>>
>>> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
 On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
> This is 
>
> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1141)	 * are mapped to it.
> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1142)	 */
> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1143)	WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1144)		cpu_online(hctx->next_cpu));
> 6a83e74d2 (Bart Van Assche   2016-11-02 10:09:51 -0600 1145)
> b7a71e66d (Jens Axboe        2017-08-01 09:28:24 -0600 1146)	/*

 Did you really try to figure out when the code that reported the warning
 was introduced? I think that warning was introduced through the following
 commit:
>>>
>>> This was more a cut'n'paste to show which warning triggered since line 
>>> numbers are somewhat volatile.
>>>

 commit fd1270d5df6a005e1248e87042159a799cc4b2c9
 Date:   Wed Apr 16 09:23:48 2014 -0600

 blk-mq: don't use preempt_count() to check for right CPU
  
 UP or CONFIG_PREEMPT_NONE will return 0, and what we really
 want to check is whether or not we are on the right CPU.
 So don't make PREEMPT part of this, just test the CPU in
 the mask directly.

 Anyway, I think that warning is appropriate and useful. So the next step
 is to figure out what work item was involved and why that work item got
 executed on the wrong CPU.
>>>
>>> It seems to be related to virtio-blk (is triggered by fio on such disks). 
>>> Your comment basically
>>> says: "no this is not a known issue" then :-)
>>> I will try to take a dump to find out the work item
>>
>> blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
>> and we reconfigure the mappings. So I don't think the above is unexpected,
>> if you are doing CPU hot unplug while running a fio job.
> 
> I did a cpu hot plug (adding a CPU) and I started fio AFTER that.

OK, that's different, we should not be triggering a warning for that.
What does your machine/virtblk topology look like in terms of CPUS,
nr of queues for virtblk, etc?

You can probably get this info the easiest by just doing a:

# find /sys/kernel/debug/block/virtX

replace virtX with your virtblk device name. Generate this info both
before and after the hotplug event.

>> While it's a bit annoying that we trigger the WARN_ON() for a condition
>> that can happen, we're basically interested in it if it triggers for
>> normal operations.
> 
> I think we should never trigger a WARN_ON on conditions that can
> happen. I know some folks enabling panic_on_warn to detect/avoid data
> integrity issues. FWIW, this also seems to happen with 4.13 and 4.12

It's not supposed to happen for your case, so I'd say it's been useful.
It's not a critical thing, but it is something that should not trigger
and we need to look into why it did, and fixing it up.

-- 
Jens Axboe

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-20 Thread Jens Axboe
On 11/20/2017 12:29 PM, Christian Borntraeger wrote:
> 
> 
> On 11/20/2017 08:20 PM, Bart Van Assche wrote:
>> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>>> This is 
>>>
>>> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1141)         * are mapped to it.
>>> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1142)         */
>>> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1143)        WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>>> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1144)                cpu_online(hctx->next_cpu));
>>> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1145)
>>> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1146)        /*
>>
>> Did you really try to figure out when the code that reported the warning
>> was introduced? I think that warning was introduced through the following
>> commit:
> 
> This was more a cut'n'paste to show which warning triggered since line 
> numbers are somewhat volatile.
> 
>>
>> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
>> Date:   Wed Apr 16 09:23:48 2014 -0600
>>
>> blk-mq: don't use preempt_count() to check for right CPU
>>  
>> UP or CONFIG_PREEMPT_NONE will return 0, and what we really
>> want to check is whether or not we are on the right CPU.
>> So don't make PREEMPT part of this, just test the CPU in
>> the mask directly.
>>
>> Anyway, I think that warning is appropriate and useful. So the next step
>> is to figure out what work item was involved and why that work item got
>> executed on the wrong CPU.
> 
> It seems to be related to virtio-blk (it is triggered by fio on such
> disks). Your comment basically says: "no, this is not a known issue"
> then :-)
> I will try to take a dump to find out the work item.

blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
and we reconfigure the mappings. So I don't think the above is unexpected,
if you are doing CPU hot unplug while running a fio job.

While it's a bit annoying that we trigger the WARN_ON() for a condition
that can happen, we're basically interested in it if it triggers for
normal operations.

-- 
Jens Axboe



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-20 Thread Christian Borntraeger


On 11/20/2017 08:20 PM, Bart Van Assche wrote:
> On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
>> This is 
>>
>> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1141)         * are mapped to it.
>> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1142)         */
>> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1143)        WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
>> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1144)                cpu_online(hctx->next_cpu));
>> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1145)
>> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1146)        /*
> 
> Did you really try to figure out when the code that reported the warning
> was introduced? I think that warning was introduced through the following
> commit:

This was more a cut'n'paste to show which warning triggered since line numbers 
are somewhat volatile.

> 
> commit fd1270d5df6a005e1248e87042159a799cc4b2c9
> Date:   Wed Apr 16 09:23:48 2014 -0600
> 
> blk-mq: don't use preempt_count() to check for right CPU
>  
> UP or CONFIG_PREEMPT_NONE will return 0, and what we really
> want to check is whether or not we are on the right CPU.
> So don't make PREEMPT part of this, just test the CPU in
> the mask directly.
> 
> Anyway, I think that warning is appropriate and useful. So the next step
> is to figure out what work item was involved and why that work item got
> executed on the wrong CPU.

It seems to be related to virtio-blk (it is triggered by fio on such disks).
Your comment basically says: "no, this is not a known issue" then :-)
I will try to take a dump to find out the work item.



Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-20 Thread Bart Van Assche
On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
> This is 
> 
> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1141)         * are mapped to it.
> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1142)         */
> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1143)        WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1144)                cpu_online(hctx->next_cpu));
> 6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1145)
> b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1146)        /*

Did you really try to figure out when the code that reported the warning
was introduced? I think that warning was introduced through the following
commit:

commit fd1270d5df6a005e1248e87042159a799cc4b2c9
Date:   Wed Apr 16 09:23:48 2014 -0600

blk-mq: don't use preempt_count() to check for right CPU
 
UP or CONFIG_PREEMPT_NONE will return 0, and what we really
want to check is whether or not we are on the right CPU.
So don't make PREEMPT part of this, just test the CPU in
the mask directly.

Anyway, I think that warning is appropriate and useful. So the next step
is to figure out what work item was involved and why that work item got
executed on the wrong CPU.

Bart.


4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

2017-11-17 Thread Christian Borntraeger
When doing CPU hotplug in a KVM guest with virtio-blk I get warnings like:
[  747.652408] [ cut here ]
[  747.652410] WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 
__blk_mq_run_hw_queue+0xd4/0x100
[  747.652410] Modules linked in: dm_multipath
[  747.652412] CPU: 4 PID: 2895 Comm: kworker/4:1H Tainted: GW   
4.14.0+ #191
[  747.652412] Hardware name: IBM 2964 NC9 704 (KVM/Linux)
[  747.652414] Workqueue: kblockd blk_mq_run_work_fn
[  747.652414] task: 6068 task.stack: 5ea3
[  747.652415] Krnl PSW : 0704f0018000 00505864 
(__blk_mq_run_hw_queue+0xd4/0x100)
[  747.652417]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 
RI:0 EA:3
[  747.652417] Krnl GPRS: 0010 00ff 5cbec400 

[  747.652418]63709120  63709500 
59fa44b0
[  747.652418]59fa4480  6370f700 
63709100
[  747.652419]5cbec500 00970948 5ea33d80 
5ea33d48
[  747.652423] Krnl Code: 00505854: ebaff0a4       lmg     %r10,%r15,160(%r15)
                          0050585a: c0f4ffe690d3   brcl    15,1d7a00
                         #00505860: a7f40001       brc     15,505862
                         >00505864: 581003b0       l       %r1,944
                          00505868: c01b001fff00   nilf    %r1,2096896
                          0050586e: a784ffdb       brc     8,505824
                          00505872: a7f40001       brc     15,505874
                          00505876: 9120218f       tm      399(%r2),32
[  747.652435] Call Trace:
[  747.652435] ([<63709600>] 0x63709600)
[  747.652436]  [<00187bcc>] process_one_work+0x264/0x4b8 
[  747.652438]  [<00187e78>] worker_thread+0x58/0x4f8 
[  747.652439]  [<0018ee94>] kthread+0x144/0x168 
[  747.652439]  [<008f8a62>] kernel_thread_starter+0x6/0xc 
[  747.652440]  [<008f8a5c>] kernel_thread_starter+0x0/0xc 
[  747.652440] Last Breaking-Event-Address:
[  747.652441]  [<00505860>] __blk_mq_run_hw_queue+0xd0/0x100
[  747.652442] ---[ end trace 4a001a80379b18ba ]---
[  747.652450] [ cut here ]


This is 

b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1141)         * are mapped to it.
b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1142)         */
6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1143)        WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1144)                cpu_online(hctx->next_cpu));
6a83e74d2 (Bart Van Assche       2016-11-02 10:09:51 -0600 1145)
b7a71e66d (Jens Axboe            2017-08-01 09:28:24 -0600 1146)        /*


Is this a known issue?

Christian
