Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-07 Thread Jan Kara
On Mon 05-02-18 17:58:12, Bart Van Assche wrote:
> On Sat, 2018-02-03 at 10:51 +0800, Joseph Qi wrote:
> > Hi Bart,
> > 
> > On 18/2/3 00:21, Bart Van Assche wrote:
> > > On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> > > > We triggered this race when using single queue. I'm not sure if it
> > > > exists in multi-queue.
> > > 
> > > Regarding the races between modifying the queue_lock pointer and the
> > > code that uses that pointer, I think the following construct in
> > > blk_cleanup_queue() is sufficient to avoid races between the
> > > queue_lock pointer assignment and the code that executes concurrently
> > > with blk_cleanup_queue():
> > > 
> > >   spin_lock_irq(lock);
> > >   if (q->queue_lock != &q->__queue_lock)
> > >           q->queue_lock = &q->__queue_lock;
> > >   spin_unlock_irq(lock);
> > > 
> > 
> > IMO, the race also exists.
> > 
> > blk_cleanup_queue                      blkcg_print_blkgs
> >   spin_lock_irq(lock) (1)                spin_lock_irq(blkg->q->queue_lock) (2,5)
> >   q->queue_lock = &q->__queue_lock (3)
> >   spin_unlock_irq(lock) (4)
> >                                          spin_unlock_irq(blkg->q->queue_lock) (6)
> > 
> > (1) take driver lock;
> > (2) busy loop for driver lock;
> > (3) override driver lock with internal lock;
> > (4) unlock driver lock; 
> > (5) can take driver lock now;
> > (6) but unlock internal lock.
> > 
> > If we save blkg->q->queue_lock to a local variable first, as
> > blk_cleanup_queue() does, that indeed fixes taking and releasing
> > different locks. But since blk_cleanup_queue() has already overridden
> > the queue lock with the internal lock by then, I'm afraid
> > blkcg_print_blkgs() could still end up using the driver lock.
> 
> (+ Jan Kara)
> 
> Hello Joseph,
> 
> That's a good catch. Since modifying all code that accesses the queue_lock
> pointer and that can race with blk_cleanup_queue() would be too cumbersome,
> I see only one solution, namely making the request queue cgroup and sysfs
> attributes invisible before the queue_lock pointer is restored. Leaving the
> debugfs attributes visible while blk_cleanup_queue() is in progress should
> be fine if the request queue initialization code is modified such that it
> only modifies the queue_lock pointer for legacy queues. Jan, I think some
> time ago you objected when I proposed to move code from __blk_release_queue()
> into blk_cleanup_queue(). Would you be fine with a slightly different
> approach, namely making block cgroup and sysfs attributes invisible earlier,
> from inside blk_cleanup_queue() instead of from inside
> __blk_release_queue()?

Making attributes invisible earlier should be fine. But this whole
switching of queue_lock in blk_cleanup_queue() looks error-prone to me.
Generally, anyone having access to the request_queue can have an old value
of q->queue_lock in its CPU caches and can happily use that value after
blk_cleanup_queue() finishes and the driver-specific structure storing the
lock is freed. blkcg_print_blkgs() is one such example, but I wouldn't bet
a penny that there are no other paths with a similar problem.

Logically, the lifetime of the storage backing q->queue_lock should be at
least as long as that of the request_queue itself - i.e., it should be
released only after __blk_release_queue(). Otherwise all users of
q->queue_lock need proper synchronization against the lock switch in
blk_cleanup_queue(). Either of these looks like a lot of work. I guess
since this involves only the legacy path, your approach of moving the
removal of sysfs attributes earlier might be a reasonable band-aid.
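
To illustrate the hazard, here is a minimal sketch (the reader function is
hypothetical, not an actual kernel path):

	/* Hypothetical reader racing with queue teardown. */
	static void example_reader(struct request_queue *q)
	{
		spinlock_t *lock = q->queue_lock; /* may load the driver lock */

		/*
		 * blk_cleanup_queue() can run here: it switches q->queue_lock
		 * to &q->__queue_lock, after which the driver may free the
		 * structure that embeds the old lock.
		 */
		spin_lock_irq(lock);	/* potential use-after-free */
		/* ... access per-queue state ... */
		spin_unlock_irq(lock);
	}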

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-05 Thread Bart Van Assche
On Sat, 2018-02-03 at 10:51 +0800, Joseph Qi wrote:
> Hi Bart,
> 
> On 18/2/3 00:21, Bart Van Assche wrote:
> > On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> > > We triggered this race when using single queue. I'm not sure if it
> > > exists in multi-queue.
> > 
> > Regarding the races between modifying the queue_lock pointer and the
> > code that uses that pointer, I think the following construct in
> > blk_cleanup_queue() is sufficient to avoid races between the queue_lock
> > pointer assignment and the code that executes concurrently with
> > blk_cleanup_queue():
> > 
> > spin_lock_irq(lock);
> > if (q->queue_lock != &q->__queue_lock)
> >         q->queue_lock = &q->__queue_lock;
> > spin_unlock_irq(lock);
> > 
> 
> IMO, the race also exists.
> 
> blk_cleanup_queue                      blkcg_print_blkgs
>   spin_lock_irq(lock) (1)                spin_lock_irq(blkg->q->queue_lock) (2,5)
>   q->queue_lock = &q->__queue_lock (3)
>   spin_unlock_irq(lock) (4)
>                                          spin_unlock_irq(blkg->q->queue_lock) (6)
> 
> (1) take driver lock;
> (2) busy loop for driver lock;
> (3) override driver lock with internal lock;
> (4) unlock driver lock; 
> (5) can take driver lock now;
> (6) but unlock internal lock.
> 
> If we save blkg->q->queue_lock to a local variable first, as
> blk_cleanup_queue() does, that indeed fixes taking and releasing
> different locks. But since blk_cleanup_queue() has already overridden
> the queue lock with the internal lock by then, I'm afraid
> blkcg_print_blkgs() could still end up using the driver lock.

(+ Jan Kara)

Hello Joseph,

That's a good catch. Since modifying all code that accesses the queue_lock
pointer and that can race with blk_cleanup_queue() would be too cumbersome,
I see only one solution, namely making the request queue cgroup and sysfs
attributes invisible before the queue_lock pointer is restored. Leaving the
debugfs attributes visible while blk_cleanup_queue() is in progress should
be fine if the request queue initialization code is modified such that it
only modifies the queue_lock pointer for legacy queues. Jan, I think some
time ago you objected when I proposed to move code from __blk_release_queue()
into blk_cleanup_queue(). Would you be fine with a slightly different
approach, namely making block cgroup and sysfs attributes invisible earlier,
from inside blk_cleanup_queue() instead of from inside
__blk_release_queue()?

Thanks,

Bart.



Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-02 Thread Joseph Qi
Hi Bart,

On 18/2/3 00:21, Bart Van Assche wrote:
> On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
>> We triggered this race when using single queue. I'm not sure if it
>> exists in multi-queue.
> 
> Regarding the races between modifying the queue_lock pointer and the code
> that uses that pointer, I think the following construct in
> blk_cleanup_queue() is sufficient to avoid races between the queue_lock
> pointer assignment and the code that executes concurrently with
> blk_cleanup_queue():
> 
>   spin_lock_irq(lock);
>   if (q->queue_lock != &q->__queue_lock)
>           q->queue_lock = &q->__queue_lock;
>   spin_unlock_irq(lock);
> 
IMO, the race also exists.

blk_cleanup_queue                      blkcg_print_blkgs
  spin_lock_irq(lock) (1)                spin_lock_irq(blkg->q->queue_lock) (2,5)
  q->queue_lock = &q->__queue_lock (3)
  spin_unlock_irq(lock) (4)
                                         spin_unlock_irq(blkg->q->queue_lock) (6)

(1) take driver lock;
(2) busy loop for driver lock;
(3) override driver lock with internal lock;
(4) unlock driver lock; 
(5) can take driver lock now;
(6) but unlock internal lock.

If we save blkg->q->queue_lock to a local variable first, as
blk_cleanup_queue() does, that indeed fixes taking and releasing
different locks. But since blk_cleanup_queue() has already overridden
the queue lock with the internal lock by then, I'm afraid
blkcg_print_blkgs() could still end up using the driver lock.
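
As a sketch of this caching idea (illustrative only, not a proposed patch):

	/* Cache the pointer so lock and unlock are balanced. */
	static void print_one_blkg(struct blkcg_gq *blkg)
	{
		spinlock_t *lock = blkg->q->queue_lock;	/* snapshot */

		spin_lock_irq(lock);
		/* ... emit statistics for this blkg ... */
		spin_unlock_irq(lock);
		/*
		 * Lock and unlock now always use the same spinlock, but if
		 * blk_cleanup_queue() has already switched q->queue_lock,
		 * this is still the driver's lock, whose storage may be
		 * freed under us.
		 */
	}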

Thanks,
Joseph

> In other words, I think that this patch series should be sufficient to address
> all races between .queue_lock assignments and the code that uses that pointer.
> 
> Thanks,
> 
> Bart.
> 


Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-02 Thread Bart Van Assche
On Fri, 2018-02-02 at 09:02 +0800, Joseph Qi wrote:
> We triggered this race when using single queue. I'm not sure if it
> exists in multi-queue.

Regarding the races between modifying the queue_lock pointer and the code that
uses that pointer, I think the following construct in blk_cleanup_queue() is
sufficient to avoid races between the queue_lock pointer assignment and the code
that executes concurrently with blk_cleanup_queue():

spin_lock_irq(lock);
if (q->queue_lock != &q->__queue_lock)
        q->queue_lock = &q->__queue_lock;
spin_unlock_irq(lock);

In other words, I think that this patch series should be sufficient to address
all races between .queue_lock assignments and the code that uses that pointer.
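
Annotated for clarity (the comments are an interpretation of the construct's
intent, not part of the patch):

	spin_lock_irq(lock);	/* "lock" is the currently registered lock */
	if (q->queue_lock != &q->__queue_lock)
		q->queue_lock = &q->__queue_lock; /* restore internal lock */
	spin_unlock_irq(lock);	/* later lockers see the new pointer, provided
				 * they re-read q->queue_lock afterwards */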

Thanks,

Bart.

Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-02 Thread Jens Axboe
On 2/1/18 6:02 PM, Joseph Qi wrote:
> Hi Bart,
> 
> On 18/2/2 00:16, Bart Van Assche wrote:
>> On Thu, 2018-02-01 at 09:53 +0800, Joseph Qi wrote:
>>> I'm afraid the risk may also exist in blk_cleanup_queue(), which will
>>> set queue_lock to the default internal lock.
>>>
>>> spin_lock_irq(lock);
>>> if (q->queue_lock != &q->__queue_lock)
>>>         q->queue_lock = &q->__queue_lock;
>>> spin_unlock_irq(lock);
>>>
>>> I'm thinking of saving blkg->q->queue_lock to a local variable first,
>>> but this will result in still using the driver lock even though
>>> queue_lock has already been set to the default internal lock.
>>
>> Hello Joseph,
>>
>> I think the race between the queue_lock assignment in blk_cleanup_queue()
>> and the use of that pointer by cgroup attributes could be solved by
>> removing the visibility of these attributes from blk_cleanup_queue() instead
>> of __blk_release_queue(). However, last time I proposed to move code from
>> __blk_release_queue() into blk_cleanup_queue(), I received feedback from
>> some kernel developers that they didn't like this.
>>
>> Is the block driver that triggered the race on the q->queue_lock assignment
>> using legacy (single queue) or multiqueue (blk-mq) mode? If that driver is
>> using legacy mode, are you aware that there are plans to remove legacy mode
>> from the upstream kernel? And if your driver is using multiqueue mode, how
>> about the following change instead of the two patches in this patch series:
>>
> We triggered this race when using single queue. I'm not sure if it
> exists in multi-queue.
> Do you mean upstream won't fix bugs any more in single queue?

No, we'll still fix bugs in the legacy path; we just won't introduce
any new features or accept any new drivers that use that path.
Ultimately that path will go away once there are no more users,
but until then it is maintained.

-- 
Jens Axboe



Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-01 Thread Joseph Qi
Hi Bart,

On 18/2/2 00:16, Bart Van Assche wrote:
> On Thu, 2018-02-01 at 09:53 +0800, Joseph Qi wrote:
>> I'm afraid the risk may also exist in blk_cleanup_queue(), which will
>> set queue_lock to the default internal lock.
>>
>> spin_lock_irq(lock);
>> if (q->queue_lock != &q->__queue_lock)
>>         q->queue_lock = &q->__queue_lock;
>> spin_unlock_irq(lock);
>>
>> I'm thinking of saving blkg->q->queue_lock to a local variable first,
>> but this will result in still using the driver lock even though
>> queue_lock has already been set to the default internal lock.
> 
> Hello Joseph,
> 
> I think the race between the queue_lock assignment in blk_cleanup_queue()
> and the use of that pointer by cgroup attributes could be solved by
> removing the visibility of these attributes from blk_cleanup_queue() instead
> of __blk_release_queue(). However, last time I proposed to move code from
> __blk_release_queue() into blk_cleanup_queue(), I received feedback from
> some kernel developers that they didn't like this.
> 
> Is the block driver that triggered the race on the q->queue_lock assignment
> using legacy (single queue) or multiqueue (blk-mq) mode? If that driver is
> using legacy mode, are you aware that there are plans to remove legacy mode
> from the upstream kernel? And if your driver is using multiqueue mode, how
> about the following change instead of the two patches in this patch series:
> 
We triggered this race when using single queue. I'm not sure if it
exists in multi-queue.
Do you mean upstream won't fix bugs any more in single queue?

Thanks,
Joseph

> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1093,7 +1093,7 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
>  		return NULL;
>  
>  	q->request_fn = rfn;
> -	if (lock)
> +	if (!q->mq_ops && lock)
>  		q->queue_lock = lock;
>  	if (blk_init_allocated_queue(q) < 0) {
>  		blk_cleanup_queue(q);
> 
> Thanks,
> 
> Bart.
> 


Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-02-01 Thread Bart Van Assche
On Thu, 2018-02-01 at 09:53 +0800, Joseph Qi wrote:
> I'm afraid the risk may also exist in blk_cleanup_queue(), which will
> set queue_lock to the default internal lock.
> 
> spin_lock_irq(lock);
> if (q->queue_lock != &q->__queue_lock)
>         q->queue_lock = &q->__queue_lock;
> spin_unlock_irq(lock);
> 
> I'm thinking of saving blkg->q->queue_lock to a local variable first,
> but this will result in still using the driver lock even though
> queue_lock has already been set to the default internal lock.

Hello Joseph,

I think the race between the queue_lock assignment in blk_cleanup_queue()
and the use of that pointer by cgroup attributes could be solved by
removing the visibility of these attributes from blk_cleanup_queue() instead
of __blk_release_queue(). However, last time I proposed to move code from
__blk_release_queue() into blk_cleanup_queue(), I received feedback from
some kernel developers that they didn't like this.

Is the block driver that triggered the race on the q->queue_lock assignment
using legacy (single queue) or multiqueue (blk-mq) mode? If that driver is
using legacy mode, are you aware that there are plans to remove legacy mode
from the upstream kernel? And if your driver is using multiqueue mode, how
about the following change instead of the two patches in this patch series:

--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1093,7 +1093,7 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 		return NULL;
 
 	q->request_fn = rfn;
-	if (lock)
+	if (!q->mq_ops && lock)
 		q->queue_lock = lock;
 	if (blk_init_allocated_queue(q) < 0) {
 		blk_cleanup_queue(q);
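
For context, a sketch of why a blk-mq queue would never see a lock switch
(an assumption based on the blk-mq allocation path; example_mq_alloc() is
hypothetical):

	/* Hypothetical blk-mq driver setup. */
	static struct request_queue *example_mq_alloc(struct blk_mq_tag_set *set)
	{
		/*
		 * blk_mq_init_queue() allocates the queue without a driver
		 * lock, so q->queue_lock points at &q->__queue_lock from
		 * allocation onward and is never switched afterwards.
		 */
		return blk_mq_init_queue(set);
	}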

Thanks,

Bart.

Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-01-31 Thread Joseph Qi
Hi Bart,

On 18/2/1 07:53, Bart Van Assche wrote:
> Initialize the request queue lock earlier such that the following
> race can no longer occur:
> 
> blk_init_queue_node                      blkcg_print_blkgs
>   blk_alloc_queue_node (1)
>     q->queue_lock = &q->__queue_lock (2)
>     blkcg_init_queue(q) (3)
>                                            spin_lock_irq(blkg->q->queue_lock) (4)
>   q->queue_lock = lock (5)
>                                            spin_unlock_irq(blkg->q->queue_lock) (6)
> 
> (1) allocate an uninitialized queue;
> (2) initialize queue_lock to its default internal lock;
> (3) initialize blkcg part of request queue, which will create blkg and
> then insert it to blkg_list;
> (4) traverse blkg_list and find the created blkg, and then take its
> queue lock, here it is the default *internal lock*;
> (5) *race window*, now queue_lock is overridden with *driver specified
> lock*;
> (6) now unlock the *driver specified lock*, not the locked *internal
>     lock*; the lock/unlock balance breaks.
> 
> The changes in this patch are as follows:
> - Move the .queue_lock initialization from blk_init_queue_node() into
>   blk_alloc_queue_node().
> - For all block drivers that initialize .queue_lock explicitly,
>   change the blk_alloc_queue() call in the driver into a
>   blk_alloc_queue_node() call and remove the explicit .queue_lock
>   initialization. Additionally, initialize the spin lock that will
>   be used as queue lock earlier if necessary.
> 
I'm afraid the risk may also exist in blk_cleanup_queue(), which will
set queue_lock to the default internal lock.

spin_lock_irq(lock);
if (q->queue_lock != &q->__queue_lock)
        q->queue_lock = &q->__queue_lock;
spin_unlock_irq(lock);

I'm thinking of saving blkg->q->queue_lock to a local variable first,
but this will result in still using the driver lock even though
queue_lock has already been set to the default internal lock.

Thanks,
Joseph