Re: races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-18 Thread Paolo Valente

> Il giorno 17 mag 2017, alle ore 21:12, Tejun Heo  ha scritto:
> 
> Hello,
> 
> On Mon, May 15, 2017 at 09:49:13PM +0200, Paolo Valente wrote:
>> So, unless you tell me that there are other races I haven't seen, or,
>> even worse, that I'm just talking nonsense, I have thought of a simple
>> solution to address this issue without resorting to the request_queue
>> lock: further caching, on blkg lookups, the only policy or blkg data
>> the scheduler may use, and access this data directly when needed.  By
>> doing so, the issue is reduced to the occasional use of stale data.
>> And apparently this already happens, e.g., in cfq when it uses the
>> weight of a cfq_queue associated with a process whose group has just
>> been changed (and for which a blkg_lookup has not yet been invoked).
>> The same should happen when cfq invokes cfq_log_cfqq for such a
>> cfq_queue, as this function prints the path of the group the bfq_queue
>> belongs to.
> 
> I haven't studied the code but the problem sounds correct to me.  All
> of blkcg code assumes the use of rq lock.  And, yeah, none of the hot
> paths requires strong synchornization.  All the actual management
> operations can be synchronized separately and the hot lookup path can
> be protected with rcu and maybe percpu reference counters.
> 

Great, thanks for this ack.  User reports do confirm the problem, and,
so far, the effectiveness of a solution I have implemented.  I'm
finalizing the patch for submission.

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun



Re: races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-18 Thread Paolo Valente

> Il giorno 17 mag 2017, alle ore 21:12, Tejun Heo  ha scritto:
> 
> Hello,
> 
> On Mon, May 15, 2017 at 09:49:13PM +0200, Paolo Valente wrote:
>> So, unless you tell me that there are other races I haven't seen, or,
>> even worse, that I'm just talking nonsense, I have thought of a simple
>> solution to address this issue without resorting to the request_queue
>> lock: further caching, on blkg lookups, the only policy or blkg data
>> the scheduler may use, and access this data directly when needed.  By
>> doing so, the issue is reduced to the occasional use of stale data.
>> And apparently this already happens, e.g., in cfq when it uses the
>> weight of a cfq_queue associated with a process whose group has just
>> been changed (and for which a blkg_lookup has not yet been invoked).
>> The same should happen when cfq invokes cfq_log_cfqq for such a
>> cfq_queue, as this function prints the path of the group the bfq_queue
>> belongs to.
> 
> I haven't studied the code but the problem sounds correct to me.  All
> of blkcg code assumes the use of rq lock.  And, yeah, none of the hot
> paths requires strong synchornization.  All the actual management
> operations can be synchronized separately and the hot lookup path can
> be protected with rcu and maybe percpu reference counters.
> 

Great, thanks for this ack.  User reports do confirm the problem, and,
so far, the effectiveness of a solution I have implemented.  I'm
finalizing the patch for submission.

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun



Re: races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-17 Thread Tejun Heo
Hello,

On Mon, May 15, 2017 at 09:49:13PM +0200, Paolo Valente wrote:
> So, unless you tell me that there are other races I haven't seen, or,
> even worse, that I'm just talking nonsense, I have thought of a simple
> solution to address this issue without resorting to the request_queue
> lock: further caching, on blkg lookups, the only policy or blkg data
> the scheduler may use, and access this data directly when needed.  By
> doing so, the issue is reduced to the occasional use of stale data.
> And apparently this already happens, e.g., in cfq when it uses the
> weight of a cfq_queue associated with a process whose group has just
> been changed (and for which a blkg_lookup has not yet been invoked).
> The same should happen when cfq invokes cfq_log_cfqq for such a
> cfq_queue, as this function prints the path of the group the bfq_queue
> belongs to.

I haven't studied the code but the problem sounds correct to me.  All
of blkcg code assumes the use of rq lock.  And, yeah, none of the hot
paths requires strong synchornization.  All the actual management
operations can be synchronized separately and the hot lookup path can
be protected with rcu and maybe percpu reference counters.

Thanks.

-- 
tejun


Re: races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-17 Thread Tejun Heo
Hello,

On Mon, May 15, 2017 at 09:49:13PM +0200, Paolo Valente wrote:
> So, unless you tell me that there are other races I haven't seen, or,
> even worse, that I'm just talking nonsense, I have thought of a simple
> solution to address this issue without resorting to the request_queue
> lock: further caching, on blkg lookups, the only policy or blkg data
> the scheduler may use, and access this data directly when needed.  By
> doing so, the issue is reduced to the occasional use of stale data.
> And apparently this already happens, e.g., in cfq when it uses the
> weight of a cfq_queue associated with a process whose group has just
> been changed (and for which a blkg_lookup has not yet been invoked).
> The same should happen when cfq invokes cfq_log_cfqq for such a
> cfq_queue, as this function prints the path of the group the bfq_queue
> belongs to.

I haven't studied the code but the problem sounds correct to me.  All
of blkcg code assumes the use of rq lock.  And, yeah, none of the hot
paths requires strong synchornization.  All the actual management
operations can be synchronized separately and the hot lookup path can
be protected with rcu and maybe percpu reference counters.

Thanks.

-- 
tejun


races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-15 Thread Paolo Valente
Hi Tejun, Jens, and anyone else possibly interested in this issue,
I have realized that, while blk-cgroup operation are of course
protected by the usual request_queue lock, I/O scheduler operations
aren't any longer protected by this same lock in blk-mq.  They are
protected by a finer-grained scheduler lock instead.  If I'm not
missing anything, this exposes to obvious races any I/O scheduler
supporting cgroups, as bfq.  So I have tried to check bfq code,
against blk-cgroup, as carefully as I could.

The only dangerous operations I found in blk-cgroup, for bfq, are blkg
destroy ones. But the scheduler hook related to these operations
(pd_offline) seems to be always invoked before any other, possibly
dangerous, step.  It seems then enough that this hook is executed with
the scheduler lock held, to serialize cgroup and in-scheduler
blkg-lookup operations.

As for in-scheduler operations, the only danger I found so far is the
dereference of the blkg_policy_data pointer field cached in the
descriptor of a group.  Given the parent group of some process in the
scheduler, that pointer may have become a dangling reference if the
policy data it pointed to has been destroyed, but the parent-group
pointer for that process has not yet been updated (the parent pointer
itself is then a dangling reference).  In this respect, these updates
happen (only) after the arrival of new I/O requests after the
destruction of a parent group.

So, unless you tell me that there are other races I haven't seen, or,
even worse, that I'm just talking nonsense, I have thought of a simple
solution to address this issue without resorting to the request_queue
lock: further caching, on blkg lookups, the only policy or blkg data
the scheduler may use, and access this data directly when needed.  By
doing so, the issue is reduced to the occasional use of stale data.
And apparently this already happens, e.g., in cfq when it uses the
weight of a cfq_queue associated with a process whose group has just
been changed (and for which a blkg_lookup has not yet been invoked).
The same should happen when cfq invokes cfq_log_cfqq for such a
cfq_queue, as this function prints the path of the group the bfq_queue
belongs to.

Thanks,
Paolo



races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-15 Thread Paolo Valente
Hi Tejun, Jens, and anyone else possibly interested in this issue,
I have realized that, while blk-cgroup operation are of course
protected by the usual request_queue lock, I/O scheduler operations
aren't any longer protected by this same lock in blk-mq.  They are
protected by a finer-grained scheduler lock instead.  If I'm not
missing anything, this exposes to obvious races any I/O scheduler
supporting cgroups, as bfq.  So I have tried to check bfq code,
against blk-cgroup, as carefully as I could.

The only dangerous operations I found in blk-cgroup, for bfq, are blkg
destroy ones. But the scheduler hook related to these operations
(pd_offline) seems to be always invoked before any other, possibly
dangerous, step.  It seems then enough that this hook is executed with
the scheduler lock held, to serialize cgroup and in-scheduler
blkg-lookup operations.

As for in-scheduler operations, the only danger I found so far is the
dereference of the blkg_policy_data pointer field cached in the
descriptor of a group.  Given the parent group of some process in the
scheduler, that pointer may have become a dangling reference if the
policy data it pointed to has been destroyed, but the parent-group
pointer for that process has not yet been updated (the parent pointer
itself is then a dangling reference).  In this respect, these updates
happen (only) after the arrival of new I/O requests after the
destruction of a parent group.

So, unless you tell me that there are other races I haven't seen, or,
even worse, that I'm just talking nonsense, I have thought of a simple
solution to address this issue without resorting to the request_queue
lock: further caching, on blkg lookups, the only policy or blkg data
the scheduler may use, and access this data directly when needed.  By
doing so, the issue is reduced to the occasional use of stale data.
And apparently this already happens, e.g., in cfq when it uses the
weight of a cfq_queue associated with a process whose group has just
been changed (and for which a blkg_lookup has not yet been invoked).
The same should happen when cfq invokes cfq_log_cfqq for such a
cfq_queue, as this function prints the path of the group the bfq_queue
belongs to.

Thanks,
Paolo