Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-21 Thread Jens Axboe
On 4/21/18 9:05 AM, Bart Van Assche wrote:
> On Sat, 2018-04-21 at 22:55 +0800, jianchao.wang wrote:
>> So we have a __deadline as below:
>>  
>> deadline gen state   
>>> __||_|
>>
>>\___ __/ 
>>V
>>   granularity 
>>
>> and even we usually have a 30~60s timeout,
>> but we should support a 1s granularity at least.
>> How to decide the granularity ?
> 
> Your diagram shows that the uppermost bits are used for the deadline. What I
> proposed is to use the lower 32 bits for the deadline such that we again have
> a resolution of 1/HZ.

It honestly doesn't matter how we do it, granularity could be exactly
the same. But yeah, storing it at the lower bits is perfectly fine,
either way works. Only difference is in the encoding, functionally
they can be identical.

-- 
Jens Axboe



Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-21 Thread Bart Van Assche
On Sat, 2018-04-21 at 22:55 +0800, jianchao.wang wrote:
> So we have a __deadline as below:
>  
> deadline gen state   
> > __||_|
> 
>\___ __/ 
>V
>   granularity 
> 
> and even we usually have a 30~60s timeout,
> but we should support a 1s granularity at least.
> How to decide the granularity ?

Your diagram shows that the uppermost bits are used for the deadline. What I
proposed is to use the lower 32 bits for the deadline such that we again have
a resolution of 1/HZ.

Bart.




Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-21 Thread jianchao.wang


On 04/21/2018 10:10 PM, Jens Axboe wrote:
> On 4/21/18 7:34 AM, jianchao.wang wrote:
>> Hi Bart
>>
>> Thanks for your kindly response.
>>
>> On 04/20/2018 10:11 PM, Bart Van Assche wrote:
>>> On Fri, 2018-04-20 at 14:55 +0800, jianchao.wang wrote:
 Hi Bart

 On 04/20/2018 12:43 AM, Bart Van Assche wrote:
> Use the deadline instead of the request generation to detect whether
>   or not a request timer fired after reinitialization of a request

 Maybe use deadline to do this is not suitable.

 Let's think of the following scenario.

 T1/T2 times in nano seconds
 J1/J2 times in jiffies

 rq is started at T1,J1
 blk_mq_timeout_work
   -> blk_mq_check_expired
 -> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)

  rq is completed and freed 
  rq is allocated and started 
 again on T2
  rq->__deadline = J2 + 
 MQ_RQ_IN_FLIGHT
   -> synchronize srcu/rcu
   -> blk_mq_terminate_expired
 -> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)

 1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
if J2-J1 < 4 jiffies, we will get the same deadline value.

 2. even if we do some bit shift when save and get deadline
if T2 - T1 < time of 1 jiffies, we sill get the same deadline.
>>>
>>> Hello Jianchao,
>>>
>>> How about using the upper 16 or 32 bits of rq->__deadline to store a 
>>> generation
>>> number? I don't know any block driver for which the product of (deadline in
>>> seconds) and HZ exceeds (1 << 32).
>>>
>> yes, we don't need so long timeout value.
>> However, req->__deadline is an absolute time, not a relative one, 
>> its type is unsigned long variable which is same with jiffies.
>> If we reserve some bits (not just 1 or 2 bits) of req->__deadline for 
>> generation number,
>> how to handle it when mod_timer ?
> 
> We don't need to support timeouts to the full granularity of jiffies.
> Most normal commands are in the 30-60s range, and some lower level
> commands may take minutes. We're well within the range of chopping
> off some bits and not having to worry about that at all.
> 
Yes.
So we have a __deadline as below:
 
deadline gen state   
|__||_|
   \___ __/ 
   V
  granularity 

and even we usually have a 30~60s timeout,
but we should support a 1s granularity at least.
How to decide the granularity ?

Thanks
Jianchao


Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-21 Thread Jens Axboe
On 4/21/18 7:34 AM, jianchao.wang wrote:
> Hi Bart
> 
> Thanks for your kindly response.
> 
> On 04/20/2018 10:11 PM, Bart Van Assche wrote:
>> On Fri, 2018-04-20 at 14:55 +0800, jianchao.wang wrote:
>>> Hi Bart
>>>
>>> On 04/20/2018 12:43 AM, Bart Van Assche wrote:
 Use the deadline instead of the request generation to detect whether
   or not a request timer fired after reinitialization of a request
>>>
>>> Maybe use deadline to do this is not suitable.
>>>
>>> Let's think of the following scenario.
>>>
>>> T1/T2 times in nano seconds
>>> J1/J2 times in jiffies
>>>
>>> rq is started at T1,J1
>>> blk_mq_timeout_work
>>>   -> blk_mq_check_expired
>>> -> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)
>>>
>>>  rq is completed and freed 
>>>  rq is allocated and started 
>>> again on T2
>>>  rq->__deadline = J2 + 
>>> MQ_RQ_IN_FLIGHT
>>>   -> synchronize srcu/rcu
>>>   -> blk_mq_terminate_expired
>>> -> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)
>>>
>>> 1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
>>>if J2-J1 < 4 jiffies, we will get the same deadline value.
>>>
>>> 2. even if we do some bit shift when save and get deadline
>>>if T2 - T1 < time of 1 jiffies, we sill get the same deadline.
>>
>> Hello Jianchao,
>>
>> How about using the upper 16 or 32 bits of rq->__deadline to store a 
>> generation
>> number? I don't know any block driver for which the product of (deadline in
>> seconds) and HZ exceeds (1 << 32).
>>
> yes, we don't need so long timeout value.
> However, req->__deadline is an absolute time, not a relative one, 
> its type is unsigned long variable which is same with jiffies.
> If we reserve some bits (not just 1 or 2 bits) of req->__deadline for 
> generation number,
> how to handle it when mod_timer ?

We don't need to support timeouts to the full granularity of jiffies.
Most normal commands are in the 30-60s range, and some lower level
commands may take minutes. We're well within the range of chopping
off some bits and not having to worry about that at all.

-- 
Jens Axboe



Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-21 Thread jianchao.wang
Hi Bart

Thanks for your kindly response.

On 04/20/2018 10:11 PM, Bart Van Assche wrote:
> On Fri, 2018-04-20 at 14:55 +0800, jianchao.wang wrote:
>> Hi Bart
>>
>> On 04/20/2018 12:43 AM, Bart Van Assche wrote:
>>> Use the deadline instead of the request generation to detect whether
>>>   or not a request timer fired after reinitialization of a request
>>
>> Maybe use deadline to do this is not suitable.
>>
>> Let's think of the following scenario.
>>
>> T1/T2 times in nano seconds
>> J1/J2 times in jiffies
>>
>> rq is started at T1,J1
>> blk_mq_timeout_work
>>   -> blk_mq_check_expired
>> -> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)
>>
>>  rq is completed and freed 
>>  rq is allocated and started 
>> again on T2
>>  rq->__deadline = J2 + 
>> MQ_RQ_IN_FLIGHT
>>   -> synchronize srcu/rcu
>>   -> blk_mq_terminate_expired
>> -> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)
>>
>> 1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
>>if J2-J1 < 4 jiffies, we will get the same deadline value.
>>
>> 2. even if we do some bit shift when save and get deadline
>>if T2 - T1 < time of 1 jiffies, we sill get the same deadline.
> 
> Hello Jianchao,
> 
> How about using the upper 16 or 32 bits of rq->__deadline to store a 
> generation
> number? I don't know any block driver for which the product of (deadline in
> seconds) and HZ exceeds (1 << 32).
> 
yes, we don't need so long timeout value.
However, req->__deadline is an absolute time, not a relative one, 
its type is unsigned long variable which is same with jiffies.
If we reserve some bits (not just 1 or 2 bits) of req->__deadline for 
generation number,
how to handle it when mod_timer ?

Thanks
Jianchao
 


Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-20 Thread Bart Van Assche
On Fri, 2018-04-20 at 14:55 +0800, jianchao.wang wrote:
> Hi Bart
> 
> On 04/20/2018 12:43 AM, Bart Van Assche wrote:
> > Use the deadline instead of the request generation to detect whether
> >   or not a request timer fired after reinitialization of a request
> 
> Maybe use deadline to do this is not suitable.
> 
> Let's think of the following scenario.
> 
> T1/T2 times in nano seconds
> J1/J2 times in jiffies
> 
> rq is started at T1,J1
> blk_mq_timeout_work
>   -> blk_mq_check_expired
> -> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)
> 
>  rq is completed and freed 
>  rq is allocated and started 
> again on T2
>  rq->__deadline = J2 + 
> MQ_RQ_IN_FLIGHT
>   -> synchronize srcu/rcu
>   -> blk_mq_terminate_expired
> -> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)
> 
> 1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
>if J2-J1 < 4 jiffies, we will get the same deadline value.
> 
> 2. even if we do some bit shift when save and get deadline
>if T2 - T1 < time of 1 jiffies, we sill get the same deadline.

Hello Jianchao,

How about using the upper 16 or 32 bits of rq->__deadline to store a generation
number? I don't know any block driver for which the product of (deadline in
seconds) and HZ exceeds (1 << 32).

Thanks,

Bart.





Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-20 Thread jianchao.wang
Hi Bart

On 04/20/2018 12:43 AM, Bart Van Assche wrote:
> Use the deadline instead of the request generation to detect whether
>   or not a request timer fired after reinitialization of a request

Maybe use deadline to do this is not suitable.

Let's think of the following scenario.

T1/T2 times in nano seconds
J1/J2 times in jiffies

rq is started at T1,J1
blk_mq_timeout_work
  -> blk_mq_check_expired
-> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)

 rq is completed and freed 
 rq is allocated and started again 
on T2
 rq->__deadline = J2 + 
MQ_RQ_IN_FLIGHT
  -> synchronize srcu/rcu
  -> blk_mq_terminate_expired
-> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)

1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
   if J2-J1 < 4 jiffies, we will get the same deadline value.

2. even if we do some bit shift when save and get deadline
   if T2 - T1 < time of 1 jiffies, we sill get the same deadline.

Thanks
Jianchao


Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-19 Thread Jens Axboe
On 4/19/18 10:43 AM, Bart Van Assche wrote:
> The blk-mq timeout handling code ignores completions that occur after
> blk_mq_check_expired() has been called and before blk_mq_rq_timed_out()
> has reset rq->aborted_gstate. If a block driver timeout handler always
> returns BLK_EH_RESET_TIMER then the result will be that the request
> never terminates.
> 
> Fix this race as follows:
> - Use the deadline instead of the request generation to detect whether
>   or not a request timer fired after reinitialization of a request.
> - Store the request state in the lowest two bits of the deadline instead
>   of the lowest two bits of 'gstate'.
> - Rename MQ_RQ_STATE_MASK into RQ_STATE_MASK and change it from an
>   enumeration member into a #define such that its type can be changed
>   into unsigned long. That allows to write & ~RQ_STATE_MASK instead of
>   ~(unsigned long)RQ_STATE_MASK.
> - Remove all request member variables that became superfluous due to
>   this change: gstate, gstate_seq and aborted_gstate_sync.
> - Remove the request state information that became superfluous due to this
>   patch, namely RQF_MQ_TIMEOUT_EXPIRED.
> - Remove the code that became superfluous due to this change, namely
>   the RCU lock and unlock statements in blk_mq_complete_request() and
>   also the synchronize_rcu() call in the timeout handler.

I like this approach, it's simpler and easier to understand. Net
reduction in code is nice too. I've run it through the testing and it
passes. Unless people holler loudly, I'd like to include this in 4.17.

-- 
Jens Axboe