Re: [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many

2018-09-20 Thread jianchao.wang
Hi Tejun

Thanks for your kind response.

On 09/21/2018 04:53 AM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Sep 20, 2018 at 06:18:21PM +0800, Jianchao Wang wrote:
>> -static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>> +static inline void __percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>>  {
>>  unsigned long __percpu *percpu_count;
>>  
>> -rcu_read_lock_sched();
> 
> So, if we're gonna do this (please read below tho), please add the
> matching assertion

Yes, I will.

> 
>>  if (__ref_is_percpu(ref, &percpu_count))
>>  this_cpu_add(*percpu_count, nr);
>>  else
>>  atomic_long_add(nr, &ref->count);
>> +}
>>  
>> +/**
>> + * percpu_ref_get_many - increment a percpu refcount
>> + * @ref: percpu_ref to get
>> + * @nr: number of references to get
>> + *
>> + * Analogous to atomic_long_add().
>> + *
>> + * This function is safe to call as long as @ref is between init and exit.
>> + */
>> +static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>> +{
>> +rcu_read_lock_sched();
>> +__percpu_ref_get_many(ref, nr);
>>  rcu_read_unlock_sched();
>>  }
> 
> And add the matching variant for get/put with and without _many.

Yes.

> 
> Ming, so, if we make locking explicit like above, I think it should be
> fine to share the locking.  However, please note that percpu_ref and
> blk_mq are using different types of RCU, at least for now, and I'm not
> really sure that unifying that and taking out one rcu read lock/unlock
> is a meaningful optimization.
> 

Essentially, I want to rework the current queue freeze, which depends on
percpu_ref_kill/reinit. We would implement our own condition checking instead of
using the __PERCPU_REF_DEAD check in percpu_ref_tryget_live, and use
percpu_ref_switch_to_atomic/percpu to switch the percpu ref mode between percpu
and atomic directly. Then it will be more convenient to implement:
 - unfreeze the queue without draining requests.
 - check whether q->q_usage_counter is zero
 - add other gate conditions into blk_queue_enter, such as preempt-only

So I want to put the condition checking and percpu_ref_get_many into a
same sched rcu critical section.
 rcu_read_lock_sched()
   if condition check true
     percpu_ref_get_many(&q->q_usage_counter, 1)
   else
     goto wait
 rcu_read_unlock_sched()

Then we could freeze the queue like:
  set FROZEN flag on q
  percpu_ref_put(1)
  percpu_ref_switch_to_atomic
Otherwise, we may need a synchronize_rcu.
It is not for performance.

Is there any reason why blk_queue_enter could only use the regular RCU read
lock instead of the sched RCU read lock?


> Let's please first do something straight-forward.  If somebody can
> show that this actually impacts performance, we can optimize it but
> right now all these seem premature to me.

Adding this interface is just for saving an extra rcu_read_lock/unlock_sched
pair.
If it doesn't make any sense, it is OK for me to discard it. :)

Thanks
Jianchao


Re: [PATCH 1/3] percpu_ref: add a new helper interface __percpu_ref_get_many

2018-09-20 Thread Tejun Heo
Hello,

On Thu, Sep 20, 2018 at 06:18:21PM +0800, Jianchao Wang wrote:
> -static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> +static inline void __percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
>  {
>   unsigned long __percpu *percpu_count;
>  
> - rcu_read_lock_sched();

So, if we're gonna do this (please read below tho), please add the
matching assertion

>   if (__ref_is_percpu(ref, &percpu_count))
>   this_cpu_add(*percpu_count, nr);
>   else
>   atomic_long_add(nr, &ref->count);
> +}
>  
> +/**
> + * percpu_ref_get_many - increment a percpu refcount
> + * @ref: percpu_ref to get
> + * @nr: number of references to get
> + *
> + * Analogous to atomic_long_add().
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline void percpu_ref_get_many(struct percpu_ref *ref, unsigned long nr)
> +{
> + rcu_read_lock_sched();
> + __percpu_ref_get_many(ref, nr);
>   rcu_read_unlock_sched();
>  }

And add the matching variant for get/put with and without _many.

Ming, so, if we make locking explicit like above, I think it should be
fine to share the locking.  However, please note that percpu_ref and
blk_mq are using different types of RCU, at least for now, and I'm not
really sure that unifying that and taking out one rcu read lock/unlock
is a meaningful optimization.

Let's please first do something straight-forward.  If somebody can
show that this actually impacts performance, we can optimize it but
right now all these seem premature to me.

Thanks.

-- 
tejun

