Re: [PATCH] Btrfs: enchanse raid1/10 balance heuristic for non rotating devices

2017-12-28 Thread Timofey Titovets
2017-12-28 11:06 GMT+03:00 Dmitrii Tcvetkov :
> On Thu, 28 Dec 2017 01:39:31 +0300
> Timofey Titovets  wrote:
>
>> Currently btrfs raid1/10 balancer blance requests to mirrors,
>> based on pid % num of mirrors.
>>
>> Update logic and make it understood if underline device are non rotational.
>>
>> If one of mirrors are non rotational, then all read requests will be moved to
>> non rotational device.
>>
>> If both of mirrors are non rotational, calculate sum of
>> pending and in flight request for queue on that bdev and use
>> device with least queue leght.
>>
>> P.S.
>> Inspired by md-raid1 read balancing
>>
>> Signed-off-by: Timofey Titovets 
>> ---
>>  fs/btrfs/volumes.c | 59
>> ++ 1 file changed, 59
>> insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9a04245003ab..98bc2433a920 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info
>> *fs_info, u64 logical, u64 len) return ret;
>>  }
>>
>> +static inline int bdev_get_queue_len(struct block_device *bdev)
>> +{
>> + int sum = 0;
>> + struct request_queue *rq = bdev_get_queue(bdev);
>> +
>> + sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
>> + sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
>> +
>
> This won't work as expected if bdev is controlled by blk-mq, these
> counters will be zero. AFAIK to get this info in block layer agnostic way
> part_in_flight[1] has to be used. It extracts these counters approriately.
>
> But it needs to be EXPORT_SYMBOL()'ed in block/genhd.c so we can continue
> to build btrfs as module.
>
>> + /*
>> +  * Try prevent switch for every sneeze
>> +  * By roundup output num by 2
>> +  */
>> + return ALIGN(sum, 2);
>> +}
>> +
>>  static int find_live_mirror(struct btrfs_fs_info *fs_info,
>>   struct map_lookup *map, int first, int num,
>>   int optimal, int dev_replace_is_ongoing)
>>  {
>>   int i;
>>   int tolerance;
>> + struct block_device *bdev;
>>   struct btrfs_device *srcdev;
>> + bool all_bdev_nonrot = true;
>>
>>   if (dev_replace_is_ongoing &&
>>   fs_info->dev_replace.cont_reading_from_srcdev_mode ==
>> @@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info
>> *fs_info, else
>>   srcdev = NULL;
>>
>> + /*
>> +  * Optimal expected to be pid % num
>> +  * That's generaly ok for spinning rust drives
>> +  * But if one of mirror are non rotating,
>> +  * that bdev can show better performance
>> +  *
>> +  * if one of disks are non rotating:
>> +  *  - set optimal to non rotating device
>> +  * if both disk are non rotating
>> +  *  - set optimal to bdev with least queue
>> +  * If both disks are spinning rust:
>> +  *  - leave old pid % nu,
>> +  */
>> + for (i = 0; i < num; i++) {
>> + bdev = map->stripes[i].dev->bdev;
>> + if (!bdev)
>> + continue;
>> + if (blk_queue_nonrot(bdev_get_queue(bdev)))
>> + optimal = i;
>> + else
>> + all_bdev_nonrot = false;
>> + }
>> +
>> + if (all_bdev_nonrot) {
>> + int qlen;
>> + /* Forse following logic choise by init with some big number
>> */
>> + int optimal_dev_rq_count = 1 << 24;
>
> Probably better to use INT_MAX macro instead.
>
> [1] https://elixir.free-electrons.com/linux/v4.15-rc5/source/block/genhd.c#L68
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thank you very much!

-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: enchanse raid1/10 balance heuristic for non rotating devices

2017-12-28 Thread waxhead



Timofey Titovets wrote:

Currently btrfs raid1/10 balancer blance requests to mirrors,
based on pid % num of mirrors.

Update logic and make it understood if underline device are non rotational.

If one of mirrors are non rotational, then all read requests will be moved to
non rotational device.

And this would make reads regardless of the PID always end up on the 
fastest device which sounds sane enough , but scubbing will be even more 
important since there is a less chance that a "random PID" will check 
the other copy every now and then.



If both of mirrors are non rotational, calculate sum of
pending and in flight request for queue on that bdev and use
device with least queue leght.

I think this would be tried out on rotational disk as well. I am happy 
to test this out for you on a 7x disk server if you want.
Note: I have no experience with compiling kernels and applying patches 
(but I do code a bit in C every now and then) so a pre-compiled kernel 
would be required (I believe you are on Debain as well)
For rotational then perhaps it would not be wise to use another mirror 
unless the queue length is significantly higher than the other. Again I 
am happy to test if tunables are provided.



P.S.
Inspired by md-raid1 read balancing

Signed-off-by: Timofey Titovets 
---
 fs/btrfs/volumes.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9a04245003ab..98bc2433a920 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info 
*fs_info, u64 logical, u64 len)
return ret;
 }

+static inline int bdev_get_queue_len(struct block_device *bdev)
+{
+   int sum = 0;
+   struct request_queue *rq = bdev_get_queue(bdev);
+
+   sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
+   sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
+
+   /*
+* Try prevent switch for every sneeze
+* By roundup output num by 2
+*/
+   return ALIGN(sum, 2);
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
struct map_lookup *map, int first, int num,
int optimal, int dev_replace_is_ongoing)
 {
int i;
int tolerance;
+   struct block_device *bdev;
struct btrfs_device *srcdev;
+   bool all_bdev_nonrot = true;

if (dev_replace_is_ongoing &&
fs_info->dev_replace.cont_reading_from_srcdev_mode ==
@@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info 
*fs_info,
else
srcdev = NULL;

+   /*
+* Optimal expected to be pid % num
+* That's generaly ok for spinning rust drives
+* But if one of mirror are non rotating,
+* that bdev can show better performance
+*
+* if one of disks are non rotating:
+*  - set optimal to non rotating device
+* if both disk are non rotating
+*  - set optimal to bdev with least queue
+* If both disks are spinning rust:
+*  - leave old pid % nu,
+*/
+   for (i = 0; i < num; i++) {
+   bdev = map->stripes[i].dev->bdev;
+   if (!bdev)
+   continue;
+   if (blk_queue_nonrot(bdev_get_queue(bdev)))
+   optimal = i;
+   else
+   all_bdev_nonrot = false;
+   }
+
+   if (all_bdev_nonrot) {
+   int qlen;
+   /* Forse following logic choise by init with some big number */
+   int optimal_dev_rq_count = 1 << 24;
+
+   for (i = 0; i < num; i++) {
+   bdev = map->stripes[i].dev->bdev;
+   if (!bdev)
+   continue;
+
+   qlen = bdev_get_queue_len(bdev);
+
+   if (qlen < optimal_dev_rq_count) {
+   optimal = i;
+   optimal_dev_rq_count = qlen;
+   }
+   }
+   }
+
/*
 * try to avoid the drive that is the source drive for a
 * dev-replace procedure, only choose it if no other non-missing


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: enchanse raid1/10 balance heuristic for non rotating devices

2017-12-28 Thread Dmitrii Tcvetkov
On Thu, 28 Dec 2017 01:39:31 +0300
Timofey Titovets  wrote:

> Currently btrfs raid1/10 balancer blance requests to mirrors,
> based on pid % num of mirrors.
> 
> Update logic and make it understood if underline device are non rotational.
> 
> If one of mirrors are non rotational, then all read requests will be moved to
> non rotational device.
> 
> If both of mirrors are non rotational, calculate sum of
> pending and in flight request for queue on that bdev and use
> device with least queue leght.
> 
> P.S.
> Inspired by md-raid1 read balancing
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/volumes.c | 59
> ++ 1 file changed, 59
> insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9a04245003ab..98bc2433a920 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info
> *fs_info, u64 logical, u64 len) return ret;
>  }
>  
> +static inline int bdev_get_queue_len(struct block_device *bdev)
> +{
> + int sum = 0;
> + struct request_queue *rq = bdev_get_queue(bdev);
> +
> + sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
> + sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
> +

This won't work as expected if bdev is controlled by blk-mq, these
counters will be zero. AFAIK to get this info in block layer agnostic way
part_in_flight[1] has to be used. It extracts these counters approriately.

But it needs to be EXPORT_SYMBOL()'ed in block/genhd.c so we can continue
to build btrfs as module.

> + /*
> +  * Try prevent switch for every sneeze
> +  * By roundup output num by 2
> +  */
> + return ALIGN(sum, 2);
> +}
> +
>  static int find_live_mirror(struct btrfs_fs_info *fs_info,
>   struct map_lookup *map, int first, int num,
>   int optimal, int dev_replace_is_ongoing)
>  {
>   int i;
>   int tolerance;
> + struct block_device *bdev;
>   struct btrfs_device *srcdev;
> + bool all_bdev_nonrot = true;
>  
>   if (dev_replace_is_ongoing &&
>   fs_info->dev_replace.cont_reading_from_srcdev_mode ==
> @@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info
> *fs_info, else
>   srcdev = NULL;
>  
> + /*
> +  * Optimal expected to be pid % num
> +  * That's generaly ok for spinning rust drives
> +  * But if one of mirror are non rotating,
> +  * that bdev can show better performance
> +  *
> +  * if one of disks are non rotating:
> +  *  - set optimal to non rotating device
> +  * if both disk are non rotating
> +  *  - set optimal to bdev with least queue
> +  * If both disks are spinning rust:
> +  *  - leave old pid % nu,
> +  */
> + for (i = 0; i < num; i++) {
> + bdev = map->stripes[i].dev->bdev;
> + if (!bdev)
> + continue;
> + if (blk_queue_nonrot(bdev_get_queue(bdev)))
> + optimal = i;
> + else
> + all_bdev_nonrot = false;
> + }
> +
> + if (all_bdev_nonrot) {
> + int qlen;
> + /* Forse following logic choise by init with some big number
> */
> + int optimal_dev_rq_count = 1 << 24;

Probably better to use INT_MAX macro instead.

[1] https://elixir.free-electrons.com/linux/v4.15-rc5/source/block/genhd.c#L68

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: enchanse raid1/10 balance heuristic for non rotating devices

2017-12-27 Thread Qu Wenruo


On 2017年12月28日 06:39, Timofey Titovets wrote:
> Currently btrfs raid1/10 balancer blance requests to mirrors,
> based on pid % num of mirrors.
> 
> Update logic and make it understood if underline device are non rotational.
> 
> If one of mirrors are non rotational, then all read requests will be moved to
> non rotational device.
> 
> If both of mirrors are non rotational, calculate sum of
> pending and in flight request for queue on that bdev and use
> device with least queue leght.
> 
> P.S.
> Inspired by md-raid1 read balancing
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/volumes.c | 59 
> ++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9a04245003ab..98bc2433a920 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info 
> *fs_info, u64 logical, u64 len)
>   return ret;
>  }
>  
> +static inline int bdev_get_queue_len(struct block_device *bdev)
> +{
> + int sum = 0;
> + struct request_queue *rq = bdev_get_queue(bdev);
> +
> + sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
> + sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
> +
> + /*
> +  * Try prevent switch for every sneeze
> +  * By roundup output num by 2
> +  */
> + return ALIGN(sum, 2);
> +}
> +
>  static int find_live_mirror(struct btrfs_fs_info *fs_info,
>   struct map_lookup *map, int first, int num,
>   int optimal, int dev_replace_is_ongoing)
>  {
>   int i;
>   int tolerance;
> + struct block_device *bdev;
>   struct btrfs_device *srcdev;
> + bool all_bdev_nonrot = true;
>  
>   if (dev_replace_is_ongoing &&
>   fs_info->dev_replace.cont_reading_from_srcdev_mode ==
> @@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info 
> *fs_info,
>   else
>   srcdev = NULL;
>  
> + /*
> +  * Optimal expected to be pid % num
> +  * That's generaly ok for spinning rust drives
> +  * But if one of mirror are non rotating,
> +  * that bdev can show better performance
> +  *
> +  * if one of disks are non rotating:
> +  *  - set optimal to non rotating device
> +  * if both disk are non rotating
> +  *  - set optimal to bdev with least queue
> +  * If both disks are spinning rust:
> +  *  - leave old pid % nu,

And I'm wondering why this case can't use the same bdev queue length?

Any special reason spinning disk can't benifit from a shorter queue?

Thanks,
Qu

> +  */
> + for (i = 0; i < num; i++) {
> + bdev = map->stripes[i].dev->bdev;
> + if (!bdev)
> + continue;
> + if (blk_queue_nonrot(bdev_get_queue(bdev)))
> + optimal = i;
> + else
> + all_bdev_nonrot = false;
> + }
> +
> + if (all_bdev_nonrot) {
> + int qlen;
> + /* Forse following logic choise by init with some big number */
> + int optimal_dev_rq_count = 1 << 24;
> +
> + for (i = 0; i < num; i++) {
> + bdev = map->stripes[i].dev->bdev;
> + if (!bdev)
> + continue;
> +
> + qlen = bdev_get_queue_len(bdev);
> +
> + if (qlen < optimal_dev_rq_count) {
> + optimal = i;
> + optimal_dev_rq_count = qlen;
> + }
> + }
> + }
> +
>   /*
>* try to avoid the drive that is the source drive for a
>* dev-replace procedure, only choose it if no other non-missing
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Btrfs: enchanse raid1/10 balance heuristic for non rotating devices

2017-12-27 Thread Qu Wenruo


On 2017年12月28日 06:39, Timofey Titovets wrote:
> Currently btrfs raid1/10 balancer blance requests to mirrors,
> based on pid % num of mirrors.
> 
> Update logic and make it understood if underline device are non rotational.
> 
> If one of mirrors are non rotational, then all read requests will be moved to
> non rotational device.
> 
> If both of mirrors are non rotational, calculate sum of
> pending and in flight request for queue on that bdev and use
> device with least queue leght.
> 
> P.S.
> Inspired by md-raid1 read balancing

Any benchmark?

It would be more persuasive.

Thanks,
Qu

> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/volumes.c | 59 
> ++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9a04245003ab..98bc2433a920 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info 
> *fs_info, u64 logical, u64 len)
>   return ret;
>  }
>  
> +static inline int bdev_get_queue_len(struct block_device *bdev)
> +{
> + int sum = 0;
> + struct request_queue *rq = bdev_get_queue(bdev);
> +
> + sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
> + sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
> +
> + /*
> +  * Try prevent switch for every sneeze
> +  * By roundup output num by 2
> +  */
> + return ALIGN(sum, 2);
> +}
> +
>  static int find_live_mirror(struct btrfs_fs_info *fs_info,
>   struct map_lookup *map, int first, int num,
>   int optimal, int dev_replace_is_ongoing)
>  {
>   int i;
>   int tolerance;
> + struct block_device *bdev;
>   struct btrfs_device *srcdev;
> + bool all_bdev_nonrot = true;
>  
>   if (dev_replace_is_ongoing &&
>   fs_info->dev_replace.cont_reading_from_srcdev_mode ==
> @@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info 
> *fs_info,
>   else
>   srcdev = NULL;
>  
> + /*
> +  * Optimal expected to be pid % num
> +  * That's generaly ok for spinning rust drives
> +  * But if one of mirror are non rotating,
> +  * that bdev can show better performance
> +  *
> +  * if one of disks are non rotating:
> +  *  - set optimal to non rotating device
> +  * if both disk are non rotating
> +  *  - set optimal to bdev with least queue
> +  * If both disks are spinning rust:
> +  *  - leave old pid % nu,
> +  */
> + for (i = 0; i < num; i++) {
> + bdev = map->stripes[i].dev->bdev;
> + if (!bdev)
> + continue;
> + if (blk_queue_nonrot(bdev_get_queue(bdev)))
> + optimal = i;
> + else
> + all_bdev_nonrot = false;
> + }
> +
> + if (all_bdev_nonrot) {
> + int qlen;
> + /* Forse following logic choise by init with some big number */
> + int optimal_dev_rq_count = 1 << 24;
> +
> + for (i = 0; i < num; i++) {
> + bdev = map->stripes[i].dev->bdev;
> + if (!bdev)
> + continue;
> +
> + qlen = bdev_get_queue_len(bdev);
> +
> + if (qlen < optimal_dev_rq_count) {
> + optimal = i;
> + optimal_dev_rq_count = qlen;
> + }
> + }
> + }
> +
>   /*
>* try to avoid the drive that is the source drive for a
>* dev-replace procedure, only choose it if no other non-missing
> 



signature.asc
Description: OpenPGP digital signature