Re: [PATCH 0/2] Policy to balance read across mirrored devices

2018-01-31 Thread Peter Becker
OK, I had understood the commit message to mean that the testing
behavior was more of a bonus.

2018-01-30 7:30 GMT+01:00 Anand Jain :
> This __also__ helps testing

Then it's clear to me. It would be good to document this behavior,
though, so that nobody else tries to use it for performance tuning as I
did, at least not the devid feature.

Thanks Austin,
Thanks Anand

2018-01-31 17:11 GMT+01:00 Austin S. Hemmelgarn :
> On 2018-01-31 09:52, Peter Becker wrote:
>>
>> This is all clear. My question refers to "use the lower devid disk
>> containing the stripe"
>>
>> 2018-01-31 10:01 GMT+01:00 Anand Jain :
>>>
>>>   When a stripe is not present on the read optimized disk it will just
>>>   use the lower devid disk containing the stripe (instead of falling back
>>>   to the pid based random disk).
>>
>>
>> Using only one disk (the disk with the lowest devid that contains the
>> stripe) as the fallback is not a good option, IMHO.
>> Instead, the pid should still be used as the fallback, to distribute
>> the workload among all available drives.
>>
>> [stripe to use] = [prefer stripes present on read_mirror_policy
>> devids] > [fallback to pid % stripe count]
>>
>> Perhaps I'm not able to express myself well in English, or did I
>> misunderstand you?
>
> Unless I'm seriously misunderstanding the commit messages, the primary
> purpose of having this as an option at all is for testing.  The fact that it
> happens to allow semantics similar to MD's write-mostly flag when dealing
> with a 2-device raid1 profile is just a bonus.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Policy to balance read across mirrored devices

2018-01-31 Thread Austin S. Hemmelgarn

On 2018-01-31 09:52, Peter Becker wrote:
>
> This is all clear. My question refers to "use the lower devid disk
> containing the stripe"
>
> 2018-01-31 10:01 GMT+01:00 Anand Jain :
>>
>>   When a stripe is not present on the read optimized disk it will just
>>   use the lower devid disk containing the stripe (instead of falling back
>>   to the pid based random disk).
>
>
> Using only one disk (the disk with the lowest devid that contains the
> stripe) as the fallback is not a good option, IMHO.
> Instead, the pid should still be used as the fallback, to distribute
> the workload among all available drives.
>
> [stripe to use] = [prefer stripes present on read_mirror_policy
> devids] > [fallback to pid % stripe count]
>
> Perhaps I'm not able to express myself well in English, or did I
> misunderstand you?

Unless I'm seriously misunderstanding the commit messages, the primary
purpose of having this as an option at all is for testing.  The fact
that it happens to allow semantics similar to MD's write-mostly flag
when dealing with a 2-device raid1 profile is just a bonus.



Re: [PATCH 0/2] Policy to balance read across mirrored devices

2018-01-31 Thread Peter Becker
This is all clear. My question refers to "use the lower devid disk
containing the stripe"

2018-01-31 10:01 GMT+01:00 Anand Jain :
>  When a stripe is not present on the read optimized disk it will just
>  use the lower devid disk containing the stripe (instead of falling back
>  to the pid based random disk).

Using only one disk (the disk with the lowest devid that contains the
stripe) as the fallback is not a good option, IMHO.
Instead, the pid should still be used as the fallback, to distribute
the workload among all available drives.

[stripe to use] = [prefer stripes present on read_mirror_policy
devids] > [fallback to pid % stripe count]

Perhaps I'm not able to express myself well in English, or did I
misunderstand you?
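
To make the proposed ordering concrete, here is a minimal Python sketch of
it. This is an illustration only, not the kernel code: the function name
pick_stripe, the set/list representation of devids, and the idea that
several read_mirror_policy devids form a priority list are all assumptions.

```python
def pick_stripe(stripe_devids, preferred_devids, pid):
    """Return the devid to read from for one mirrored stripe.

    stripe_devids:    devids that hold a copy of the stripe
    preferred_devids: devids given via read_mirror_policy, in priority order
    pid:              id of the reading process
    """
    # 1. Prefer a copy on one of the configured devids.
    for devid in preferred_devids:
        if devid in stripe_devids:
            return devid
    # 2. Otherwise fall back to pid % stripe count, so that readers
    #    are spread across every device that holds the stripe.
    candidates = sorted(stripe_devids)
    return candidates[pid % len(candidates)]


# Stripe on devids 3 and 4, preferred devids 1 and 2: different pids
# are sent to different devices instead of always reading from devid 3.
for pid in (100, 101):
    print(pid, pick_stripe({3, 4}, [1, 2], pid))
```

With the lowest-devid fallback, the last step would instead always return
min(stripe_devids), which is the behavior being questioned here.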

2018-01-31 15:26 GMT+01:00 Anand Jain :
>
>
> On 01/31/2018 06:47 PM, Peter Becker wrote:
>>
>> 2018-01-31 10:01 GMT+01:00 Anand Jain :
>>>
>>>   When a stripe is not present on the read optimized disk it will just
>>>   use the lower devid disk containing the stripe (instead of falling back
>>>   to the pid based random disk).
>>
>>
>> Is this good behavior? Because this would eliminate every performance
>> benefit of the pid based random disk pick if the requested stripe is
>> not present on the read optimized disk.
>> Wouldn't it be better to specify a fallback and use the pid based
>> random pick as the default for the fallback?
>>
>> For example:
>>
>> RAID 1 over 4 disks
>>
>> devid | rpm      | size
>> ------+----------+-----
>> 1     | 7200 rpm | 3 TB
>> 2     | 7200 rpm | 3 TB
>> 3     | 5400 rpm | 4 TB
>> 4     | 5400 rpm | 4 TB
>>
>> mount -o read_mirror_policy=1,read_mirror_policy=2
>>
>> Cases:
>> 1. If the requested stripe is on devids 3 and 4, the algorithm should
>> choose one of the two randomly to increase performance, instead of
>> reading every time from 3 and never from 4.
>> 2. If the requested stripe is on devids 1 and 3, all is fine (as long
>> as the queue depth of 1 isn't much larger than the queue depth of 3).
>> 3. If the requested stripe is on devids 1 and 2, the algorithm should
>> choose one of the two randomly to increase performance, instead of
>> reading every time from 1 and never from 2.
>
>>
>>
>> And in the future, all random picks of a device should be replaced by
>> a heuristic algorithm that respects the queue depth and sequential
>> reads.
>
>
>  This scenario is handled very well by the pid/heuristic based
>  read load balancer. The pid based read load balancer is still the
>  default. Tim has written an IO load based read balancer which can be
>  set using this mount option once everything is integrated, and it
>  needs experiments to see whether it can replace the pid method as the
>  default.
>  Further, as of now we don't do allocation grouping, so if you have two
>  SSDs and two HDs in a RAID1 it's not guaranteed that an allocation will
>  always span an SSD and an HD, so there is a bit of randomness
>  in the allocation itself.
>
> Thanks, Anand


Re: [PATCH 0/2] Policy to balance read across mirrored devices

2018-01-31 Thread Anand Jain



On 01/31/2018 06:47 PM, Peter Becker wrote:
>
> 2018-01-31 10:01 GMT+01:00 Anand Jain :
>>
>>   When a stripe is not present on the read optimized disk it will just
>>   use the lower devid disk containing the stripe (instead of falling back
>>   to the pid based random disk).
>
>
> Is this good behavior? Because this would eliminate every performance
> benefit of the pid based random disk pick if the requested stripe is
> not present on the read optimized disk.
> Wouldn't it be better to specify a fallback and use the pid based
> random pick as the default for the fallback?
>
> For example:
>
> RAID 1 over 4 disks
>
> devid | rpm      | size
> ------+----------+-----
> 1     | 7200 rpm | 3 TB
> 2     | 7200 rpm | 3 TB
> 3     | 5400 rpm | 4 TB
> 4     | 5400 rpm | 4 TB
>
> mount -o read_mirror_policy=1,read_mirror_policy=2
>
> Cases:
> 1. If the requested stripe is on devids 3 and 4, the algorithm should
> choose one of the two randomly to increase performance, instead of
> reading every time from 3 and never from 4.
> 2. If the requested stripe is on devids 1 and 3, all is fine (as long
> as the queue depth of 1 isn't much larger than the queue depth of 3).
> 3. If the requested stripe is on devids 1 and 2, the algorithm should
> choose one of the two randomly to increase performance, instead of
> reading every time from 1 and never from 2.
>
> And in the future, all random picks of a device should be replaced by
> a heuristic algorithm that respects the queue depth and sequential
> reads.


 This scenario is handled very well by the pid/heuristic based
 read load balancer. The pid based read load balancer is still the
 default. Tim has written an IO load based read balancer which can be
 set using this mount option once everything is integrated, and it
 needs experiments to see whether it can replace the pid method as the
 default.
 Further, as of now we don't do allocation grouping, so if you have two
 SSDs and two HDs in a RAID1 it's not guaranteed that an allocation will
 always span an SSD and an HD, so there is a bit of randomness
 in the allocation itself.
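
 To illustrate the last point, a small Python sketch can list the device
 pairs a two-copy RAID1 chunk could land on with four devices. The layout
 (devids 1 and 2 as SSDs, 3 and 4 as HDs) is a hypothetical example, and
 the sketch only enumerates the possible pairs; it does not model which
 pair the allocator would actually choose.

```python
from itertools import combinations

# Hypothetical layout: devids 1 and 2 are SSDs, devids 3 and 4 are HDs.
media = {1: "ssd", 2: "ssd", 3: "hd", 4: "hd"}

# Every pair of devices that could hold the two copies of a chunk.
pairs = list(combinations(sorted(media), 2))
mixed = [p for p in pairs if media[p[0]] != media[p[1]]]
same = [p for p in pairs if media[p[0]] == media[p[1]]]

print("possible RAID1 pairs:", pairs)
print("one SSD + one HD:    ", mixed)
print("same media type:     ", same)
```

 Without allocation grouping nothing rules out the same-media pairs, so
 an SSD-preferring read policy would still need a fallback for chunks that
 have no copy on an SSD.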

Thanks, Anand


Re: [PATCH 0/2] Policy to balance read across mirrored devices

2018-01-31 Thread Anand Jain


On 01/31/2018 03:51 PM, Peter Becker wrote:
>
> A little question about mount -o read_mirror_policy=.
>
> How would this work with RAID1 over 3 or 4 HDDs?
> In particular, if the desired block is not available on device .

 When a stripe is not present on the read optimized disk it will just
 use the lower devid disk containing the stripe (instead of falling back
 to the pid based random disk).

> Could I repeat this option, like the device option, to specify an
> order/priority like this:
>
> mount -o read_mirror_policy= 1,read_mirror_policy= 3

 Yes.

Thanks, Anand


Re: [PATCH 0/2] Policy to balance read across mirrored devices

2018-01-30 Thread Peter Becker
A little question about mount -o read_mirror_policy=.

How would this work with RAID1 over 3 or 4 HDDs?
In particular, if the desired block is not available on device .
Could I repeat this option, like the device option, to specify an
order/priority like this:

mount -o read_mirror_policy= 1,read_mirror_policy= 3

2018-01-30 7:30 GMT+01:00 Anand Jain :
> In RAID1 and RAID10 the devices are mirrored, so a read IO can
> pick any device for reading. This choice of device should be
> configurable. In short, no single policy would satisfy all types
> of workloads and configs.
>
> So before we add more policies, this patch set makes the existing
> $pid policy configurable from a mount option and adds a devid
> based read_mirror device policy. It keeps the $pid based policy as
> the default option for now. So this mount option helps to try out
> different read mirror load balancers.
>
> Further we can add more policies on top of it, for example:
>
>   mount -o read_mirror_policy=pid (current, default) [1]
>   mount -o read_mirror_policy= [2]
>   mount -o read_mirror_policy=lba (illustration only) [3]
>   mount -o read_mirror_policy=ssd (illustration only) [4]
>   mount -o read_mirror_policy=io (illustration only) [5]
>   mount -o read_mirror_policy= (illustration only) [6]
>
> [1]
>  Current PID based read mirror balance.
>
> [2]
>  Set the devid of the device which should be used for reads. That
>  means all normal reads will go to that particular device only.
>  This also helps testing and gives better control to the test
>  scripts, including mount context reads.
>
> [3]
>  In case of SAN storage, some storage prefers that the host access the
>  LUN based on the LBA so that there won't be duplication of
>  caching on the storage.
>
> [4]
>  In case of a mix of SSD and HD we may want to use the SSD as the
>  primary read device.
>
> [5]
>  If storage caching is not the bottleneck but the transport layer
>  is, then the read load should be tuned based on the IO load.
>
> [6]
>  Or a combination of any of the above, as per the priority.
>  Timofey should consider basing his patch on top of this:
>    Btrfs: enchanse raid1/10 balance heuristic
>
> This patch set is on top of the preparatory patch set:
>   [PATCH 0/2] Preparatory to add read_mirror mount option
>
>
> Anand Jain (2):
>   btrfs: add mount option read_mirror_policy
>   btrfs: add read_mirror_policy parameter devid
>
>  fs/btrfs/ctree.h   |  2 ++
>  fs/btrfs/super.c   | 31 +++
>  fs/btrfs/volumes.c | 18 +-
>  fs/btrfs/volumes.h |  7 +++
>  4 files changed, 57 insertions(+), 1 deletion(-)
>
> --
> 2.7.0
>


[PATCH 0/2] Policy to balance read across mirrored devices

2018-01-29 Thread Anand Jain
In RAID1 and RAID10 the devices are mirrored, so a read IO can
pick any device for reading. This choice of device should be
configurable. In short, no single policy would satisfy all types
of workloads and configs.

So before we add more policies, this patch set makes the existing
$pid policy configurable from a mount option and adds a devid
based read_mirror device policy. It keeps the $pid based policy as
the default option for now. So this mount option helps to try out
different read mirror load balancers.

Further we can add more policies on top of it, for example:

  mount -o read_mirror_policy=pid (current, default) [1]
  mount -o read_mirror_policy= [2]
  mount -o read_mirror_policy=lba (illustration only) [3]
  mount -o read_mirror_policy=ssd (illustration only) [4]
  mount -o read_mirror_policy=io (illustration only) [5]
  mount -o read_mirror_policy= (illustration only) [6]

[1]
 Current PID based read mirror balance.

[2]
 Set the devid of the device which should be used for reads. That
 means all normal reads will go to that particular device only.
 This also helps testing and gives better control to the test
 scripts, including mount context reads.

[3]
 In case of SAN storage, some storage prefers that the host access the
 LUN based on the LBA so that there won't be duplication of
 caching on the storage.

[4]
 In case of a mix of SSD and HD we may want to use the SSD as the
 primary read device.

[5]
 If storage caching is not the bottleneck but the transport layer
 is, then the read load should be tuned based on the IO load.

[6]
 Or a combination of any of the above, as per the priority.
 Timofey should consider basing his patch on top of this:
   Btrfs: enchanse raid1/10 balance heuristic
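
As a rough illustration of policies [1] and [2], here is a minimal Python
sketch. It is not the kernel code: the function names, the list of devids
standing in for a chunk's mirrors, and the exact pid mapping are
assumptions made for the example. The lowest-devid fallback in the devid
policy follows the behavior described in the replies in this thread.

```python
def pick_mirror_pid(mirror_devids, pid):
    """Policy [1]: spread reads by process id (pid modulo number of mirrors)."""
    return mirror_devids[pid % len(mirror_devids)]


def pick_mirror_devid(mirror_devids, preferred_devid):
    """Policy [2]: read from the preferred devid whenever it holds a copy.

    Assumed fallback: the lowest devid that holds the stripe.
    """
    if preferred_devid in mirror_devids:
        return preferred_devid
    return min(mirror_devids)


# A RAID1 chunk mirrored on devids 1 and 3.
mirrors = [1, 3]
print(pick_mirror_pid(mirrors, pid=4242))             # chosen by pid
print(pick_mirror_devid(mirrors, preferred_devid=3))  # preferred copy exists
print(pick_mirror_devid(mirrors, preferred_devid=2))  # no copy on devid 2
```

The pid policy spreads different processes over the mirrors, while the
devid policy pins all normal reads to one device, which is what makes it
useful for test scripts.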

This patch set is on top of the preparatory patch set:
  [PATCH 0/2] Preparatory to add read_mirror mount option


Anand Jain (2):
  btrfs: add mount option read_mirror_policy
  btrfs: add read_mirror_policy parameter devid

 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/super.c   | 31 +++
 fs/btrfs/volumes.c | 18 +-
 fs/btrfs/volumes.h |  7 +++
 4 files changed, 57 insertions(+), 1 deletion(-)

-- 
2.7.0
