Re: [RFC] improve space utilization on off-sized raid devices

2012-01-24 Thread Thomas Schmidt
On Thursday 01 December 2011 09:55:27 Arne Jansen wrote:
> As RAID0 is already not a strict 'all disks or none', I like the idea to
> have it even more dynamic to reach full optimization. But I'd like to see
> some properties conserved:
>  a) In case of even size disks, the stripes should always be full size, not
>n - 1
>  b) Minor variations in the used space per disk due to metadata chunks
> should not lead to deviation from a)
>  c) The algorithms should not give weird results under unconventional
> setups. Some theoretical background would be nice 

Resent because it did not appear on the ML for about 4h.
KMail's acting up.

Sorry to only get back to you now; I must have missed your mail somehow.

The problem is the shrinking stripe width with unmatched devices. Once it hits 
devs_min - 1, it's over. My solution is to try to keep the stripe width constant.
The sorting then takes care of selecting the right devices.

It's simply: space / min-height = max-width, so a) is dictated by the math.
Since circumstances change (adding/removing devices, rounding, ...) it is calculated again 
at every allocation. The result is then rounded to the nearest multiple of 
devs_increment. This takes care of b).

The code may look weird, but considered together with the round-down already present
in the line after my patch it should be identical to the mathematical
floor(space / min-height + increment/2).

The two ifs should safeguard against weird stuff by limiting the result to sane 
values.
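
To make the arithmetic concrete, here is a small userspace sketch of the same
computation (illustrative only, not kernel code; the helper name and the free-space
numbers in main() are made up):

#include <stdio.h>

/* mirrors the patch below, plus the round-down to devs_increment already in mainline */
static int stripe_width(unsigned long long total_avail,
                        unsigned long long max_avail,
                        int ndevs, int devs_min, int devs_increment)
{
        /* floor(total / max + increment / 2), done in integer math */
        int opt = (total_avail * 2 + devs_increment * max_avail) / (max_avail * 2);

        if (opt < devs_min)
                opt = devs_min;
        if (ndevs > opt)
                ndevs = opt;
        /* round down to number of usable stripes (already in mainline) */
        ndevs -= ndevs % devs_increment;
        return ndevs;
}

int main(void)
{
        /* one 2TB plus three 1TB devices free, -d raid0: prints 3 */
        printf("%d\n", stripe_width(5000, 2000, 4, 2, 1));
        /* four equally sized devices, -m raid10: prints 4, i.e. full width (property a) */
        printf("%d\n", stripe_width(4000, 1000, 4, 4, 2));
        return 0;
}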

I include an updated patch below. It's again written for and tested with 3.0.0, 
but diff3 applied it to 3.3-rc1 without problems.

--- volumes.c.orig  2012-01-20 16:59:31.0 +0100
+++ volumes.c   2012-01-24 11:24:07.261401805 +0100
@@ -2329,6 +2329,8 @@
u64 stripe_size;
u64 num_bytes;
int ndevs;
+   u64 fs_total_avail;
+   int opt_ndevs;
int i;
int j;
 
@@ -2448,6 +2450,7 @@
devices_info[ndevs].total_avail = total_avail;
devices_info[ndevs].dev = device;
++ndevs;
+   fs_total_avail += total_avail;
}
 
/*
@@ -2456,6 +2459,16 @@
sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 btrfs_cmp_device_info, NULL);
 
+   /*
+    * do not allocate space on all devices
+    * instead balance free space to maximise space utilization
+    */
+   opt_ndevs = (fs_total_avail*2 + devs_increment*devices_info[0].total_avail) / (devices_info[0].total_avail*2);
+   if (opt_ndevs < devs_min)
+           opt_ndevs = devs_min;
+   if (ndevs > opt_ndevs)
+           ndevs = opt_ndevs;
+
/* round down to number of usable stripes */
ndevs -= ndevs % devs_increment;



Re: [RFC] improve space utilization on off-sized raid devices

2011-12-01 Thread Arne Jansen
On 17.11.2011 15:06, Thomas Schmidt wrote:
>  Original Message 
>> Date: Thu, 17 Nov 2011 13:59:26 +0100
>> From: Arne Jansen 
>> To: Thomas Schmidt 
>> CC: linux-btrfs@vger.kernel.org
>> Subject: Re: [RFC] improve space utilization on off-sized raid devices
> 
> 
> Consider a 4 dev setup: 3 1TB drives and 1 2TB using -m raid1
> -d raid0: 80% capacity striped 4 way.
> -d single: 100% but no striping.
> My patch: 100% striped 3 way, a good trade imho.
> 
> I don't think such a setup is unlikely enough to ignore, a home user will
> simply buy the drive with the best space/cost whenever he needs space,
> leading exactly to the described situation. Adding a newly bought 2T drive
> to my 3x1T setup, only to see that only half of it can be used would really
> piss me off.
> 
> Note that if the (optional) first "if" is removed, I only reduce the width if it
> is required to reach 100% capacity. At least that's the intention; it might
> need some tweaking.
> According to the (hackish) simulator I used to test this, the average stripe
> width sacrificed on setups of 5+ unmatched devices is typically below 2.

As RAID0 is already not a strict 'all disks or none', I like the idea to
have it even more dynamic to reach full optimization. But I'd like to see
some properties conserved:
 a) In case of even size disks, the stripes should always be full size, not
   n - 1
 b) Minor variations in the used space per disk due to metadata chunks should
   not lead to deviation from a)
 c) The algorithms should not give weird results under unconventional setups.
Some theoretical background would be nice :)

It might well be that your algorithm is already close :)

-Arne


Re: [RFC] improve space utilization on off-sized raid devices

2011-11-17 Thread Phillip Susi

On 11/17/2011 7:59 AM, Arne Jansen wrote:

> Right you are. So you want to sacrifice stripe size for space efficiency.
> Why don't you just use RAID1?
> Instead of reducing the stripe size for the majority of writes, I'd prefer
> to allow RAID10 to go down to 2 disks. This should also solve it.


Yes, it appears that btrfs's current idea of raid10 is actually raid0+1, 
not raid10.  If it were proper raid10, it could use the remaining space 
on the 3 larger disks for a raid10 metadata chunk.
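
Purely as an illustration of that point (this is the md-style "near" layout with 2
copies, not anything btrfs implements): with proper raid10 semantics the two copies
of each chunk can simply rotate across an odd number of devices, so 3 disks are
enough:

#include <stdio.h>

/* toy placement table for 2-copy "near" raid10 on 3 devices */
int main(void)
{
        const int ndevs = 3, copies = 2;

        for (int chunk = 0; chunk < 6; chunk++) {
                printf("chunk %d ->", chunk);
                for (int c = 0; c < copies; c++)
                        printf(" dev%d", (chunk * copies + c) % ndevs);
                printf("\n");
        }
        /* chunk 0 -> dev0 dev1
         * chunk 1 -> dev2 dev0
         * chunk 2 -> dev1 dev2   (both copies always land on different devices)
         */
        return 0;
}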




Re: [RFC] improve space utilization on off-sized raid devices

2011-11-17 Thread Thomas Schmidt
 Original Message 
> Date: Thu, 17 Nov 2011 13:59:26 +0100
> From: Arne Jansen 
> To: Thomas Schmidt 
> CC: linux-btrfs@vger.kernel.org
> Subject: Re: [RFC] improve space utilization on off-sized raid devices

> Right you are. So you want to sacrifice stripe size for space efficiency.
> Why don't you just use RAID1?
> Instead of reducing the stripe size for the majority of writes, I'd prefer
> to allow RAID10 to go down to 2 disks. This should also solve it.

Yes, that's my trade.
With my patch I still have striping across 6 devices for meta (6-7 for
data), which is faster than the 2 raid1 would give me. Since 6 drives
already saturate my bus, it's a very good trade in my case.

While implementing your "degenerated raid0/10" would somewhat lessen the
problem (and fix it for me), it would not fix it in general.
But implementing it might still be a good idea.

Consider a 4 dev setup: 3 1TB drives and 1 2TB using -m raid1
-d raid0: 80% capacity striped 4 way.
-d single: 100% but no striping.
My patch: 100% striped 3 way, a good trade imho.

I don't think such a setup is unlikely enough to ignore; a home user will
simply buy the drive with the best space/cost whenever he needs space,
leading exactly to the described situation. Adding a newly bought 2T drive
to my 3x1T setup, only to see that only half of it can be used, would really
piss me off.

Note that if the (optional) first "if" is removed, I only reduce the width if it
is required to reach 100% capacity. At least that's the intention; it might
need some tweaking.
According to the (hackish) simulator I used to test this, the average stripe
width sacrificed on setups of 5+ unmatched devices is typically below 2.


Re: [RFC] improve space utilization on off-sized raid devices

2011-11-17 Thread Arne Jansen
On 17.11.2011 12:53, Thomas Schmidt wrote:
> 
>> On 17.11.2011 01:27, Thomas Schmidt wrote:
>> In your setup, it should stripe to all 8 devices until the 5 smaller ones
>> are full, and from then on stripe to the 3 remaining devices.
> 
> Afaik the behavior you describe is exactly the problem.
> It wants to continue with 3 devices, but according to the code raid10
> requires 4.
> 

Right you are. So you want to sacrifice stripe size for space efficiency.
Why don't you just use RAID1?
Instead of reducing the stripe size for the majority of writes, I'd prefer
to allow RAID10 to go down to 2 disks. This should also solve it.


Re: [RFC] improve space utilization on off-sized raid devices

2011-11-17 Thread Thomas Schmidt


 Original Message 
> Date: Thu, 17 Nov 2011 08:42:48 +0100
> From: Arne Jansen 
> To: Thomas Schmidt 
> CC: linux-btrfs@vger.kernel.org
> Subject: Re: [RFC] improve space utilization on off-sized raid devices

> On 17.11.2011 01:27, Thomas Schmidt wrote:
> > With 2.6.38 I frequently ran into a no space left error
> Did you also test with 3.0? In 3.0, the allocation strategy changed
> vastly.
> In your setup, it should stripe to all 8 devices until the 5 smaller ones
> are full, and from then on stripe to the 3 remaining devices.
> See commit
> 
> commit 73c5de0051533cbdf2bb656586c3eb21a475aa7d
> Author: Arne Jansen 
> Date:   Tue Apr 12 12:07:57 2011 +0200
> 
> btrfs: quasi-round-robin for chunk allocation
> 
> Also using raid1 instead of raid10 will yield a better space utilization.

No, I did not test whether the problem occurred with vanilla 3.0.0.
I only compared the code and saw no reason why the behavior should have
changed (for my case).
The sorting is the basis of my idea. But the order does not matter if you
allocate on all devices anyway (as with an even number of devs).

Afaik the behavior you describe is exactly the problem.
It wants to continue with 3 devices, but according to the code raid10
requires 4.

I can't use the actual fs (or devs) the problem happened on, but I will try a 
small-scale test on some files. As I currently have my patch in use, 
I will have to wait until I can reboot.


Re: [RFC] improve space utilization on off-sized raid devices

2011-11-16 Thread Arne Jansen
On 17.11.2011 01:27, Thomas Schmidt wrote:
> I wrote a small patch to improve allocation on differently sized raid devices.
> 
> With 2.6.38 I frequently ran into a no space left error that I attribute to
> this. But I'm not entirely sure. The fs was an 8 device -d raid0 -m raid10.
> The used space was the same across all devices. 5 were full and 3 bigger ones 
> still had plenty of space.
> I was unable to use the remaining space and a balance did not fix it for
> long.
> 

Did you also test with 3.0? In 3.0, the allocation strategy changed vastly.
In your setup, it should stripe to all 8 devices until the 5 smaller ones
are full, and from then on stripe to the 3 remaining devices.
See commit

commit 73c5de0051533cbdf2bb656586c3eb21a475aa7d
Author: Arne Jansen 
Date:   Tue Apr 12 12:07:57 2011 +0200

btrfs: quasi-round-robin for chunk allocation

Also using raid1 instead of raid10 will yield a better space utilization.
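
A rough standalone model of that behaviour (grossly simplified: made-up sizes of
5x 1TB plus 3x 2TB, fixed 1GB stripes per device per chunk, metadata ignored):

#include <stdio.h>

static long simulate(int devs_min)
{
        long free_gb[8] = { 1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000 };
        long allocated = 0;

        for (;;) {
                int ndevs = 0;

                for (int i = 0; i < 8; i++)
                        if (free_gb[i] > 0)
                                ndevs++;
                if (ndevs < devs_min)
                        break;
                for (int i = 0; i < 8; i++) {
                        if (free_gb[i] > 0) {
                                free_gb[i]--;   /* one stripe on every device with space */
                                allocated++;
                        }
                }
        }
        return allocated;
}

int main(void)
{
        printf("devs_min 2 (raid0):  %ld of 11000 GB allocated\n", simulate(2));
        printf("devs_min 4 (raid10): %ld of 11000 GB allocated\n", simulate(4));
        return 0;
}

With devs_min 2 everything gets consumed (the striping simply narrows to the three
large devices), while with devs_min 4 allocation stops at 8000 GB once only three
devices have free space left, which is the situation described in the report.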

-Arne

> Now I tried to avoid getting there again.
> 
> The basic idea is to not allocate space on the devices with the least free
> space. The number of devices to leave out is calculated on each allocation
> to adjust to changing circumstances. It leaves out the minimum number that
> still allows full space usage.
> 


> Additionally I thought leaving at least one out might be of use in device 
> removal.
> 
> Please take extra care with this. I'm new to btrfs, kernel and C in general.
> It was written and tested with 3.0.0.
> 
> 
> --- volumes.c.orig  2011-10-07 16:50:04.0 +0200
> +++ volumes.c   2011-11-16 23:49:08.097085568 +0100
> @@ -2329,6 +2329,8 @@ static int __btrfs_alloc_chunk(struct bt
> u64 stripe_size;
> u64 num_bytes;
> int ndevs;
> +   u64 fs_total_avail;
> +   int opt_ndevs;
> int i;
> int j;
>  
> @@ -2404,6 +2406,7 @@ static int __btrfs_alloc_chunk(struct bt
>  * about the available holes on each device.
>  */
> ndevs = 0;
> +   fs_total_avail = 0;
> while (cur != &fs_devices->alloc_list) {
> struct btrfs_device *device;
> u64 max_avail;
> @@ -2448,6 +2451,7 @@ static int __btrfs_alloc_chunk(struct bt
> devices_info[ndevs].total_avail = total_avail;
> devices_info[ndevs].dev = device;
> ++ndevs;
> +   fs_total_avail += total_avail;
> }
>  
> /*
> @@ -2456,6 +2460,20 @@ static int __btrfs_alloc_chunk(struct bt
> sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>  btrfs_cmp_device_info, NULL);
>  
> +   /*
> +    * do not allocate space on all devices
> +    * instead balance free space to maximise space utilization
> +    * (this needs tweaking if parity raid gets implemented
> +    *  for n parity ignore the n first (after sort) devs in the sum and division)
> +    */
> +   opt_ndevs = fs_total_avail / devices_info[0].total_avail;
> +   if (opt_ndevs >= ndevs)
> +           opt_ndevs = ndevs - 1; //optional, might be used for faster dev remove?
> +   if (opt_ndevs < devs_min)
> +           opt_ndevs = devs_min;
> +   if (ndevs > opt_ndevs)
> +           ndevs = opt_ndevs;
> +
> /* round down to number of usable stripes */
> ndevs -= ndevs % devs_increment;
> 



[RFC] improve space utilization on off-sized raid devices

2011-11-16 Thread Thomas Schmidt
I wrote a small patch to improve allocation on differently sized raid devices.

With 2.6.38 I frequently ran into a no space left error that I attribute to
this. But I'm not entirely sure. The fs was an 8 device -d raid0 -m raid10.
The used space was the same across all devices. 5 were full and 3 bigger ones 
still had plenty of space.
I was unable to use the remaining space and a balance did not fix it for
long.

Now I tried to avoid getting there again.

The basic idea is to not allocate space on the devices with the least free
space. The number of devices to leave out is calculated on each allocation
to adjust to changing circumstances. It leaves out the minimum number that
still allows full space usage.
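
A made-up example of that calculation: with 800, 500, 300 and 200 GB free (sorted
most-free first), fs_total_avail is 1800 GB and opt_ndevs = 1800 / 800 = 2, so only
the two devices with the most free space take part in this chunk. With a more even
spread, say 300, 300, 300 and 200 GB, the result is 1100 / 300 = 3, and with four
equal devices it is the full 4 - so the width recovers as the free space evens out.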

Additionally I thought leaving at least one out might be of use in device 
removal.

Please take extra care with this. I'm new to btrfs, the kernel, and C in general.
It was written and tested with 3.0.0.


--- volumes.c.orig  2011-10-07 16:50:04.0 +0200
+++ volumes.c   2011-11-16 23:49:08.097085568 +0100
@@ -2329,6 +2329,8 @@ static int __btrfs_alloc_chunk(struct bt
u64 stripe_size;
u64 num_bytes;
int ndevs;
+   u64 fs_total_avail;
+   int opt_ndevs;
int i;
int j;
 
@@ -2404,6 +2406,7 @@ static int __btrfs_alloc_chunk(struct bt
 * about the available holes on each device.
 */
ndevs = 0;
+   fs_total_avail = 0;
while (cur != &fs_devices->alloc_list) {
struct btrfs_device *device;
u64 max_avail;
@@ -2448,6 +2451,7 @@ static int __btrfs_alloc_chunk(struct bt
devices_info[ndevs].total_avail = total_avail;
devices_info[ndevs].dev = device;
++ndevs;
+   fs_total_avail += total_avail;
}
 
/*
@@ -2456,6 +2460,20 @@ static int __btrfs_alloc_chunk(struct bt
sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 btrfs_cmp_device_info, NULL);
 
+   /*
+    * do not allocate space on all devices
+    * instead balance free space to maximise space utilization
+    * (this needs tweaking if parity raid gets implemented
+    *  for n parity ignore the n first (after sort) devs in the sum and division)
+    */
+   opt_ndevs = fs_total_avail / devices_info[0].total_avail;
+   if (opt_ndevs >= ndevs)
+           opt_ndevs = ndevs - 1; //optional, might be used for faster dev remove?
+   if (opt_ndevs < devs_min)
+           opt_ndevs = devs_min;
+   if (ndevs > opt_ndevs)
+           ndevs = opt_ndevs;
+
/* round down to number of usable stripes */
ndevs -= ndevs % devs_increment;
