Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Mark Nelson


On 8/26/19 7:39 AM, Wido den Hollander wrote:


On 8/26/19 1:35 PM, Simon Oosthoek wrote:

On 26-08-19 13:25, Simon Oosthoek wrote:

On 26-08-19 13:11, Wido den Hollander wrote:


The reweight might actually cause even more confusion for the balancer.
The balancer uses upmap mode, which re-allocates PGs to different OSDs
if needed.

Looking at the output sent earlier, I have some replies. See below.




Looking at this output, the balancing seems OK, but from a different
perspective.

PGs are allocated to OSDs, not objects or data. All OSDs have 95~97
Placement Groups allocated.

That's good! An almost perfect distribution.

The problem that now arises is the difference in the size of these
Placement Groups, as they hold different objects.

This is one of the side effects of larger disks. The PGs on them will
grow and this will lead to imbalance between the OSDs.

I *think* that increasing the number of PGs on this cluster would help,
but only for the pools which will contain most of the data.

This will consume a bit more CPU power and memory, but on modern systems
this should be less of a problem.

The good thing is that with Nautilus you can also scale the number of
PGs back down if it becomes a problem.

More PGs will mean smaller PGs and thus lead to a better data
distribution.
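
For example, a sketch using a pool name from this thread (the target
pg_num value is a placeholder you would size to your own data
distribution):

  # Raise the PG count for a data-heavy pool (placeholder value):
  ceph osd pool set cephfs_data_ec83 pg_num 512
  # On Nautilus, pgp_num follows pg_num automatically.

  # Nautilus can also merge PGs back down later, or manage the PG
  # count for you if the pg_autoscaler module is enabled:
  ceph osd pool set cephfs_data_ec83 pg_autoscale_mode on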



That makes sense, dividing the data into smaller chunks makes it more
flexible. The OSD nodes are quite underloaded, even with turbo
recovery mode on (10, not 32 ;-).

When the cluster is in HEALTH_OK again, I'll increase the PGs for the
cephfs pools...

On second thought, I reverted my reweight commands and adjusted the PGs,
which were quite low for some of the pools. The reason they were low is
that when we first created them, we expected them to be rarely used, but
then we started filling them just to fill them, and these are
probably the cause of the imbalance.


You should make sure that the pools which contain the most data have the
most PGs.

Although ~100 PGs per OSD is the recommendation, it won't hurt to have
~200 PGs as long as you have enough CPU power and memory. More PGs will
mean better data distribution with such large disks.



Memory is probably the biggest concern, since the pglog can eat up a
surprising amount of memory with lots of PGs on the OSD. I suspect we
should consider having the pglog controlled by the priority cache manager
and set the lengths based on the amount of memory we want assigned to
it. Perhaps even changing them dynamically based on the pool and current
workload. In the long run, we should probably have a much longer log on
disk and a shorter log in memory regardless.
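
If pglog memory does become a problem in practice, the log lengths can
already be capped today; a sketch (the values are placeholders, and
trimming the log too aggressively trades log-based recovery for more
backfill):

  # Cap the per-PG log length (placeholder values):
  ceph config set osd osd_min_pg_log_entries 1500
  ceph config set osd osd_max_pg_log_entries 3000
  # Overall per-OSD memory budget, managed by the priority cache manager:
  ceph config set osd osd_memory_target 4294967296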



Mark





The cluster now has over 8% misplaced objects, so that can take a while...

Cheers

/Simon


Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Wido den Hollander



On 8/26/19 1:35 PM, Simon Oosthoek wrote:
> On 26-08-19 13:25, Simon Oosthoek wrote:
>> On 26-08-19 13:11, Wido den Hollander wrote:
>> 
>>>
>>> The reweight might actually cause even more confusion for the balancer.
>>> The balancer uses upmap mode, which re-allocates PGs to different OSDs
>>> if needed.
>>>
>>> Looking at the output sent earlier, I have some replies. See below.
>>>
>> 
>>>
>>> Looking at this output, the balancing seems OK, but from a different
>>> perspective.
>>>
>>> PGs are allocated to OSDs, not objects or data. All OSDs have 95~97
>>> Placement Groups allocated.
>>>
>>> That's good! An almost perfect distribution.
>>>
>>> The problem that now arises is the difference in the size of these
>>> Placement Groups, as they hold different objects.
>>>
>>> This is one of the side effects of larger disks. The PGs on them will
>>> grow and this will lead to imbalance between the OSDs.
>>>
>>> I *think* that increasing the number of PGs on this cluster would help,
>>> but only for the pools which will contain most of the data.
>>>
>>> This will consume a bit more CPU power and memory, but on modern systems
>>> this should be less of a problem.
>>>
>>> The good thing is that with Nautilus you can also scale the number of
>>> PGs back down if it becomes a problem.
>>>
>>> More PGs will mean smaller PGs and thus lead to a better data
>>> distribution.
>> 
>>
>> That makes sense, dividing the data into smaller chunks makes it more
>> flexible. The OSD nodes are quite underloaded, even with turbo
>> recovery mode on (10, not 32 ;-).
>>
>> When the cluster is in HEALTH_OK again, I'll increase the PGs for the
>> cephfs pools...
> 
> On second thought, I reverted my reweight commands and adjusted the PGs,
> which were quite low for some of the pools. The reason they were low is
> that when we first created them, we expected them to be rarely used, but
> then we started filling them just to fill them, and these are
> probably the cause of the imbalance.
> 

You should make sure that the pools which contain the most data have the
most PGs.

Although ~100 PGs per OSD is the recommendation, it won't hurt to have
~200 PGs as long as you have enough CPU power and memory. More PGs will
mean better data distribution with such large disks.
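
A quick way to sanity-check this, as a sketch with standard Nautilus
commands:

  # Compare pg_num per pool with how much data each pool stores:
  ceph osd pool ls detail | grep pg_num
  ceph df
  # If the pg_autoscaler module is enabled, it also reports what it
  # thinks pg_num should be per pool:
  ceph osd pool autoscale-status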

> The cluster now has over 8% misplaced objects, so that can take a while...
> 
> Cheers
> 
> /Simon


Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Simon Oosthoek

On 26-08-19 13:25, Simon Oosthoek wrote:

On 26-08-19 13:11, Wido den Hollander wrote:



The reweight might actually cause even more confusion for the balancer.
The balancer uses upmap mode, which re-allocates PGs to different OSDs
if needed.

Looking at the output sent earlier, I have some replies. See below.





Looking at this output, the balancing seems OK, but from a different
perspective.

PGs are allocated to OSDs, not objects or data. All OSDs have 95~97
Placement Groups allocated.

That's good! An almost perfect distribution.

The problem that now arises is the difference in the size of these
Placement Groups, as they hold different objects.

This is one of the side effects of larger disks. The PGs on them will
grow and this will lead to imbalance between the OSDs.

I *think* that increasing the number of PGs on this cluster would help,
but only for the pools which will contain most of the data.

This will consume a bit more CPU power and memory, but on modern systems
this should be less of a problem.

The good thing is that with Nautilus you can also scale the number of
PGs back down if it becomes a problem.

More PGs will mean smaller PGs and thus lead to a better data
distribution.



That makes sense, dividing the data into smaller chunks makes it more
flexible. The OSD nodes are quite underloaded, even with turbo recovery
mode on (10, not 32 ;-).


When the cluster is in HEALTH_OK again, I'll increase the PGs for the 
cephfs pools...


On second thought, I reverted my reweight commands and adjusted the PGs,
which were quite low for some of the pools. The reason they were low is
that when we first created them, we expected them to be rarely used, but
then we started filling them just to fill them, and these are
probably the cause of the imbalance.


The cluster now has over 8% misplaced objects, so that can take a while...

Cheers

/Simon


Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Simon Oosthoek

On 26-08-19 13:11, Wido den Hollander wrote:



The reweight might actually cause even more confusion for the balancer.
The balancer uses upmap mode, which re-allocates PGs to different OSDs
if needed.

Looking at the output sent earlier, I have some replies. See below.





Looking at this output, the balancing seems OK, but from a different
perspective.

PGs are allocated to OSDs, not objects or data. All OSDs have 95~97
Placement Groups allocated.

That's good! An almost perfect distribution.

The problem that now arises is the difference in the size of these
Placement Groups, as they hold different objects.

This is one of the side effects of larger disks. The PGs on them will
grow and this will lead to imbalance between the OSDs.

I *think* that increasing the number of PGs on this cluster would help,
but only for the pools which will contain most of the data.

This will consume a bit more CPU power and memory, but on modern systems
this should be less of a problem.

The good thing is that with Nautilus you can also scale the number of
PGs back down if it becomes a problem.

More PGs will mean smaller PGs and thus lead to a better data distribution.



That makes sense, dividing the data into smaller chunks makes it more
flexible. The OSD nodes are quite underloaded, even with turbo recovery
mode on (10, not 32 ;-).


When the cluster is in HEALTH_OK again, I'll increase the PGs for the 
cephfs pools...


Cheers,

/Simon


Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Wido den Hollander


On 8/26/19 12:33 PM, Simon Oosthoek wrote:
> On 26-08-19 12:00, EDH - Manuel Rios Fernandez wrote:
>> The balancer only balances when the cluster is in Healthy mode.
>>
>> The problem is that data is distributed without being balanced on its
>> first write, which causes data to be balanced improperly across OSDs.
> 
> I suppose the CRUSH algorithm doesn't take the fullness of the OSDs into
> account when placing objects...
> 

No, it doesn't. Objects are allocated to a Placement Group based on
their name (a hash of it) and the number of PGs for that pool.

There is no database of object locations. Clients (librados) calculate
the placement based on the object's name and the OSDMap (which contains
the CRUSHMap).

The utilization of the OSD isn't taken into account, as that would
result in a different outcome every time and thus wouldn't let you find
your objects after storing them.
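
You can see this deterministic calculation from the command line; a
sketch (the object name is made up):

  # Ask where an object with this name would land, without writing it:
  ceph osd map cephfs_data_ec83 some-object-name
  # Prints the PG id and the acting set of OSDs for that name.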

>>
>> This problem only happens in Ceph; we see the same with 14.2.2, having
>> to change the weights manually, because the balancer is a passive
>> element of the cluster.
>>
>> I hope in a next version we get a more aggressive balancer, like the
>> enterprise storage arrays that allow filling up 95% of the raw capacity.
> 
> I'm thinking a cronjob with a script to parse the output of `ceph osd df
> tree` and reweight according to the percentage used would be relatively
> easy to write. But I'll concentrate on monitoring before I start
> tweaking there ;-)
> 

The reweight might actually cause even more confusion for the balancer.
The balancer uses upmap mode, which re-allocates PGs to different OSDs
if needed.
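
To check which mode the balancer is actually running in (a sketch with
standard commands):

  ceph balancer status      # shows the mode, whether it is active, and plans
  ceph balancer mode upmap  # switch modes if needed
  ceph balancer on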

Looking at the output sent earlier, I have some replies. See below.

> Cheers
> 
> /Simon
> 
>>
>> Regards
>>
>>
>> -----Original Message-----
>> From: ceph-users  On behalf of Simon
>> Oosthoek
>> Sent: Monday, 26 August 2019 11:52
>> To: Dan van der Ster 
>> CC: ceph-users 
>> Subject: Re: [ceph-users] cephfs full, 2/3 Raw capacity used
>>
>> On 26-08-19 11:37, Dan van der Ster wrote:
>>> Thanks. The version and balancer config look good.
>>>
>>> So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
>>> get you out of this.
>>
>> I've done this, and the 3 next-fullest OSDs as well. This will take some
>> time to recover; I'll let you know when it's done.
>>
>> Thanks,
>>
>> /simon
>>
>>>
>>> -- dan
>>>
>>> On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
>>>  wrote:
>>>>
>>>> On 26-08-19 11:16, Dan van der Ster wrote:
>>>>> Hi,
>>>>>
>>>>> Which version of ceph are you using? Which balancer mode?
>>>>
>>>> Nautilus (14.2.2), balancer is in upmap mode.
>>>>
>>>>> The balancer score isn't a percent-error or anything humanly usable.
>>>>> `ceph osd df tree` can better show you exactly which osds are
>>>>> over/under utilized and by how much.
>>>>>
>>>>
>>>> Aha, I ran this and sorted on the %full column:
>>>>
>>>>  81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
>>>>  48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
>>>> 154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
>>>> 129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
>>>>  42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
>>>> 122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
>>>> 120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
>>>>  96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
>>>>  26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
>>>> ...
>>>>   6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Paul Emmerich
The balancer is unfortunately not that good when you have large k+m in
erasure coding profiles and relatively few servers; some manual
balancing will be required.
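
With upmap you can balance manually per PG instead of reweighting whole
OSDs; a sketch (the PG id and OSD ids are placeholders):

  # Remap one PG shard from a full OSD to an emptier one:
  ceph osd pg-upmap-items 13.1f 10 81
  # Remove the exception again later:
  ceph osd rm-pg-upmap-items 13.1f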


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Aug 26, 2019 at 12:33 PM Simon Oosthoek
 wrote:
>
> On 26-08-19 12:00, EDH - Manuel Rios Fernandez wrote:
> > The balancer only balances when the cluster is in Healthy mode.
> >
> > The problem is that data is distributed without being balanced on its
> > first write, which causes data to be balanced improperly across OSDs.
>
> I suppose the CRUSH algorithm doesn't take the fullness of the OSDs into
> account when placing objects...
>
> >
> > This problem only happens in Ceph; we see the same with 14.2.2, having to
> > change the weights manually, because the balancer is a passive element of
> > the cluster.
> >
> > I hope in a next version we get a more aggressive balancer, like the
> > enterprise storage arrays that allow filling up 95% of the raw capacity.
>
> I'm thinking a cronjob with a script to parse the output of `ceph osd df
> tree` and reweight according to the percentage used would be relatively
> easy to write. But I'll concentrate on monitoring before I start
> tweaking there ;-)
>
> Cheers
>
> /Simon
>
> >
> > Regards
> >
> >
> > -----Original Message-----
> > From: ceph-users  On behalf of Simon
> > Oosthoek
> > Sent: Monday, 26 August 2019 11:52
> > To: Dan van der Ster 
> > CC: ceph-users 
> > Subject: Re: [ceph-users] cephfs full, 2/3 Raw capacity used
> >
> > On 26-08-19 11:37, Dan van der Ster wrote:
> >> Thanks. The version and balancer config look good.
> >>
> >> So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
> >> get you out of this.
> >
> > I've done this, and the 3 next-fullest OSDs as well. This will take some
> > time to recover; I'll let you know when it's done.
> >
> > Thanks,
> >
> > /simon
> >
> >>
> >> -- dan
> >>
> >> On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
> >>  wrote:
> >>>
> >>> On 26-08-19 11:16, Dan van der Ster wrote:
> >>>> Hi,
> >>>>
> >>>> Which version of ceph are you using? Which balancer mode?
> >>>
> >>> Nautilus (14.2.2), balancer is in upmap mode.
> >>>
> >>>> The balancer score isn't a percent-error or anything humanly usable.
> >>>> `ceph osd df tree` can better show you exactly which osds are
> >>>> over/under utilized and by how much.
> >>>>
> >>>
> >>> Aha, I ran this and sorted on the %full column:
> >>>
> >>>  81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
> >>>  48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
> >>> 154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
> >>> 129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
> >>>  42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
> >>> 122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
> >>> 120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
> >>>  96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
> >>>  26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
> >>> ...
> >>>   6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6
> >>>  16   hdd   10.81149   1.0   11 TiB   8.4 TiB   8.3 TiB    28 KiB   18 GiB   2.4 TiB   77.56   1.16   95   up   osd.16
> >>>   0   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.4 TiB    48 KiB   17 GiB   2.2 TiB   79.24   1.19   96   up   osd.0

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Simon Oosthoek

On 26-08-19 12:00, EDH - Manuel Rios Fernandez wrote:

The balancer only balances when the cluster is in Healthy mode.

The problem is that data is distributed without being balanced on its
first write, which causes data to be balanced improperly across OSDs.


I suppose the CRUSH algorithm doesn't take the fullness of the OSDs into
account when placing objects...




This problem only happens in Ceph; we see the same with 14.2.2, having to
change the weights manually, because the balancer is a passive element of
the cluster.

I hope in a next version we get a more aggressive balancer, like the
enterprise storage arrays that allow filling up 95% of the raw capacity.


I'm thinking a cronjob with a script to parse the output of `ceph osd df 
tree` and reweight according to the percentage used would be relatively 
easy to write. But I'll concentrate on monitoring before I start 
tweaking there ;-)
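
Something like this, perhaps (a rough sketch, assuming jq is available
and that the JSON field names match 14.2; to be tested carefully before
putting it anywhere near cron):

  #!/bin/bash
  # Reweight any OSD above 85% utilization down by 5% of its current
  # reweight, based on the JSON output of `ceph osd df`.
  ceph osd df -f json |
    jq -r '.nodes[] | select(.utilization > 85) | "\(.id) \(.reweight)"' |
    while read -r id rw; do
        new=$(echo "$rw * 0.95" | bc -l)
        echo "reweighting osd.$id from $rw to $new"
        ceph osd reweight "$id" "$new"
    done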


Cheers

/Simon



Regards


-----Original Message-----
From: ceph-users  On behalf of Simon
Oosthoek
Sent: Monday, 26 August 2019 11:52
To: Dan van der Ster 
CC: ceph-users 
Subject: Re: [ceph-users] cephfs full, 2/3 Raw capacity used

On 26-08-19 11:37, Dan van der Ster wrote:

Thanks. The version and balancer config look good.

So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
get you out of this.


I've done this, and the 3 next-fullest OSDs as well. This will take some
time to recover; I'll let you know when it's done.

Thanks,

/simon



-- dan

On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
 wrote:


On 26-08-19 11:16, Dan van der Ster wrote:

Hi,

Which version of ceph are you using? Which balancer mode?


Nautilus (14.2.2), balancer is in upmap mode.


The balancer score isn't a percent-error or anything humanly usable.
`ceph osd df tree` can better show you exactly which osds are
over/under utilized and by how much.



Aha, I ran this and sorted on the %full column:

 81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
 48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
 42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
 96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
 26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
...
  6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6
 16   hdd   10.81149   1.0   11 TiB   8.4 TiB   8.3 TiB    28 KiB   18 GiB   2.4 TiB   77.56   1.16   95   up   osd.16
  0   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.4 TiB    48 KiB   17 GiB   2.2 TiB   79.24   1.19   96   up   osd.0
144   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   18 GiB   2.2 TiB   79.57   1.19   95   up   osd.144
136   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB    48 KiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.136
 63   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.63
155   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB     8 KiB   19 GiB   2.2 TiB   79.85   1.20   95   up   osd.155
 89   hdd   10.81149   1.0   11 TiB   8.7 TiB   8.5 TiB    12 KiB   20 GiB   2.2 TiB   80.04   1.20   96   up   osd.89
106   hdd   10.81149   1.0   11 TiB   8.8 TiB   8.7 TiB    64 KiB   19 GiB   2.0 TiB   81.38   1.22   96   up   osd.106
 94   hdd   10.81149   1.0   11 TiB   9.0 TiB   8.9 TiB       0 B   19 GiB   1.8 TiB   83.53   1.25   96   up   osd.94
 33   hdd   10.81149   1.0   11 TiB   9.1 TiB   9.0 TiB    44 KiB   19 GiB   1.7 TiB   84.40   1.27   96   up   osd.33
 15   hdd   10.81149   1.0   11 TiB    10 TiB   9.8 TiB    16 KiB   20 GiB   877 GiB   92.08   1.38   96   up   osd.15
 53   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   676 GiB   93.90   1.41   96   up   osd.53
 51   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   666 GiB   93.98   1.41   96   up   osd.51
 10   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB    40 KiB   22 GiB   552 GiB   95.01   1.42   97   up   osd.10

So the fullest one is at 95.01%, the emptiest one at 48.4%, so there's
some balancing to be done.

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread EDH - Manuel Rios Fernandez
The balancer only balances when the cluster is in Healthy mode.

The problem is that data is distributed without being balanced on its
first write, which causes data to be balanced improperly across OSDs.

This problem only happens in Ceph; we see the same with 14.2.2, having to
change the weights manually, because the balancer is a passive element of
the cluster.

I hope in a next version we get a more aggressive balancer, like the
enterprise storage arrays that allow filling up 95% of the raw capacity.

Regards


-----Original Message-----
From: ceph-users  On behalf of Simon
Oosthoek
Sent: Monday, 26 August 2019 11:52
To: Dan van der Ster 
CC: ceph-users 
Subject: Re: [ceph-users] cephfs full, 2/3 Raw capacity used

On 26-08-19 11:37, Dan van der Ster wrote:
> Thanks. The version and balancer config look good.
> 
> So you can try `ceph osd reweight osd.10 0.8` to see if it helps to 
> get you out of this.

I've done this, and the 3 next-fullest OSDs as well. This will take some
time to recover; I'll let you know when it's done.

Thanks,

/simon

> 
> -- dan
> 
> On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek 
>  wrote:
>>
>> On 26-08-19 11:16, Dan van der Ster wrote:
>>> Hi,
>>>
>>> Which version of ceph are you using? Which balancer mode?
>>
>> Nautilus (14.2.2), balancer is in upmap mode.
>>
>>> The balancer score isn't a percent-error or anything humanly usable.
>>> `ceph osd df tree` can better show you exactly which osds are 
>>> over/under utilized and by how much.
>>>
>>
>> Aha, I ran this and sorted on the %full column:
>>
>>  81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
>>  48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
>> 154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
>> 129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
>>  42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
>> 122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
>> 120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
>>  96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
>>  26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
>> ...
>>   6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6
>>  16   hdd   10.81149   1.0   11 TiB   8.4 TiB   8.3 TiB    28 KiB   18 GiB   2.4 TiB   77.56   1.16   95   up   osd.16
>>   0   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.4 TiB    48 KiB   17 GiB   2.2 TiB   79.24   1.19   96   up   osd.0
>> 144   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   18 GiB   2.2 TiB   79.57   1.19   95   up   osd.144
>> 136   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB    48 KiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.136
>>  63   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.63
>> 155   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB     8 KiB   19 GiB   2.2 TiB   79.85   1.20   95   up   osd.155
>>  89   hdd   10.81149   1.0   11 TiB   8.7 TiB   8.5 TiB    12 KiB   20 GiB   2.2 TiB   80.04   1.20   96   up   osd.89
>> 106   hdd   10.81149   1.0   11 TiB   8.8 TiB   8.7 TiB    64 KiB   19 GiB   2.0 TiB   81.38   1.22   96   up   osd.106
>>  94   hdd   10.81149   1.0   11 TiB   9.0 TiB   8.9 TiB       0 B   19 GiB   1.8 TiB   83.53   1.25   96   up   osd.94
>>  33   hdd   10.81149   1.0   11 TiB   9.1 TiB   9.0 TiB    44 KiB   19 GiB   1.7 TiB   84.40   1.27   96   up   osd.33
>>  15   hdd   10.81149   1.0   11 TiB    10 TiB   9.8 TiB    16 KiB   20 GiB   877 GiB   92.08   1.38   96   up   osd.15
>>  53   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   676 GiB   93.90   1.41   96   up   osd.53
>>  51   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   666 GiB   93.98   1.41   96   up   osd.51
>>  10   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB    40 KiB   22 GiB   552 GiB   95.01   1.42   97   up   osd.10

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Simon Oosthoek

On 26-08-19 11:37, Dan van der Ster wrote:

Thanks. The version and balancer config look good.

So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
get you out of this.


I've done this, and the 3 next-fullest OSDs as well. This will take some
time to recover; I'll let you know when it's done.


Thanks,

/simon



-- dan

On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
 wrote:


On 26-08-19 11:16, Dan van der Ster wrote:

Hi,

Which version of ceph are you using? Which balancer mode?


Nautilus (14.2.2), balancer is in upmap mode.


The balancer score isn't a percent-error or anything humanly usable.
`ceph osd df tree` can better show you exactly which osds are
over/under utilized and by how much.



Aha, I ran this and sorted on the %full column:

 81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
 48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
 42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
 96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
 26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
...
  6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6
 16   hdd   10.81149   1.0   11 TiB   8.4 TiB   8.3 TiB    28 KiB   18 GiB   2.4 TiB   77.56   1.16   95   up   osd.16
  0   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.4 TiB    48 KiB   17 GiB   2.2 TiB   79.24   1.19   96   up   osd.0
144   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   18 GiB   2.2 TiB   79.57   1.19   95   up   osd.144
136   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB    48 KiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.136
 63   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.63
155   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB     8 KiB   19 GiB   2.2 TiB   79.85   1.20   95   up   osd.155
 89   hdd   10.81149   1.0   11 TiB   8.7 TiB   8.5 TiB    12 KiB   20 GiB   2.2 TiB   80.04   1.20   96   up   osd.89
106   hdd   10.81149   1.0   11 TiB   8.8 TiB   8.7 TiB    64 KiB   19 GiB   2.0 TiB   81.38   1.22   96   up   osd.106
 94   hdd   10.81149   1.0   11 TiB   9.0 TiB   8.9 TiB       0 B   19 GiB   1.8 TiB   83.53   1.25   96   up   osd.94
 33   hdd   10.81149   1.0   11 TiB   9.1 TiB   9.0 TiB    44 KiB   19 GiB   1.7 TiB   84.40   1.27   96   up   osd.33
 15   hdd   10.81149   1.0   11 TiB    10 TiB   9.8 TiB    16 KiB   20 GiB   877 GiB   92.08   1.38   96   up   osd.15
 53   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   676 GiB   93.90   1.41   96   up   osd.53
 51   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   666 GiB   93.98   1.41   96   up   osd.51
 10   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB    40 KiB   22 GiB   552 GiB   95.01   1.42   97   up   osd.10

So the fullest one is at 95.01%, the emptiest one at 48.4%, so there's
some balancing to be done.


You might be able to manually fix things by using `ceph osd reweight
...` on the fullest OSDs to move data elsewhere.


I'll look into this, but I was hoping that the balancer module would
take care of this...



Otherwise, in general, it's good to set up monitoring so you notice and
take action well before the OSDs fill up.


Yes, I'm still working on this. I want to add some checks to our
check_mk+icinga setup using native plugins, but my Python skills are not
quite up to the task, at least not yet ;-)
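
For what it's worth, a shell-based check_mk local check might do as a
first step; a sketch (assuming jq, and the thresholds are placeholders):

  #!/bin/bash
  # check_mk local check: WARN above 80% on the fullest OSD, CRIT above 90%.
  max=$(ceph osd df -f json | jq '[.nodes[].utilization] | max')
  status=0
  [ "$(echo "$max > 80" | bc -l)" = 1 ] && status=1
  [ "$(echo "$max > 90" | bc -l)" = 1 ] && status=2
  echo "$status Ceph_OSD_fullest fullest=$max Fullest OSD at ${max}%"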

Cheers

/Simon



Cheers, Dan

On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
 wrote:


Hi all,

we're building up our experience with our ceph cluster before we take it
into production. I've now tried to fill up the cluster with cephfs,
which we plan to use for about 95% of all data on the cluster.

The cephfs pools are full when the cluster reports 67% raw capacity
used. There are 4 pools we use for cephfs data, 3-copy, 4-copy, EC 8+3
and EC 5+7. The balancer module is turned on and `ceph balancer eval`
gives `current cluster score 0.013255 (lower is better)`, so well within
the default 5% margin. Is there a setting we can tweak to increase the
usable RAW capacity to say 85% or 90%, or is this the most we can expect
to store on the cluster?

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Dan van der Ster
Thanks. The version and balancer config look good.

So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
get you out of this.

-- dan

On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
 wrote:
>
> On 26-08-19 11:16, Dan van der Ster wrote:
> > Hi,
> >
> > Which version of ceph are you using? Which balancer mode?
>
> Nautilus (14.2.2), balancer is in upmap mode.
>
> > The balancer score isn't a percent-error or anything humanly usable.
> > `ceph osd df tree` can better show you exactly which osds are
> > over/under utilized and by how much.
> >
>
> Aha, I ran this and sorted on the %full column:
>
>  81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
>  48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
> 154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
> 129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
>  42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
> 122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
> 120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
>  96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
>  26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
> ...
>   6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6
>  16   hdd   10.81149   1.0   11 TiB   8.4 TiB   8.3 TiB    28 KiB   18 GiB   2.4 TiB   77.56   1.16   95   up   osd.16
>   0   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.4 TiB    48 KiB   17 GiB   2.2 TiB   79.24   1.19   96   up   osd.0
> 144   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   18 GiB   2.2 TiB   79.57   1.19   95   up   osd.144
> 136   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB    48 KiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.136
>  63   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.63
> 155   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB     8 KiB   19 GiB   2.2 TiB   79.85   1.20   95   up   osd.155
>  89   hdd   10.81149   1.0   11 TiB   8.7 TiB   8.5 TiB    12 KiB   20 GiB   2.2 TiB   80.04   1.20   96   up   osd.89
> 106   hdd   10.81149   1.0   11 TiB   8.8 TiB   8.7 TiB    64 KiB   19 GiB   2.0 TiB   81.38   1.22   96   up   osd.106
>  94   hdd   10.81149   1.0   11 TiB   9.0 TiB   8.9 TiB       0 B   19 GiB   1.8 TiB   83.53   1.25   96   up   osd.94
>  33   hdd   10.81149   1.0   11 TiB   9.1 TiB   9.0 TiB    44 KiB   19 GiB   1.7 TiB   84.40   1.27   96   up   osd.33
>  15   hdd   10.81149   1.0   11 TiB    10 TiB   9.8 TiB    16 KiB   20 GiB   877 GiB   92.08   1.38   96   up   osd.15
>  53   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   676 GiB   93.90   1.41   96   up   osd.53
>  51   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   666 GiB   93.98   1.41   96   up   osd.51
>  10   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB    40 KiB   22 GiB   552 GiB   95.01   1.42   97   up   osd.10
>
> So the fullest one is at 95.01%, the emptiest one at 48.4%, so there's
> some balancing to be done.
>
> > You might be able to manually fix things by using `ceph osd reweight
> > ...` on the fullest OSDs to move data elsewhere.
>
> I'll look into this, but I was hoping that the balancer module would
> take care of this...
>
> >
> > Otherwise, in general, it's good to set up monitoring so you notice and
> > take action well before the OSDs fill up.
>
> Yes, I'm still working on this. I want to add some checks to our
> check_mk+icinga setup using native plugins, but my Python skills are not
> quite up to the task, at least not yet ;-)
>
> Cheers
>
> /Simon
>
> >
> > Cheers, Dan
> >
> > On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
> >  wrote:
> >>
> >> Hi all,
> >>
> >> we're building up our experience with our ceph cluster before we take it
> >> into production. I've now tried to fill up the cluster with cephfs,
> >> which we plan to use for about 95% of all data on the cluster.
> >>
> >> The cephfs pools are full when the cluster reports 67% raw capacity
> >> used. There are 4 pools we use for cephfs data, 3-copy, 4-copy, EC 8+3
> >> and EC 5+7. The balancer module is turned on and `ceph balancer eval`
> >> gives `current cluster score 0.013255 (lower is better)`, so well within

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Simon Oosthoek

On 26-08-19 11:16, Dan van der Ster wrote:

Hi,

Which version of ceph are you using? Which balancer mode?


Nautilus (14.2.2), balancer is in upmap mode.


The balancer score isn't a percent-error or anything humanly usable.
`ceph osd df tree` can better show you exactly which osds are
over/under utilized and by how much.



Aha, I ran this and sorted on the %full column:

 81   hdd   10.81149   1.0   11 TiB   5.2 TiB   5.1 TiB     4 KiB   14 GiB   5.6 TiB   48.40   0.73   96   up   osd.81
 48   hdd   10.81149   1.0   11 TiB   5.3 TiB   5.2 TiB    15 KiB   14 GiB   5.5 TiB   49.08   0.74   95   up   osd.48
154   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   2.6 GiB   15 GiB   5.3 TiB   50.95   0.76   96   up   osd.154
129   hdd   10.81149   1.0   11 TiB   5.5 TiB   5.4 TiB   5.1 GiB   16 GiB   5.3 TiB   51.33   0.77   96   up   osd.129
 42   hdd   10.81149   1.0   11 TiB   5.6 TiB   5.5 TiB   2.6 GiB   14 GiB   5.2 TiB   51.81   0.78   96   up   osd.42
122   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB    16 KiB   14 GiB   5.1 TiB   52.47   0.79   96   up   osd.122
120   hdd   10.81149   1.0   11 TiB   5.7 TiB   5.6 TiB   2.6 GiB   15 GiB   5.1 TiB   52.92   0.79   95   up   osd.120
 96   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB   2.6 GiB   15 GiB   5.0 TiB   53.58   0.80   96   up   osd.96
 26   hdd   10.81149   1.0   11 TiB   5.8 TiB   5.7 TiB    20 KiB   15 GiB   5.0 TiB   53.68   0.80   97   up   osd.26
...
  6   hdd   10.81149   1.0   11 TiB   8.3 TiB   8.2 TiB    88 KiB   18 GiB   2.5 TiB   77.14   1.16   96   up   osd.6
 16   hdd   10.81149   1.0   11 TiB   8.4 TiB   8.3 TiB    28 KiB   18 GiB   2.4 TiB   77.56   1.16   95   up   osd.16
  0   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.4 TiB    48 KiB   17 GiB   2.2 TiB   79.24   1.19   96   up   osd.0
144   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   18 GiB   2.2 TiB   79.57   1.19   95   up   osd.144
136   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB    48 KiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.136
 63   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB   2.6 GiB   17 GiB   2.2 TiB   79.60   1.19   95   up   osd.63
155   hdd   10.81149   1.0   11 TiB   8.6 TiB   8.5 TiB     8 KiB   19 GiB   2.2 TiB   79.85   1.20   95   up   osd.155
 89   hdd   10.81149   1.0   11 TiB   8.7 TiB   8.5 TiB    12 KiB   20 GiB   2.2 TiB   80.04   1.20   96   up   osd.89
106   hdd   10.81149   1.0   11 TiB   8.8 TiB   8.7 TiB    64 KiB   19 GiB   2.0 TiB   81.38   1.22   96   up   osd.106
 94   hdd   10.81149   1.0   11 TiB   9.0 TiB   8.9 TiB       0 B   19 GiB   1.8 TiB   83.53   1.25   96   up   osd.94
 33   hdd   10.81149   1.0   11 TiB   9.1 TiB   9.0 TiB    44 KiB   19 GiB   1.7 TiB   84.40   1.27   96   up   osd.33
 15   hdd   10.81149   1.0   11 TiB    10 TiB   9.8 TiB    16 KiB   20 GiB   877 GiB   92.08   1.38   96   up   osd.15
 53   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   676 GiB   93.90   1.41   96   up   osd.53
 51   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB   2.6 GiB   20 GiB   666 GiB   93.98   1.41   96   up   osd.51
 10   hdd   10.81149   1.0   11 TiB    10 TiB    10 TiB    40 KiB   22 GiB   552 GiB   95.01   1.42   97   up   osd.10


So the fullest one is at 95.01%, the emptiest one at 48.4%, so there's 
some balancing to be done.



You might be able to manually fix things by using `ceph osd reweight
...` on the fullest OSDs to move data elsewhere.


I'll look into this, but I was hoping that the balancer module would 
take care of this...




Otherwise, in general, it's good to set up monitoring so you notice and
take action well before the OSDs fill up.


Yes, I'm still working on this. I want to add some checks to our
check_mk+icinga setup using native plugins, but my Python skills are not
quite up to the task, at least not yet ;-)


Cheers

/Simon



Cheers, Dan

On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
 wrote:


Hi all,

we're building up our experience with our ceph cluster before we take it
into production. I've now tried to fill up the cluster with cephfs,
which we plan to use for about 95% of all data on the cluster.

The cephfs pools are full when the cluster reports 67% raw capacity
used. There are 4 pools we use for cephfs data, 3-copy, 4-copy, EC 8+3
and EC 5+7. The balancer module is turned on and `ceph balancer eval`
gives `current cluster score 0.013255 (lower is better)`, so well within
the default 5% margin. Is there a setting we can tweak to increase the
usable RAW capacity to say 85% or 90%, or is this the most we can expect
to store on the cluster?

> [root@cephmon1 ~]# ceph df
> RAW STORAGE:
>     CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>     hdd    1.8 PiB  605 TiB  1.2 PiB  1.2 PiB   66.71
>     TOTAL  1.8 PiB  605 TiB  1.2 PiB  1.2 PiB   66.71

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Dan van der Ster
Hi,

Which version of ceph are you using? Which balancer mode?
The balancer score isn't a percent-error or anything humanly usable.
`ceph osd df tree` can better show you exactly which osds are
over/under utilized and by how much.

You might be able to manually fix things by using `ceph osd reweight
...` on the fullest OSDs to move data elsewhere.
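
For example, a sketch (pick the fullest OSDs from that output):

  # Temporarily lower the reweight of the fullest OSD so data moves off it:
  ceph osd reweight osd.10 0.8
  # Watch the recovery progress:
  ceph -w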

Otherwise, in general, it's good to set up monitoring so you notice and
take action well before the OSDs fill up.

Cheers, Dan

On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
 wrote:
>
> Hi all,
>
> we're building up our experience with our ceph cluster before we take it
> into production. I've now tried to fill up the cluster with cephfs,
> which we plan to use for about 95% of all data on the cluster.
>
> The cephfs pools are full when the cluster reports 67% raw capacity
> used. There are 4 pools we use for cephfs data, 3-copy, 4-copy, EC 8+3
> and EC 5+7. The balancer module is turned on and `ceph balancer eval`
> gives `current cluster score 0.013255 (lower is better)`, so well within
> the default 5% margin. Is there a setting we can tweak to increase the
> usable RAW capacity to say 85% or 90%, or is this the most we can expect
> to store on the cluster?
>
> [root@cephmon1 ~]# ceph df
> RAW STORAGE:
>     CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>     hdd    1.8 PiB  605 TiB  1.2 PiB  1.2 PiB   66.71
>     TOTAL  1.8 PiB  605 TiB  1.2 PiB  1.2 PiB   66.71
>
> POOLS:
>     POOL                 ID  STORED   OBJECTS  USED     %USED   MAX AVAIL
>     cephfs_data           1  111 MiB   79.26M  1.2 GiB  100.00  0 B
>     cephfs_metadata       2   52 GiB    4.91M   52 GiB  100.00  0 B
>     cephfs_data_4copy     3  106 TiB   46.36M  428 TiB  100.00  0 B
>     cephfs_data_3copy     8   93 TiB   42.08M  282 TiB  100.00  0 B
>     cephfs_data_ec83     13  106 TiB   50.11M  161 TiB  100.00  0 B
>     rbd                  14   21 GiB    5.62k   63 GiB  100.00  0 B
>     .rgw.root            15  1.2 KiB        4    1 MiB  100.00  0 B
>     default.rgw.control  16      0 B        8      0 B       0  0 B
>     default.rgw.meta     17    765 B        4    1 MiB  100.00  0 B
>     default.rgw.log      18      0 B      207      0 B       0  0 B
>     scbench              19  133 GiB   34.14k  400 GiB  100.00  0 B
>     cephfs_data_ec57     20  126 TiB   51.84M  320 TiB  100.00  0 B
> [root@cephmon1 ~]# ceph balancer eval
> current cluster score 0.013255 (lower is better)
>
>
> Being full at 2/3 raw used is a bit too "pretty" to be accidental; it
> seems like this could be a parameter for cephfs. However, I couldn't
> find anything like this in the documentation for Nautilus.
>
>
> The logs in the dashboard show this:
> 2019-08-26 11:00:00.000630
> [ERR]
> overall HEALTH_ERR 3 backfillfull osd(s); 1 full osd(s); 12 pool(s) full
>
> 2019-08-26 10:57:44.539964
> [INF]
> Health check cleared: POOL_BACKFILLFULL (was: 12 pool(s) backfillfull)
>
> 2019-08-26 10:57:44.539944
> [WRN]
> Health check failed: 12 pool(s) full (POOL_FULL)
>
> 2019-08-26 10:57:44.539926
> [ERR]
> Health check failed: 1 full osd(s) (OSD_FULL)
>
> 2019-08-26 10:57:44.539899
> [WRN]
> Health check update: 3 backfillfull osd(s) (OSD_BACKFILLFULL)
>
> 2019-08-26 10:00:00.88
> [WRN]
> overall HEALTH_WARN 4 backfillfull osd(s); 12 pool(s) backfillfull
>
> So it seems that Ceph is completely stuck at 2/3 full, while we
> anticipated being able to fill up the cluster to at least 85-90% of the
> raw capacity, or at least enough that we would keep a functioning cluster
> when a single OSD node fails.
>
> Cheers
>
> /Simon


[ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Simon Oosthoek

Hi all,

we're building up our experience with our ceph cluster before we take it 
into production. I've now tried to fill up the cluster with cephfs, 
which we plan to use for about 95% of all data on the cluster.


The cephfs pools are full when the cluster reports 67% raw capacity 
used. There are 4 pools we use for cephfs data, 3-copy, 4-copy, EC 8+3 
and EC 5+7. The balancer module is turned on and `ceph balancer eval` 
gives `current cluster score 0.013255 (lower is better)`, so well within 
the default 5% margin. Is there a setting we can tweak to increase the 
usable RAW capacity to say 85% or 90%, or is this the most we can expect 
to store on the cluster?


[root@cephmon1 ~]# ceph df
RAW STORAGE:
    CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
    hdd    1.8 PiB  605 TiB  1.2 PiB  1.2 PiB   66.71
    TOTAL  1.8 PiB  605 TiB  1.2 PiB  1.2 PiB   66.71

POOLS:
    POOL                 ID  STORED   OBJECTS  USED     %USED   MAX AVAIL
    cephfs_data           1  111 MiB   79.26M  1.2 GiB  100.00  0 B
    cephfs_metadata       2   52 GiB    4.91M   52 GiB  100.00  0 B
    cephfs_data_4copy     3  106 TiB   46.36M  428 TiB  100.00  0 B
    cephfs_data_3copy     8   93 TiB   42.08M  282 TiB  100.00  0 B
    cephfs_data_ec83     13  106 TiB   50.11M  161 TiB  100.00  0 B
    rbd                  14   21 GiB    5.62k   63 GiB  100.00  0 B
    .rgw.root            15  1.2 KiB        4    1 MiB  100.00  0 B
    default.rgw.control  16      0 B        8      0 B       0  0 B
    default.rgw.meta     17    765 B        4    1 MiB  100.00  0 B
    default.rgw.log      18      0 B      207      0 B       0  0 B
    scbench              19  133 GiB   34.14k  400 GiB  100.00  0 B
    cephfs_data_ec57     20  126 TiB   51.84M  320 TiB  100.00  0 B

[root@cephmon1 ~]# ceph balancer eval
current cluster score 0.013255 (lower is better)


Being full at 2/3 raw used is a bit too "pretty" to be accidental; it
seems like this could be a parameter for cephfs. However, I couldn't
find anything like this in the documentation for Nautilus.



The logs in the dashboard show this:
2019-08-26 11:00:00.000630
[ERR]
overall HEALTH_ERR 3 backfillfull osd(s); 1 full osd(s); 12 pool(s) full

2019-08-26 10:57:44.539964
[INF]
Health check cleared: POOL_BACKFILLFULL (was: 12 pool(s) backfillfull)

2019-08-26 10:57:44.539944
[WRN]
Health check failed: 12 pool(s) full (POOL_FULL)

2019-08-26 10:57:44.539926
[ERR]
Health check failed: 1 full osd(s) (OSD_FULL)

2019-08-26 10:57:44.539899
[WRN]
Health check update: 3 backfillfull osd(s) (OSD_BACKFILLFULL)

2019-08-26 10:00:00.88
[WRN]
overall HEALTH_WARN 4 backfillfull osd(s); 12 pool(s) backfillfull

So it seems that Ceph is completely stuck at 2/3 full, while we
anticipated being able to fill up the cluster to at least 85-90% of the
raw capacity, or at least enough that we would keep a functioning cluster
when a single OSD node fails.
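
For reference, those OSD_FULL / OSD_BACKFILLFULL health checks fire on
cluster-wide ratios that can be inspected like this (a sketch;
0.95/0.90/0.85 are the usual defaults):

  # Show the current full / backfillfull / nearfull ratios:
  ceph osd dump | grep -i ratio
  # e.g.: full_ratio 0.95  backfillfull_ratio 0.9  nearfull_ratio 0.85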


Cheers

/Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com