Re: [ceph-users] OSD_OUT_OF_ORDER_FULL even when the ratios are in order.

2017-09-15 Thread dE .
Hi,
This is just a test cluster, so I'm only testing the relationship
between these ratios.

I made the changes as follows --
failsafe_full = 1 (osd failsafe full ratio = 1 in ceph.conf)
backfillfull = 0.99
nearfull = 0.95
full = 0.96
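
For reference, on Luminous these cluster-wide ratios are normally set with
the mon commands below, with the failsafe ratio coming from ceph.conf; this
is just a sketch using the values above, not necessarily the exact commands
I ran:

ceph osd set-nearfull-ratio 0.95
ceph osd set-full-ratio 0.96
ceph osd set-backfillfull-ratio 0.99
# and in ceph.conf, followed by a daemon restart so it gets picked up:
osd failsafe full ratio = 1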

But the ceph health detail output shows a different story (different from
what I set) --
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
full_ratio (0.96) < backfillfull_ratio (0.99), increased
osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased

Also, as per the documentation, the expected order is
backfillfull < nearfull.
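
Either way, the values the monitors are actually enforcing (which is what
the health check compares) can be read back directly; osd.0 and mon.a below
are just example daemon names on hosts where the admin socket is available:

ceph osd dump | grep ratio
ceph health detail
ceph daemon osd.0 config get osd_failsafe_full_ratio
ceph daemon mon.a config get osd_failsafe_full_ratio

0.97 happens to be the shipped default for osd_failsafe_full_ratio, so
perhaps the ceph.conf change simply hasn't been picked up yet (or the check
reads the monitor's copy of that option).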

Thanks for the response!

On Fri, Sep 15, 2017 at 1:27 AM, David Turner  wrote:

> The warning you are seeing is because those settings are out of order and
> it's showing you which ones are greater than the ones they should be.
>  backfillfull_ratio is supposed to be higher than nearfull_ratio and
> osd_failsafe_full_ratio is supposed to be higher than full_ratio.
>  nearfull_ratio is a warning that shows up in your ceph status, but doesn't
> prevent anything from happening; backfillfull_ratio prevents backfilling
> from happening; and full_ratio prevents any IO from happening at all.
>
> That is the answer to your question, but the rest of this message addresses
> the ridiculous values you are trying to set them to.
>
> Why are you using such high ratios?  By default 5% of the disk is reserved
> for root and nobody but root.  I think that can be adjusted when you create
> the filesystem, but I am not sure whether ceph-deploy does that or not.  But
> if that is the setting and if you're running your OSDs as user ceph (Jewel
> or later), then they will cap out at 95% full and the OS will fail to write
> to the OSD disk.
>
> Assuming you set your ratios in the proper order, you are still leaving
> yourself no room for your cluster to recover from any sort of down osds or
> failed osds.  I don't know what disks you're using, but I don't know of any
> that are guaranteed not to fail.  If your disks can't perform any
> backfilling, then you can't recover from anything... including just
> restarting an osd daemon or a node...  Based on 97% nearfull being your
> setting... you're giving yourself a 2% warning period to add more storage
> before your cluster is incapable of receiving reads or writes.  BUT you
> also set your cluster to not be able to backfill anything if the OSD is
> over 98% full.  Those settings pretty much guarantee that you will be 100%
> stuck and unable to even add more storage to your cluster if you wait until
> your nearfull_ratio is triggered.
>
> I'm just going to say it... DON'T RUN WITH THESE SETTINGS EVER.  DON'T
> EVEN COME CLOSE TO THESE SETTINGS, THEY ARE TERRIBLE!!!
>
> 90% full_ratio is good (95% is the default) because it is a setting you
> can change and if you get into a situation where you need to recover your
> cluster and your cluster is full because of a failed node or anything, then
> you can change the full_ratio and have a chance to still recover your
> cluster.
>
> 80% nearfull_ratio is good (85% is the default) because it gives you 10%
> usable disk space for you to add more storage to your cluster or clean up
> cruft in your cluster that you don't need.  If it takes you a long time to
> get new hardware or find things to delete in your cluster, consider a lower
> number for this warning.
>
> 85% backfillfull_ratio is good (90% is the default) for the same reason as
> full_ratio.  You can increase it if you need to for a critical recovery.
> But with these settings a backfilling operation won't bring you so close to
> your full_ratio that you are in high danger of blocking all IO to your
> cluster.
>
> Even if you stick with the defaults you're in a good enough situation
> where you will most likely be able to recover from most failures in your
> cluster.  But don't push them up unless you are in the middle of a
> catastrophic failure and you're doing it specifically to recover after you
> have your game-plan resolution in place.
>
>
>
> On Thu, Sep 14, 2017 at 10:03 AM Ronny Aasen 
> wrote:
>
>> On 14. sep. 2017 11:58, dE . wrote:
>> > Hi,
>> >  I got a ceph cluster where I'm getting an OSD_OUT_OF_ORDER_FULL
>> > health error, even though the ratios appear to be in order --
>> >
>> > full_ratio 0.99
>> > backfillfull_ratio 0.97
>> > nearfull_ratio 0.98
>> >
>> > These don't seem like a mistake to me but ceph is complaining --
>> > OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
>> >  backfillfull_ratio (0.97) < nearfull_ratio (0.98), increased
>> >  osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> post output from
>>
>> ceph osd df
>> 

Re: [ceph-users] OSD_OUT_OF_ORDER_FULL even when the ratios are in order.

2017-09-15 Thread dE .
ID CLASS WEIGHT  REWEIGHT SIZE USE  AVAIL  %USE  VAR  PGS
 8   hdd 0.18549  1.0 189G 150G 40397M 79.23 1.00   1
 1   hdd 0.18549  1.0 189G 150G 40397M 79.23 1.00   1
TOTAL 379G 300G 80794M 79.23
MIN/MAX VAR: 1.00/1.00  STDDEV: 0


On Thu, Sep 14, 2017 at 7:33 PM, Ronny Aasen 
wrote:

> On 14. sep. 2017 11:58, dE . wrote:
>
>> Hi,
>>  I got a ceph cluster where I'm getting an OSD_OUT_OF_ORDER_FULL
>> health error, even though the ratios appear to be in order --
>>
>> full_ratio 0.99
>> backfillfull_ratio 0.97
>> nearfull_ratio 0.98
>>
>> These don't seem like a mistake to me but ceph is complaining --
>> OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
>>  backfillfull_ratio (0.97) < nearfull_ratio (0.98), increased
>>  osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> post output from
>
> ceph osd df
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD_OUT_OF_ORDER_FULL even when the ratios are in order.

2017-09-14 Thread David Turner
The warning you are seeing is because those settings are out of order and
it's showing you which ones are greater than the ones they should be.
 backfillfull_ratio is supposed to be higher than nearfull_ratio and
osd_failsafe_full_ratio is supposed to be higher than full_ratio.
 nearfull_ratio is a warning that shows up in your ceph status, but doesn't
prevent anything from happening; backfillfull_ratio prevents backfilling
from happening; and full_ratio prevents any IO from happening at all.
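
For reference, those thresholds correspond to the following config defaults
(names and values as I recall them for Jewel/Luminous; on Luminous the first
three just seed the OSDMap at cluster creation, and runtime changes go
through the ceph osd set-*-ratio commands):

mon osd nearfull ratio     = 0.85   # OSD_NEARFULL health warning only
mon osd backfillfull ratio = 0.90   # backfill onto that OSD is refused
mon osd full ratio         = 0.95   # IO to the cluster is blocked
osd failsafe full ratio    = 0.97   # last-ditch per-OSD safety stop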

That is the answer to your question, but the rest of this message addresses
the ridiculous values you are trying to set them to.

Why are you using such high ratios?  By default 5% of the disk is reserved
for root and nobody but root.  I think that can be adjusted when you create
the filesystem, but I am not sure whether ceph-deploy does that or not.  But
if that is the setting and if you're running your OSDs as user ceph (Jewel
or later), then they will cap out at 95% full and the OS will fail to write
to the OSD disk.
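
If the OSD data partition is on an ext filesystem, that reserve can be
checked and adjusted after the fact; this is just a sketch, and /dev/sdX1
stands in for whatever the OSD's data device actually is:

tune2fs -l /dev/sdX1 | grep -i 'reserved block count'
tune2fs -m 1 /dev/sdX1      # shrink the root reserve to 1%
mkfs.ext4 -m 1 /dev/sdX1    # or set it at filesystem creation time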

Assuming you set your ratios in the proper order, you are still leaving yourself
no room for your cluster to recover from any sort of down osds or failed
osds.  I don't know what disks you're using, but I don't know of any that
are guaranteed not to fail.  If your disks can't perform any backfilling,
then you can't recover from anything... including just restarting an osd
daemon or a node...  Based on 97% nearfull being your setting... you're
giving yourself a 2% warning period to add more storage before your cluster
is incapable of receiving reads or writes.  BUT you also set your cluster
to not be able to backfill anything if the OSD is over 98% full.  Those
settings pretty much guarantee that you will be 100% stuck and unable to
even add more storage to your cluster if you wait until your nearfull_ratio
is triggered.

I'm just going to say it... DON'T RUN WITH THESE SETTINGS EVER.  DON'T EVEN
COME CLOSE TO THESE SETTINGS, THEY ARE TERRIBLE!!!

90% full_ratio is good (95% is the default) because it is a setting you can
change and if you get into a situation where you need to recover your
cluster and your cluster is full because of a failed node or anything, then
you can change the full_ratio and have a chance to still recover your
cluster.

80% nearfull_ratio is good (85% is the default) because it gives you 10%
usable disk space for you to add more storage to your cluster or clean up
cruft in your cluster that you don't need.  If it takes you a long time to
get new hardware or find things to delete in your cluster, consider a lower
number for this warning.

85% backfillfull_ratio is good (90% is the default) for the same reason as
full_ratio.  You can increase it if you need to for a critical recovery.
But with these settings a backfilling operation won't bring you so close to
your full_ratio that you are in high danger of blocking all IO to your
cluster.
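
Translated into commands, those recommendations amount to something like the
following on a Luminous cluster; treat it as a sketch rather than a
prescription:

ceph osd set-nearfull-ratio 0.80
ceph osd set-backfillfull-ratio 0.85
ceph osd set-full-ratio 0.90
# and only temporarily, in the middle of a planned recovery, something like:
# ceph osd set-full-ratio 0.95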

Even if you stick with the defaults you're in a good enough situation where
you will most likely be able to recover from most failures in your
cluster.  But don't push them up unless you are in the middle of a
catastrophic failure and you're doing it specifically to recover after you
have your game-plan resolution in place.



On Thu, Sep 14, 2017 at 10:03 AM Ronny Aasen 
wrote:

> On 14. sep. 2017 11:58, dE . wrote:
> > Hi,
> >  I got a ceph cluster where I'm getting an OSD_OUT_OF_ORDER_FULL
> > health error, even though the ratios appear to be in order --
> >
> > full_ratio 0.99
> > backfillfull_ratio 0.97
> > nearfull_ratio 0.98
> >
> > These don't seem like a mistake to me but ceph is complaining --
> > OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
> >  backfillfull_ratio (0.97) < nearfull_ratio (0.98), increased
> >  osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> post output from
>
> ceph osd df
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD_OUT_OF_ORDER_FULL even when the ratios are in order.

2017-09-14 Thread Ronny Aasen

On 14. sep. 2017 11:58, dE . wrote:

Hi,
 I got a ceph cluster where I'm getting an OSD_OUT_OF_ORDER_FULL
health error, even though the ratios appear to be in order --


full_ratio 0.99
backfillfull_ratio 0.97
nearfull_ratio 0.98

These don't seem like a mistake to me but ceph is complaining --
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
 backfillfull_ratio (0.97) < nearfull_ratio (0.98), increased
 osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





post output from

ceph osd df
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD_OUT_OF_ORDER_FULL even when the ratios are in order.

2017-09-14 Thread dE .
Hi,
I got a ceph cluster where I'm getting an OSD_OUT_OF_ORDER_FULL health
error, even though the ratios appear to be in order --

full_ratio 0.99
backfillfull_ratio 0.97
nearfull_ratio 0.98

These don't seem like a mistake to me but ceph is complaining --
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
backfillfull_ratio (0.97) < nearfull_ratio (0.98), increased
osd_failsafe_full_ratio (0.97) < full_ratio (0.99), increased
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com