Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread M Ranga Swami Reddy
OK... try the same with osd.32 and osd.13, one by one: restart osd.32 first
and wait to see whether any rebalancing happens; if nothing changes, then do
the same for osd.13.
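
For reference, on a systemd-managed Jewel node (an assumption about the OS
setup here), the restarts would look something like this:

# restart one OSD at a time and let the cluster settle in between
systemctl restart ceph-osd@32
ceph -s                      # wait for peering/recovery to quiesce
systemctl restart ceph-osd@13
ceph health detail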

thanks
Swami

On Wed, Jul 20, 2016 at 11:59 AM, Goncalo Borges
 wrote:
> Hi Swami.
>
> Did not make any difference.
>
> Cheers
>
> G.
>
>
>
> On 07/20/2016 03:31 PM, M Ranga Swami Reddy wrote:
>
> can you restart osd.32 and check the status?
>
> Thanks
> Swami
>
> On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges
>  wrote:
>
> Hi All...
>
> Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
> occupation, 3 of them were above 90%. In order to solve the situation, I've
> decided to reweight those first using
>
> ceph osd crush reweight osd.1 2.67719
>
> ceph osd crush reweight osd.26 2.67719
>
> ceph osd crush reweight osd.53 2.67719
>
> Please note that I've started with a very conservative step since the
> original weight for all osds was 2.72710.
>
> After some rebalancing (which has now stopped) I've seen that the cluster is
> currently in the following state
>
> # ceph health detail
> HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
> 20/39433323 objects degraded (0.000%); recovery 77898/39433323 objects
> misplaced (0.198%); 8 near full osd(s); crush map has legacy tunables
> (require bobtail, min is firefly)
> pg 6.e2 is stuck unclean for 9578.920997, current state
> active+remapped+backfill_toofull, last acting [49,38,11]
> pg 6.4 is stuck unclean for 9562.054680, current state
> active+remapped+backfill_toofull, last acting [53,6,26]
> pg 5.24 is stuck unclean for 10292.469037, current state
> active+remapped+backfill_toofull, last acting [32,13,51]
> pg 5.306 is stuck unclean for 10292.448364, current state
> active+remapped+backfill_toofull, last acting [44,7,59]
> pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
> pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
> pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
> pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
> recovery 20/39433323 objects degraded (0.000%)
> recovery 77898/39433323 objects misplaced (0.198%)
> osd.1 is near full at 88%
> osd.14 is near full at 87%
> osd.24 is near full at 86%
> osd.26 is near full at 87%
> osd.37 is near full at 87%
> osd.53 is near full at 88%
> osd.56 is near full at 85%
> osd.62 is near full at 87%
>
>crush map has legacy tunables (require bobtail, min is firefly); see
> http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> Not sure if it is worthwhile to mention, but after upgrading to Jewel, our
> cluster shows the warnings regarding tunables. We still have not migrated to
> the optimal tunables because the cluster will be very actively used during
> the next 3 weeks (due to one of the main conferences in our area), and we
> prefer to do that migration after this peak period.
>
>
> I am unsure what happened during the rebalancing, but the mapping of these 4
> stuck PGs seems strange; namely, the up and acting OSDs are different.
>
> # ceph pg dump_stuck unclean
> ok
> pg_stat  state                              up          up_primary  acting      acting_primary
> 6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
> 6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
> 5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32
> 5.306    active+remapped+backfill_toofull   [44,60,26]  44          [44,7,59]   44
>
> # ceph pg map 6.e2
> osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]
>
> # ceph pg map 6.4
> osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]
>
> # ceph pg map 5.24
> osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]
>
> # ceph pg map 5.306
> osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]
>
>
> To complete this information, I am also sending the output of pg query for
> one of these problematic pgs (ceph pg  5.306 query) after this email.
>
> What should be the procedure to try to recover those PGs before continuing
> with the reweighting?
>
> Thank you in advance
> Goncalo
>
>
> # ceph pg  5.306 query
> {
> "state": "active+remapped+backfill_toofull",
> "snap_trimq": "[]",
> "epoch": 1054,
> "up": [
> 44,
> 60,
> 26
> ],
> "acting": [
> 44,
> 7,
> 59
> ],
> "backfill_targets": [
> "26",
> "60"
> ],
> "actingbackfill": [
> "7",
> "26",
> "44",
> "59",
> "60"
> ],
> "info": {
> "pgid": "5.306",
> "last_update": "1005'55174",
> "last_complete": "1005'55174",
> "log_tail": "1005'52106",
> "last_user_version": 55174,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": "[]",
> "history": {
> "epoch_created": 339,
> "last_epoch_started": 1016,
> "last_epoch_clean": 996,
> 

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges

Hi Swami.

Did not make any difference.

Cheers

G.



On 07/20/2016 03:31 PM, M Ranga Swami Reddy wrote:

can you restart osd.32 and check the status?

Thanks
Swami

On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges
 wrote:

Hi All...

Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
occupation, 3 of them were above 90%. In order to solve the situation, I've
decided to reweight those first using

 ceph osd crush reweight osd.1 2.67719

 ceph osd crush reweight osd.26 2.67719

 ceph osd crush reweight osd.53 2.67719

Please note that I've started with a very conservative step since the
original weight for all osds was 2.72710.

After some rebalancing (which has now stopped) I've seen that the cluster is
currently in the following state

# ceph health detail
HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
20/39433323 objects degraded (0.000%); recovery 77898/39433323 objects
misplaced (0.198%); 8 near full osd(s); crush map has legacy tunables
(require bobtail, min is firefly)
pg 6.e2 is stuck unclean for 9578.920997, current state
active+remapped+backfill_toofull, last acting [49,38,11]
pg 6.4 is stuck unclean for 9562.054680, current state
active+remapped+backfill_toofull, last acting [53,6,26]
pg 5.24 is stuck unclean for 10292.469037, current state
active+remapped+backfill_toofull, last acting [32,13,51]
pg 5.306 is stuck unclean for 10292.448364, current state
active+remapped+backfill_toofull, last acting [44,7,59]
pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
recovery 20/39433323 objects degraded (0.000%)
recovery 77898/39433323 objects misplaced (0.198%)
osd.1 is near full at 88%
osd.14 is near full at 87%
osd.24 is near full at 86%
osd.26 is near full at 87%
osd.37 is near full at 87%
osd.53 is near full at 88%
osd.56 is near full at 85%
osd.62 is near full at 87%

crush map has legacy tunables (require bobtail, min is firefly); see
http://ceph.com/docs/master/rados/operations/crush-map/#tunables

Not sure if it is worthwhile to mention, but after upgrading to Jewel, our
cluster shows the warnings regarding tunables. We still have not migrated to
the optimal tunables because the cluster will be very actively used during
the next 3 weeks (due to one of the main conferences in our area), and we
prefer to do that migration after this peak period.


I am unsure what happened during the rebalancing, but the mapping of these 4
stuck PGs seems strange; namely, the up and acting OSDs are different.

# ceph pg dump_stuck unclean
ok
pg_stat  state                              up          up_primary  acting      acting_primary
6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32
5.306    active+remapped+backfill_toofull   [44,60,26]  44          [44,7,59]   44

# ceph pg map 6.e2
osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]

# ceph pg map 6.4
osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]

# ceph pg map 5.24
osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]

# ceph pg map 5.306
osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]


To complete this information, I am also sending the output of pg query for
one of these problematic pgs (ceph pg  5.306 query) after this email.

What should be the procedure to try to recover those PGs before continuing
with the reweighting?

Thank you in advance
Goncalo


# ceph pg  5.306 query
{
 "state": "active+remapped+backfill_toofull",
 "snap_trimq": "[]",
 "epoch": 1054,
 "up": [
 44,
 60,
 26
 ],
 "acting": [
 44,
 7,
 59
 ],
 "backfill_targets": [
 "26",
 "60"
 ],
 "actingbackfill": [
 "7",
 "26",
 "44",
 "59",
 "60"
 ],
 "info": {
 "pgid": "5.306",
 "last_update": "1005'55174",
 "last_complete": "1005'55174",
 "log_tail": "1005'52106",
 "last_user_version": 55174,
 "last_backfill": "MAX",
 "last_backfill_bitwise": 0,
 "purged_snaps": "[]",
 "history": {
 "epoch_created": 339,
 "last_epoch_started": 1016,
 "last_epoch_clean": 996,
 "last_epoch_split": 0,
 "last_epoch_marked_full": 0,
 "same_up_since": 1015,
 "same_interval_since": 1015,
 "same_primary_since": 928,
 "last_scrub": "1005'55169",
 "last_scrub_stamp": "2016-07-19 14:31:45.790871",
 "last_deep_scrub": "1005'55169",
 "last_deep_scrub_stamp": "2016-07-19 14:31:45.790871",
 "last_clean_scrub_stamp": "20

[ceph-users] CephFS Samba VFS RHEL packages

2016-07-19 Thread Blair Bethwaite
Hi all,

We've started a CephFS Samba PoC on RHEL but just noticed the Samba
Ceph VFS doesn't seem to be included with Samba on RHEL, or we're not
looking in the right place. Trying to avoid needing to build Samba
from source if possible. Any pointers appreciated.

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread M Ranga Swami Reddy
can you restart osd.32 and check the status?

Thanks
Swami

On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges
 wrote:
> Hi All...
>
> Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
> occupation, 3 of them were above 90%. In order to solve the situation, I've
> decided to reweight those first using
>
> ceph osd crush reweight osd.1 2.67719
>
> ceph osd crush reweight osd.26 2.67719
>
> ceph osd crush reweight osd.53 2.67719
>
> Please note that I've started with a very conservative step since the
> original weight for all osds was 2.72710.
>
> After some rebalancing (which has now stopped) I've seen that the cluster is
> currently in the following state
>
> # ceph health detail
> HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
> 20/39433323 objects degraded (0.000%); recovery 77898/39433323 objects
> misplaced (0.198%); 8 near full osd(s); crush map has legacy tunables
> (require bobtail, min is firefly)
> pg 6.e2 is stuck unclean for 9578.920997, current state
> active+remapped+backfill_toofull, last acting [49,38,11]
> pg 6.4 is stuck unclean for 9562.054680, current state
> active+remapped+backfill_toofull, last acting [53,6,26]
> pg 5.24 is stuck unclean for 10292.469037, current state
> active+remapped+backfill_toofull, last acting [32,13,51]
> pg 5.306 is stuck unclean for 10292.448364, current state
> active+remapped+backfill_toofull, last acting [44,7,59]
> pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
> pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
> pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
> pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
> recovery 20/39433323 objects degraded (0.000%)
> recovery 77898/39433323 objects misplaced (0.198%)
> osd.1 is near full at 88%
> osd.14 is near full at 87%
> osd.24 is near full at 86%
> osd.26 is near full at 87%
> osd.37 is near full at 87%
> osd.53 is near full at 88%
> osd.56 is near full at 85%
> osd.62 is near full at 87%
>
>crush map has legacy tunables (require bobtail, min is firefly); see
> http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> Not sure if it is worthwhile to mention, but after upgrading to Jewel, our
> cluster shows the warnings regarding tunables. We still have not migrated to
> the optimal tunables because the cluster will be very actively used during
> the next 3 weeks (due to one of the main conferences in our area), and we
> prefer to do that migration after this peak period.
>
>
> I am unsure what happened during the rebalancing, but the mapping of these 4
> stuck PGs seems strange; namely, the up and acting OSDs are different.
>
> # ceph pg dump_stuck unclean
> ok
> pg_stat  state                              up          up_primary  acting      acting_primary
> 6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
> 6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
> 5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32
> 5.306    active+remapped+backfill_toofull   [44,60,26]  44          [44,7,59]   44
>
> # ceph pg map 6.e2
> osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]
>
> # ceph pg map 6.4
> osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]
>
> # ceph pg map 5.24
> osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]
>
> # ceph pg map 5.306
> osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]
>
>
> To complete this information, I am also sending the output of pg query for
> one of these problematic pgs (ceph pg  5.306 query) after this email.
>
> What should be the procedure to try to recover those PGs before continuing
> with the reweighting?
>
> Thank you in advance
> Goncalo
>
>
> # ceph pg  5.306 query
> {
> "state": "active+remapped+backfill_toofull",
> "snap_trimq": "[]",
> "epoch": 1054,
> "up": [
> 44,
> 60,
> 26
> ],
> "acting": [
> 44,
> 7,
> 59
> ],
> "backfill_targets": [
> "26",
> "60"
> ],
> "actingbackfill": [
> "7",
> "26",
> "44",
> "59",
> "60"
> ],
> "info": {
> "pgid": "5.306",
> "last_update": "1005'55174",
> "last_complete": "1005'55174",
> "log_tail": "1005'52106",
> "last_user_version": 55174,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": "[]",
> "history": {
> "epoch_created": 339,
> "last_epoch_started": 1016,
> "last_epoch_clean": 996,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 1015,
> "same_interval_since": 1015,
> "same_primary_since": 928,
> "last_scrub": "1005'55169",
> "last_scrub_stamp": "2016-07-19 14:31:45.790871",
> "last_deep_scrub": "1005'55169",
>   

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges

I think I understood the source of the problem:

1. This is the original pg mapping before reweighing:
# egrep "(^6.e2\s|^6.4\s|^5.24\s|^5.306\s)" /tmp/pg_dump.1

6.e2   12732  0  0  0  0  45391468553  3084  3084  active+clean
       2016-07-19 19:06:56.622185  1005'234027  1005:2817269
       *[49,11,38]*  49  [49,11,38]  49
       1005'233918  2016-07-19 19:06:56.622111  1005'231243  201
6.4    12787  0  0  0  0  45774835659  3005  3005  active+clean
       2016-07-19 19:51:20.285239  1005'128200  1005:669092
       *[53,26,6]*  53  [53,26,6]  53
       1005'128128  2016-07-19 19:51:20.285132  1005'125553  2016-07-18 18:59:22.257745
5.24   223    0  0  0  0  0  3033  3033  active+clean
       2016-07-19 18:54:29.450653  1005'62360  1005:48062
       [32,13,51]  32  *[32,13,51]*  32
       1005'62360  2016-07-19 18:54:29.450514  1005'62360  2016-07-19 18:54:29.450514
5.306  230    0  0  0  0  0  3068  3068  active+clean
       2016-07-19 14:31:45.791194  1005'55174  1005:39712
       [44,7,59]  44  *[44,7,59]*  44
       1005'55169  2016-07-19 14:31:45.790871  1005'55169  2016-07-19 14:31:45.790871



2. This is the pg dump after reweighting:

# ceph pg dump_stuck unclean
ok
pg_stat  state                              up          up_primary  acting      acting_primary
6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32
5.306    active+remapped+backfill_toofull   [44,60,26]  44          [44,7,59]   44


If you look carefully at the last dump, the _acting_ OSDs are the
original ones from before the reweighting, and the _up_ OSDs are (I guess)
the new ones it is trying to use. However, in this new _up_ set, there is
always one OSD with the near-full message.


Maybe that is why the rebalancing is on hold?

Maybe if I increase the threshold for the warning, the rebalancing will restart?
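
If so, one possible (and temporary) workaround would be to raise the backfill
threshold rather than the near-full warning, since backfill_toofull is governed
by osd_backfill_full_ratio (0.85 by default). A sketch, to be reverted once the
reweighting has finished:

# check the value currently loaded by one OSD (run on that OSD's host)
ceph daemon osd.8 config get osd_backfill_full_ratio
# temporarily allow backfill onto fuller OSDs, e.g. up to 92%
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'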

Cheers
G.


On 07/20/2016 01:42 PM, Goncalo Borges wrote:


Hi All...

Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
occupation, 3 of them were above 90%. In order to solve the situation,
I've decided to reweight those first using


ceph osd crush reweight osd.1 2.67719

ceph osd crush reweight osd.26 2.67719

ceph osd crush reweight osd.53 2.67719

Please note that I've started with a very conservative step since the 
original weight for all osds was 2.72710.


After some rebalancing (which has now stopped) I've seen that the 
cluster is currently in the following state


# ceph health detail
HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
20/39433323 objects degraded (0.000%); recovery 77898/39433323
objects misplaced (0.198%); 8 near full osd(s); crush map has
legacy tunables (require bobtail, min is firefly)
pg 6.e2 is stuck unclean for 9578.920997, current state
active+remapped+backfill_toofull, last acting [49,38,11]
pg 6.4 is stuck unclean for 9562.054680, current state
active+remapped+backfill_toofull, last acting [53,6,26]
pg 5.24 is stuck unclean for 10292.469037, current state
active+remapped+backfill_toofull, last acting [32,13,51]
pg 5.306 is stuck unclean for 10292.448364, current state
active+remapped+backfill_toofull, last acting [44,7,59]
pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
recovery 20/39433323 objects degraded (0.000%)
recovery 77898/39433323 objects misplaced (0.198%)
osd.1 is near full at 88%
osd.14 is near full at 87%
osd.24 is near full at 86%
osd.26 is near full at 87%
osd.37 is near full at 87%
osd.53 is near full at 88%
osd.56 is near full at 85%
osd.62 is near full at 87%

   crush map has legacy tunables (require bobtail, min is 
firefly); see 
http://ceph.com/docs/master/rados/operations/crush-map/#tunables


Not sure if it is worthwhile to mention, but after upgrading to Jewel,
our cluster shows the warnings regarding tunables. We still have not
migrated to the optimal tunables because the cluster will be very
actively used during the next 3 weeks (due to one of the main
conferences in our area), and we prefer to do that migration after this
peak period.



I am unsure what happened during the rebalancing, but the mapping of these
4 stuck PGs seems strange; namely, the up and acting OSDs are different.


# ceph pg dump_stuck unclean
ok
pg_stat  state                              up          up_primary  acting      acting_primary
6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges

Hi KK...

Thanks. I did set the 'sortbitwise' flag since that was mentioned in the
release notes.


However I do not understand how this relates to this problem.

Can you give a bit more info?

Cheers and Thanks

Goncalo


On 07/20/2016 02:10 PM, K K wrote:


Hi, Goncalo.

Did you set the sortbitwise flag during the update? If YES, try to unset it.
I faced the same problem when upgrading my ceph cluster from Hammer to
Jewel. Maybe it is your issue: http://tracker.ceph.com/issues/16113
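
For reference, a sketch of how the flag would be checked and cleared (it shows
up in the osdmap flags line; this mirrors the workaround discussed in that
tracker issue):

ceph osd dump | grep flags      # look for 'sortbitwise' in the flags line
ceph osd unset sortbitwise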


Wednesday, 20 July 2016, 8:42 +05:00, from Goncalo Borges:

Hi All...

Today we had a warning regarding 8 near-full OSDs. Looking at the
OSDs' occupation, 3 of them were above 90%. In order to solve the
situation, I've decided to reweight those first using

ceph osd crush reweight osd.1 2.67719

ceph osd crush reweight osd.26 2.67719

ceph osd crush reweight osd.53 2.67719

Please note that I've started with a very conservative step since
the original weight for all osds was 2.72710.

After some rebalancing (which has now stopped) I've seen that the
cluster is currently in the following state

# ceph health detail
HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean;
recovery 20/39433323 objects degraded (0.000%); recovery
77898/39433323 objects misplaced (0.198%); 8 near full osd(s);
crush map has legacy tunables (require bobtail, min is firefly)
pg 6.e2 is stuck unclean for 9578.920997, current state
active+remapped+backfill_toofull, last acting [49,38,11]
pg 6.4 is stuck unclean for 9562.054680, current state
active+remapped+backfill_toofull, last acting [53,6,26]
pg 5.24 is stuck unclean for 10292.469037, current state
active+remapped+backfill_toofull, last acting [32,13,51]
pg 5.306 is stuck unclean for 10292.448364, current state
active+remapped+backfill_toofull, last acting [44,7,59]
pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
recovery 20/39433323 objects degraded (0.000%)
recovery 77898/39433323 objects misplaced (0.198%)
osd.1 is near full at 88%
osd.14 is near full at 87%
osd.24 is near full at 86%
osd.26 is near full at 87%
osd.37 is near full at 87%
osd.53 is near full at 88%
osd.56 is near full at 85%
osd.62 is near full at 87%

   crush map has legacy tunables (require bobtail, min is
firefly); see
http://ceph.com/docs/master/rados/operations/crush-map/#tunables

Not sure if it is worthwhile to mention, but after upgrading to
Jewel, our cluster shows the warnings regarding tunables. We still
have not migrated to the optimal tunables because the cluster will
be very actively used during the next 3 weeks (due to one of the
main conferences in our area), and we prefer to do that migration
after this peak period.


I am unsure what happened during the rebalancing, but the mapping of
these 4 stuck PGs seems strange; namely, the up and acting OSDs are
different.

# ceph pg dump_stuck unclean
ok
pg_stat  state                              up          up_primary  acting      acting_primary
6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32
5.306    active+remapped+backfill_toofull   [44,60,26]  44          [44,7,59]   44


# ceph pg map 6.e2
osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]

# ceph pg map 6.4
osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]

# ceph pg map 5.24
osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]

# ceph pg map 5.306
osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]


To complete this information, I am also sending the output of pg
query for one of these problematic pgs (ceph pg  5.306 query)
after this email.

What should be the procedure to try to recover those PGs before
continuing with the reweighting?

Thank you in advance
Goncalo


# ceph pg  5.306 query
{
"state": "active+remapped+backfill_toofull",
"snap_trimq": "[]",
"epoch": 1054,
"up": [
44,
60,
26
],
"acting": [
44,
7,
59
],
"backfill_targets": [
"26",
"60"
],
"actingbackfill": [
"7",
"26",
"44",
"59",
"60"
],
"

[ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges

Hi All...

Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
occupation, 3 of them were above 90%. In order to solve the situation,
I've decided to reweight those first using


ceph osd crush reweight osd.1 2.67719

ceph osd crush reweight osd.26 2.67719

ceph osd crush reweight osd.53 2.67719

Please note that I've started with a very conservative step since the 
original weight for all osds was 2.72710.
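
For reference, a quick way to see the per-OSD fill level together with the
current CRUSH weights before and after each step (available since Hammer)
would be something like:

# utilisation, variance and CRUSH weight per OSD, grouped by the CRUSH tree
ceph osd df tree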


After some rebalancing (which has now stopped) I've seen that the 
cluster is currently in the following state


   # ceph health detail
   HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
   20/39433323 objects degraded (0.000%); recovery 77898/39433323
   objects misplaced (0.198%); 8 near full osd(s); crush map has legacy
   tunables (require bobtail, min is firefly)
   pg 6.e2 is stuck unclean for 9578.920997, current state
   active+remapped+backfill_toofull, last acting [49,38,11]
   pg 6.4 is stuck unclean for 9562.054680, current state
   active+remapped+backfill_toofull, last acting [53,6,26]
   pg 5.24 is stuck unclean for 10292.469037, current state
   active+remapped+backfill_toofull, last acting [32,13,51]
   pg 5.306 is stuck unclean for 10292.448364, current state
   active+remapped+backfill_toofull, last acting [44,7,59]
   pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
   pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
   pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
   pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
   recovery 20/39433323 objects degraded (0.000%)
   recovery 77898/39433323 objects misplaced (0.198%)
   osd.1 is near full at 88%
   osd.14 is near full at 87%
   osd.24 is near full at 86%
   osd.26 is near full at 87%
   osd.37 is near full at 87%
   osd.53 is near full at 88%
   osd.56 is near full at 85%
   osd.62 is near full at 87%

   crush map has legacy tunables (require bobtail, min is firefly); 
see http://ceph.com/docs/master/rados/operations/crush-map/#tunables


Not sure if it is worthwhile to mention, but after upgrading to Jewel,
our cluster shows the warnings regarding tunables. We still have not
migrated to the optimal tunables because the cluster will be very
actively used during the next 3 weeks (due to one of the main
conferences in our area), and we prefer to do that migration after this
peak period.



I am unsure what happened during the rebalancing, but the mapping of these 4
stuck PGs seems strange; namely, the up and acting OSDs are different.


   # ceph pg dump_stuck unclean
   ok
    pg_stat  state                              up          up_primary  acting      acting_primary
    6.e2     active+remapped+backfill_toofull   [8,53,38]   8           [49,38,11]  49
    6.4      active+remapped+backfill_toofull   [53,24,6]   53          [53,6,26]   53
    5.24     active+remapped+backfill_toofull   [32,13,56]  32          [32,13,51]  32
    5.306    active+remapped+backfill_toofull   [44,60,26]  44          [44,7,59]   44

   # ceph pg map 6.e2
   osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]

   # ceph pg map 6.4
   osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]

   # ceph pg map 5.24
   osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]

   # ceph pg map 5.306
   osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]


To complete this information, I am also sending the output of pg query 
for one of these problematic pgs (ceph pg  5.306 query) after this email.


What should be the procedure to try to recover those PGs before
continuing with the reweighting?


Thank you in advance
Goncalo


# ceph pg  5.306 query
{
"state": "active+remapped+backfill_toofull",
"snap_trimq": "[]",
"epoch": 1054,
"up": [
44,
60,
26
],
"acting": [
44,
7,
59
],
"backfill_targets": [
"26",
"60"
],
"actingbackfill": [
"7",
"26",
"44",
"59",
"60"
],
"info": {
"pgid": "5.306",
"last_update": "1005'55174",
"last_complete": "1005'55174",
"log_tail": "1005'52106",
"last_user_version": 55174,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 339,
"last_epoch_started": 1016,
"last_epoch_clean": 996,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 1015,
"same_interval_since": 1015,
"same_primary_since": 928,
"last_scrub": "1005'55169",
"last_scrub_stamp": "2016-07-19 14:31:45.790871",
"last_deep_scrub": "1005'55169",
"last_deep_scrub_stamp": "2016-07-19 14:31:45.790871",
"last_clean_scrub_stamp": "2016-07-19 14:31:45.790871"
},
"stats": {
"version": "1005'55174",
"reported_s

Re: [ceph-users] how to use cache tiering with proxy in ceph-10.2.2

2016-07-19 Thread m13913886148
But the 0.94 version works fine (in fact, IO was greatly improved).
This problem occurs only in version 10.x.
Like you said, the IO is mostly going to the cold storage, and IO is slow.
What can I do to improve the IO performance of cache tiering in version 10.x?
How does cache tiering work in version 10.x? Is it a bug? Or is the
configuration very different from the 0.94 version? There is too little
information about this on the official website.

 

On Tuesday, July 19, 2016 9:25 PM, Christian Balzer  wrote:
 

 
Hello,

On Tue, 19 Jul 2016 12:24:01 +0200 Oliver Dzombic wrote:

> Hi,
> 
> i have in my ceph.conf under [OSD] Section:
> 
> osd_tier_promote_max_bytes_sec = 1610612736
> osd_tier_promote_max_objects_sec = 2
> 
> #ceph --show-config is showing:
> 
> osd_tier_promote_max_objects_sec = 5242880
> osd_tier_promote_max_bytes_sec = 25
> 
> But in fact its working. Maybe some Bug in showing the correct value.
> 
> I had Problems too, that the IO was going to the cold storage mostly.
> 
> After i changed this values ( and restarted >every< node inside the
> cluster ) the problem was gone.
> 
> So i assume, that its simply showing the wrong values if you call the
> show-config. Or there is some other miracle going on.
> 
> I just checked:
> 
> #ceph --show-config | grep osd_tier
> 
> shows:
> 
> osd_tier_default_cache_hit_set_count = 4
> osd_tier_default_cache_hit_set_period = 1200
> 
> while
> 
> #ceph osd pool get ssd_cache hit_set_count
> #ceph osd pool get ssd_cache hit_set_period
> 
> show
> 
> hit_set_count: 1
> hit_set_period: 120
> 
Apples and oranges.

Your first query is about the config (and thus default, as it says in the
output) options, the second one is for a specific pool.

There might be still any sorts of breakage with show-config and having to
restart OSDs to have changes take effect is inelegant at least, but the
above is not a bug.
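
To illustrate the distinction, a sketch of how to compare the value a running
daemon actually uses with a pool-level setting (the 'ceph daemon' call has to
be run on the host carrying that OSD):

# value loaded by a running OSD, via its admin socket
ceph daemon osd.0 config get osd_tier_promote_max_bytes_sec
# per-pool overrides live on the pool itself, not in the daemon config
ceph osd pool get ssd_cache hit_set_count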

Christian

> 
> So you can obviously ignore the ceph --show-config command. Its simply
> not working correctly.
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com      Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Too much pgs backfilling

2016-07-19 Thread Somnath Roy
The settings are per OSD, and the messages you are seeing are aggregated
across the cluster, with multiple OSDs doing backfill (working on multiple
PGs in parallel).
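
A sketch of how the per-OSD limit can be confirmed or re-applied at runtime
(values purely illustrative):

# push the setting to all running OSDs without a restart
ceph tell osd.* injectargs '--osd-max-backfills 1'
# confirm what one daemon actually uses (run on that OSD's host)
ceph daemon osd.0 config get osd_max_backfills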

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jimmy 
Goffaux
Sent: Tuesday, July 19, 2016 5:19 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Too much pgs backfilling


Hello,



This is my configuration :

-> "osd_max_backfills": "1",
-> "osd_recovery_threads": "1"
->  "osd_recovery_max_active": "1",
-> "osd_recovery_op_priority": "3",

-> "osd_client_op_priority": "63",



I have run command :  ceph osd crush tunables optimal

After  upgrade Hammer to Jewel.



My cluster is overloaded: 15 pgs are in active+remapped+backfilling.



Why 15? Is my configuration bad? Normally I should have a maximum of 1.



Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Too much pgs backfilling

2016-07-19 Thread Jimmy Goffaux
 

Hello, 

This is my configuration:

-> "osd_max_backfills": "1",
-> "osd_recovery_threads": "1"
-> "osd_recovery_max_active": "1",
-> "osd_recovery_op_priority": "3",

-> "osd_client_op_priority": "63",

I have run the command: ceph osd crush tunables optimal

After upgrading from Hammer to Jewel.

My cluster is overloaded: 15 pgs are in active+remapped+backfilling.

Why 15? Is my configuration bad? Normally I should have a maximum of 1.

Thanks

 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-device BlueStore testing

2016-07-19 Thread Somnath Roy
I don't think ceph-disk has support for separating block.db and block.wal yet
(?).
You need to create the cluster manually by running mkfs.
Or, if you have the old mkcephfs script (which is sadly deprecated), you can
point it at the db/wal paths and it will create the cluster for you. I am using
that to configure bluestore on multiple devices.
Alternatively, vstart.sh also has support for multi-device bluestore
configurations, I believe.
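
A rough sketch of the manual route on Jewel (option names as I understand the
BlueStore config options, so worth double-checking; device paths are purely
illustrative, and BlueStore is still experimental at this point):

# ceph.conf, per-OSD section, set before initialising the store
[osd.0]
    bluestore block db path = /dev/nvme0n1p1
    bluestore block wal path = /dev/nvme0n1p2

# then initialise the OSD data store by hand
ceph-osd -i 0 --mkfs --mkkey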

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Stillwell, Bryan J
Sent: Tuesday, July 19, 2016 3:36 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Multi-device BlueStore testing

I would like to do some BlueStore testing using multiple devices like mentioned 
here:

https://www.sebastien-han.fr/blog/2016/05/04/Ceph-Jewel-configure-BlueStore-with-multiple-devices/

However, simply creating the block.db and block.wal symlinks and pointing them 
at empty partitions doesn't appear to be enough:

2016-07-19 21:30:15.717827 7f48ec4d9800  1 bluestore(/var/lib/ceph/osd/ceph-0) 
mount path /var/lib/ceph/osd/ceph-0
2016-07-19 21:30:15.717855 7f48ec4d9800  1 bluestore(/var/lib/ceph/osd/ceph-0) 
fsck
2016-07-19 21:30:15.717869 7f48ec4d9800  1 bdev create path 
/var/lib/ceph/osd/ceph-0/block type kernel
2016-07-19 21:30:15.718367 7f48ec4d9800  1 bdev(/var/lib/ceph/osd/ceph-0/block) 
open path /var/lib/ceph/osd/ceph-0/block
2016-07-19 21:30:15.718462 7f48ec4d9800  1 bdev(/var/lib/ceph/osd/ceph-0/block) 
open size 6001069202944 (5588 GB) block_size 4096 (4096 B)
2016-07-19 21:30:15.718786 7f48ec4d9800  1 bdev create path 
/var/lib/ceph/osd/ceph-0/block.db type kernel
2016-07-19 21:30:15.719305 7f48ec4d9800  1 
bdev(/var/lib/ceph/osd/ceph-0/block.db) open path 
/var/lib/ceph/osd/ceph-0/block.db
2016-07-19 21:30:15.719388 7f48ec4d9800  1 
bdev(/var/lib/ceph/osd/ceph-0/block.db) open size 1023410176 (976 MB) 
block_size 4096 (4096 B)
2016-07-19 21:30:15.719394 7f48ec4d9800  1 bluefs add_block_device bdev 1 path 
/var/lib/ceph/osd/ceph-0/block.db size 976 MB
2016-07-19 21:30:15.719586 7f48ec4d9800 -1 
bluestore(/var/lib/ceph/osd/ceph-0/block.db) _read_bdev_label unable to decode 
label at offset 66: buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end 
of struct encoding
2016-07-19 21:30:15.719597 7f48ec4d9800 -1 bluestore(/var/lib/ceph/osd/ceph-0) 
_open_db check block device(/var/lib/ceph/osd/ceph-0/block.db) label returned: 
(22) Invalid argument
2016-07-19 21:30:15.719602 7f48ec4d9800  1 
bdev(/var/lib/ceph/osd/ceph-0/block.db) close
2016-07-19 21:30:15.999311 7f48ec4d9800  1 bdev(/var/lib/ceph/osd/ceph-0/block) 
close
2016-07-19 21:30:16.243312 7f48ec4d9800 -1 osd.0 0 OSD:init: unable to mount 
object store

I originally used 'ceph-disk prepare --bluestore' to create the OSD, but I feel 
like there is some kind of initialization step I need to do when moving the db 
and wal over to an NVMe device.  My google searches just aren't turning up 
much.  Could someone point me in the right direction?

Thanks,
Bryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multi-device BlueStore testing

2016-07-19 Thread Stillwell, Bryan J
I would like to do some BlueStore testing using multiple devices like mentioned 
here:

https://www.sebastien-han.fr/blog/2016/05/04/Ceph-Jewel-configure-BlueStore-with-multiple-devices/

However, simply creating the block.db and block.wal symlinks and pointing them 
at empty partitions doesn't appear to be enough:

2016-07-19 21:30:15.717827 7f48ec4d9800  1 bluestore(/var/lib/ceph/osd/ceph-0) 
mount path /var/lib/ceph/osd/ceph-0
2016-07-19 21:30:15.717855 7f48ec4d9800  1 bluestore(/var/lib/ceph/osd/ceph-0) 
fsck
2016-07-19 21:30:15.717869 7f48ec4d9800  1 bdev create path 
/var/lib/ceph/osd/ceph-0/block type kernel
2016-07-19 21:30:15.718367 7f48ec4d9800  1 bdev(/var/lib/ceph/osd/ceph-0/block) 
open path /var/lib/ceph/osd/ceph-0/block
2016-07-19 21:30:15.718462 7f48ec4d9800  1 bdev(/var/lib/ceph/osd/ceph-0/block) 
open size 6001069202944 (5588 GB) block_size 4096 (4096 B)
2016-07-19 21:30:15.718786 7f48ec4d9800  1 bdev create path 
/var/lib/ceph/osd/ceph-0/block.db type kernel
2016-07-19 21:30:15.719305 7f48ec4d9800  1 
bdev(/var/lib/ceph/osd/ceph-0/block.db) open path 
/var/lib/ceph/osd/ceph-0/block.db
2016-07-19 21:30:15.719388 7f48ec4d9800  1 
bdev(/var/lib/ceph/osd/ceph-0/block.db) open size 1023410176 (976 MB) 
block_size 4096 (4096 B)
2016-07-19 21:30:15.719394 7f48ec4d9800  1 bluefs add_block_device bdev 1 path 
/var/lib/ceph/osd/ceph-0/block.db size 976 MB
2016-07-19 21:30:15.719586 7f48ec4d9800 -1 
bluestore(/var/lib/ceph/osd/ceph-0/block.db) _read_bdev_label unable to decode 
label at offset 66: buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end 
of struct encoding
2016-07-19 21:30:15.719597 7f48ec4d9800 -1 bluestore(/var/lib/ceph/osd/ceph-0) 
_open_db check block device(/var/lib/ceph/osd/ceph-0/block.db) label returned: 
(22) Invalid argument
2016-07-19 21:30:15.719602 7f48ec4d9800  1 
bdev(/var/lib/ceph/osd/ceph-0/block.db) close
2016-07-19 21:30:15.999311 7f48ec4d9800  1 bdev(/var/lib/ceph/osd/ceph-0/block) 
close
2016-07-19 21:30:16.243312 7f48ec4d9800 -1 osd.0 0 OSD:init: unable to mount 
object store

I originally used 'ceph-disk prepare --bluestore' to create the OSD, but I feel 
like there is some kind of initialization step I need to do when moving the db 
and wal over to an NVMe device.  My google searches just aren't turning up 
much.  Could someone point me in the right direction?

Thanks,
Bryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS write performance

2016-07-19 Thread Gregory Farnum
On Tue, Jul 19, 2016 at 9:39 AM, Patrick Donnelly  wrote:
> On Tue, Jul 19, 2016 at 10:25 AM, Fabiano de O. Lucchese
>  wrote:
>> I configured the cluster to replicate data twice (3 copies), so these
>> numbers fall within my expectations. So far so good, but here's comes the
>> issue: I configured CephFS and mounted a share locally on one of my servers.
>> When I write data to it, it shows abnormally high performance at the
>> beginning for about 5 seconds, stalls for about 20 seconds and then picks up
>> again. For long running tests, the observed write throughput is very close
>> to what the rados bench provided (about 640 MB/s), but for short-lived
>> tests, I get peak performances of over 5GB/s. I know that journaling is
>> expected to cause spiky performance patters like that, but not to this
>> level, which makes me think that CephFS is buffering my writes and returning
>> the control back to client before persisting them to the jounal, which looks
>> undesirable.
>
> The client is buffering the writes to RADOS which would give you the
> abnormally high initial performance until the cache needs flushed. You
> might try tweaking certain osd settings:
>
> http://docs.ceph.com/docs/hammer/rados/configuration/osd-config-ref/
>
> in particular: "osd client message size cap". Also:

I am reasonably sure you don't want to change the message size cap;
that's entirely an OSD-side throttle about how much dirty data it lets
in before it stops reading off the wire — and I don't think the client
feeds back from outgoing data. More likely it's about how much dirty
data is being absorbed by the Client before it forces writes out to
the OSDs and you want to look at

client_oc_size (default 1024*1024*200, aka 200MB)
client_oc_max_dirty (default 100MB)
client_oc_target_dirty (default 8MB)

and turn down the max dirty limits if you're finding it's too bumpy a ride.
-Greg
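
A sketch of what turning those client-side knobs down might look like in
ceph.conf (values purely illustrative):

[client]
    client oc max dirty = 52428800       # 50 MB, down from the 100 MB default
    client oc target dirty = 4194304     # 4 MB, down from the 8 MB default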

>
> http://docs.ceph.com/docs/hammer/rados/configuration/journal-ref/
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS write performance

2016-07-19 Thread John Spray
On Tue, Jul 19, 2016 at 3:25 PM, Fabiano de O. Lucchese
 wrote:
> Hi, folks.
>
> I'm conducting a series of experiments and tests with CephFS and have been
> facing a behavior over which I can't seem to have much control.
>
> I configured a 5-node Ceph cluster running on enterprise servers. Each
> server has 10 x 6TB HDDs and 2 x 800GB SSDs. I configured the SSDs as a
> RAID-1 device for journaling and also two of the HDDs for the same purpose
> for the sake of comparison. All other 8 HDDs are configured as OSDs. The
> servers have 196GB of RAM and our private network is backed by a 40GB/s
> Brocade switch (frontend is 10Gb/s).
>
> When benchmarking the HDDs directly, here's the performance I get:
>
> dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=10G count=1
> oflag=direct &
>
> 0+1 records in
> 0+1 records out
> 2147479552 bytes (2.1 GB) copied, 11.684 s, 184 MB/s
>
> For read performance:
>
> dd if=/var/lib/ceph/osd/ceph-0/deleteme of=/dev/null bs=10G count=1
> iflag=direct &
>
> 0+1 records in
> 0+1 records out
> 2147479552 bytes (2.1 GB) copied, 8.30168 s, 259 MB/s
>
> Now, when I benchmark the OSDs configured with HDD-based journaling, here's
> what I get:
>
> [root@cephnode1 ceph-cluster]# ceph tell osd.1 bench
>
> {
> "bytes_written": 1073741824,
> "blocksize": 4194304,
> "bytes_per_sec": 426840870.00
> }
>
> which looks coherent. If I switch to the SDD-based journal, here's the new
> figure:
>
> [root@cephnode1 ~]# ceph tell osd.1 bench
> {
> "bytes_written": 1073741824,
> "blocksize": 4194304,
> "bytes_per_sec": 805229549.00
> }
>
> which, again, looks as expected to me.
>
> Finally, when I run the rados bench, here's what I get:
>
> rados bench -p cephfs_data 300 write --no-cleanup && rados bench -p
> cephfs_data 300 seq
>
> Total time run: 300.345098
> Total writes made:  48327
> Write size: 4194304
> Bandwidth (MB/sec): 643.620
>
> Stddev Bandwidth:   114.222
> Max bandwidth (MB/sec): 1196
> Min bandwidth (MB/sec): 0
> Average Latency:0.0994289
> Stddev Latency: 0.112926
> Max latency:1.85983
> Min latency:0.0139412
>
> 
>
> Total time run:300.121930
> Total reads made:  31990
> Read size: 4194304
> Bandwidth (MB/sec):426.360
>
> Average Latency:   0.149346
> Max latency:   1.77489
> Min latency:   0.00382452
>
> I configured the cluster to replicate data twice (3 copies), so these
> numbers fall within my expectations. So far so good, but here's comes the
> issue: I configured CephFS and mounted a share locally on one of my servers.
> When I write data to it, it shows abnormally high performance at the
> beginning for about 5 seconds, stalls for about 20 seconds and then picks up
> again. For long running tests, the observed write throughput is very close
> to what the rados bench provided (about 640 MB/s), but for short-lived
> tests, I get peak performances of over 5GB/s. I know that journaling is
> expected to cause spiky performance patters like that, but not to this
> level, which makes me think that CephFS is buffering my writes and returning
> the control back to client before persisting them to the jounal, which looks
> undesirable.

If you want to skip the caching in any filesystem, use the O_DIRECT
flag when opening a file.

You don't say exactly what your benchmark is, but presumably you have
a shortage of fsync calls, so you're not actually waiting for things
to persist?

John
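
For example, one way to take the client cache out of the picture in a dd test
(assuming the CephFS mount is at /mnt/cephfs; both forms are illustrative):

# bypass the client page/object cache entirely
dd if=/dev/zero of=/mnt/cephfs/testfile bs=4M count=2560 oflag=direct
# or keep the cache but include the final flush in the measured time
dd if=/dev/zero of=/mnt/cephfs/testfile bs=4M count=2560 conv=fsync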

> I searched the web for a couple of days looking for ways to disable this
> apparent write buffering, but couldn't find anything. So here comes my
> question: how can I disable it?
>
> Thanks and regards,
>
> F. Lucchese
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS write performance

2016-07-19 Thread Patrick Donnelly
On Tue, Jul 19, 2016 at 10:25 AM, Fabiano de O. Lucchese
 wrote:
> I configured the cluster to replicate data twice (3 copies), so these
> numbers fall within my expectations. So far so good, but here's comes the
> issue: I configured CephFS and mounted a share locally on one of my servers.
> When I write data to it, it shows abnormally high performance at the
> beginning for about 5 seconds, stalls for about 20 seconds and then picks up
> again. For long running tests, the observed write throughput is very close
> to what the rados bench provided (about 640 MB/s), but for short-lived
> tests, I get peak performances of over 5GB/s. I know that journaling is
> expected to cause spiky performance patters like that, but not to this
> level, which makes me think that CephFS is buffering my writes and returning
> the control back to client before persisting them to the jounal, which looks
> undesirable.

The client is buffering the writes to RADOS which would give you the
abnormally high initial performance until the cache needs flushed. You
might try tweaking certain osd settings:

http://docs.ceph.com/docs/hammer/rados/configuration/osd-config-ref/

in particular: "osd client message size cap". Also:

http://docs.ceph.com/docs/hammer/rados/configuration/journal-ref/

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-19 Thread Alex Gorbachev
On Mon, Jul 18, 2016 at 4:41 AM, Василий Ангапов  wrote:
> Guys,
>
> This bug is hitting me constantly, may be once per several days. Does
> anyone know is there a solution already?


I see there is a fix available, and am waiting for a backport to a
longterm kernel:

https://lkml.org/lkml/2016/7/12/919

https://lkml.org/lkml/2016/7/12/297

--
Alex Gorbachev
Storcium




>
> 2016-07-05 11:47 GMT+03:00 Nick Fisk :
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>> Alex Gorbachev
>>> Sent: 04 July 2016 20:50
>>> To: Campbell Steven 
>>> Cc: ceph-users ; Tim Bishop >> li...@bishnet.net>
>>> Subject: Re: [ceph-users] Is anyone seeing iissues with
>>> task_numa_find_cpu?
>>>
>>> On Wed, Jun 29, 2016 at 5:41 AM, Campbell Steven 
>>> wrote:
>>> > Hi Alex/Stefan,
>>> >
>>> > I'm in the middle of testing 4.7rc5 on our test cluster to confirm
>>> > once and for all this particular issue has been completely resolved by
>>> > Peter's recent patch to sched/fair.c refereed to by Stefan above. For
>>> > us anyway the patches that Stefan applied did not solve the issue and
>>> > neither did any 4.5.x or 4.6.x released kernel thus far, hopefully it
>>> > does the trick for you. We could get about 4 hours uptime before
>>> > things went haywire for us.
>>> >
>>> > It's interesting how it seems the CEPH workload triggers this bug so
>>> > well as it's quite a long standing issue that's only just been
>>> > resolved, another user chimed in on the lkml thread a couple of days
>>> > ago as well and again his trace had ceph-osd in it as well.
>>> >
>>> > https://lkml.org/lkml/headers/2016/6/21/491
>>> >
>>> > Campbell
>>>
>>> Campbell, any luck with testing 4.7rc5?  rc6 came out just now, and I am
>>> having trouble booting it on an ubuntu box due to some other unrelated
>>> problem.  So dropping to kernel 4.2.0 for now, which does not seem to have
>>> this load related problem.
>>>
>>> I looked at the fair.c code in kernel source tree 4.4.14 and it is quite
>> different
>>> than Peter's patch (assuming 4.5.x source), so the patch does not apply
>>> cleanly.  Maybe another 4.4.x kernel will get the update.
>>
>> I put in a new 16.04 node yesterday and went straight to 4.7.rc6. It's been
>> backfilling for just under 24 hours now with no drama. Disks are set to use
>> CFQ.
>>
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>>
>>> >
>>> > On 29 June 2016 at 18:29, Stefan Priebe - Profihost AG
>>> >  wrote:
>>> >>
>>> >> Am 29.06.2016 um 04:30 schrieb Alex Gorbachev:
>>> >>> Hi Stefan,
>>> >>>
>>> >>> On Tue, Jun 28, 2016 at 1:46 PM, Stefan Priebe - Profihost AG
>>> >>>  wrote:
>>>  Please be aware that you may need even more patches. Overall this
>>>  needs 3 patches. Where the first two try to fix a bug and the 3rd
>>>  one fixes the fixes + even more bugs related to the scheduler. I've
>>>  no idea on which patch level Ubuntu is.
>>> >>>
>>> >>> Stefan, would you be able to please point to the other two patches
>>> >>> beside https://lkml.org/lkml/diff/2016/6/22/102/1 ?
>>> >>
>>> >> Sorry sure yes:
>>> >>
>>> >> 1. 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a
>>> >> bounded value")
>>> >>
>>> >> 2.) 40ed9cba24bb7e01cc380a02d3f04065b8afae1d ("sched/fair: Fix
>>> >> post_init_entity_util_avg() serialization")
>>> >>
>>> >> 3.) the one listed at lkml.
>>> >>
>>> >> Stefan
>>> >>
>>> >>>
>>> >>> Thank you,
>>> >>> Alex
>>> >>>
>>> 
>>>  Stefan
>>> 
>>>  Excuse my typo sent from my mobile phone.
>>> 
>>>  Am 28.06.2016 um 17:59 schrieb Tim Bishop :
>>> 
>>>  Yes - I noticed this today on Ubuntu 16.04 with the default kernel.
>>>  No useful information to add other than it's not just you.
>>> 
>>>  Tim.
>>> 
>>>  On Tue, Jun 28, 2016 at 11:05:40AM -0400, Alex Gorbachev wrote:
>>> 
>>>  After upgrading to kernel 4.4.13 on Ubuntu, we are seeing a few of
>>> 
>>>  these issues where an OSD would fail with the stack below.  I
>>>  logged a
>>> 
>>>  bug at https://bugzilla.kernel.org/show_bug.cgi?id=121101 and there
>>>  is
>>> 
>>>  a similar description at https://lkml.org/lkml/2016/6/22/102, but
>>>  the
>>> 
>>>  odd part is we have turned off CFQ and blk-mq/scsi-mq and are using
>>> 
>>>  just the noop scheduler.
>>> 
>>> 
>>>  Does the ceph kernel code somehow use the fair scheduler code
>>> block?
>>> 
>>> 
>>>  Thanks
>>> 
>>>  --
>>> 
>>>  Alex Gorbachev
>>> 
>>>  Storcium
>>> 
>>> 
>>>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684974] CPU: 30 PID:
>>> 
>>>  10403 Comm: ceph-osd Not tainted 4.4.13-040413-generic
>>>  #201606072354
>>> 
>>>  Jun 28 09:46:41 roc04r-sca090 kernel: [137912.684991] Hardware name:
>>> 
>>>  Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS
>>> 3.2
>>> 
>>>  03/04/2015
>>> 
>>>  Jun 28 09:46:41 

[ceph-users] CephFS write performance

2016-07-19 Thread Fabiano de O. Lucchese
Hi, folks.



I'm conducting a series of experiments and tests with CephFS and have been 
facing a behavior over which I can't seem to have much control.
I configured a 5-node Ceph cluster running on enterprise servers. Each server 
has 10 x 6TB HDDs and 2 x 800GB SSDs. I configured the SSDs as a RAID-1 device 
for journaling and also two of the HDDs for the same purpose for the sake of 
comparison. All other 8 HDDs are configured as OSDs. The servers have 196GB of 
RAM and our private network is backed by a 40GB/s Brocade switch (frontend is 
10Gb/s).
When benchmarking the HDDs directly, here's the performance I get:
dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=10G count=1 oflag=direct &

0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 11.684 s, 184 MB/s

For read performance:

dd if=/var/lib/ceph/osd/ceph-0/deleteme of=/dev/null bs=10G count=1 iflag=direct &

0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 8.30168 s, 259 MB/s

Now, when I benchmark the OSDs configured with HDD-based journaling, here's
what I get:

[root@cephnode1 ceph-cluster]# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "bytes_per_sec": 426840870.00
}

which looks coherent. If I switch to the SSD-based journal, here's the new figure:

[root@cephnode1 ~]# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "bytes_per_sec": 805229549.00
}

which, again, looks as expected to me.

Finally, when I run the rados bench, here's what I get:

rados bench -p cephfs_data 300 write --no-cleanup && rados bench -p cephfs_data 300 seq

Total time run:         300.345098
Total writes made:      48327
Write size:             4194304
Bandwidth (MB/sec):     643.620

Stddev Bandwidth:       114.222
Max bandwidth (MB/sec): 1196
Min bandwidth (MB/sec): 0
Average Latency:        0.0994289
Stddev Latency:         0.112926
Max latency:            1.85983
Min latency:            0.0139412

Total time run:        300.121930
Total reads made:      31990
Read size:             4194304
Bandwidth (MB/sec):    426.360

Average Latency:       0.149346
Max latency:           1.77489
Min latency:           0.00382452
I configured the cluster to replicate data twice (3 copies), so these numbers
fall within my expectations. So far so good, but here comes the issue: I
configured CephFS and mounted a share locally on one of my servers. When I
write data to it, it shows abnormally high performance at the beginning for
about 5 seconds, stalls for about 20 seconds and then picks up again. For long
running tests, the observed write throughput is very close to what the rados
bench provided (about 640 MB/s), but for short-lived tests, I get peak
performances of over 5 GB/s. I know that journaling is expected to cause spiky
performance patterns like that, but not to this level, which makes me think
that CephFS is buffering my writes and returning control to the client before
persisting them to the journal, which looks undesirable.
I searched the web for a couple of days looking for ways to disable this
apparent write buffering, but couldn't find anything. So here comes my
question: how can I disable it?
Thanks and regards,
F. Lucchese

  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS write performance

2016-07-19 Thread Fabiano de O. Lucchese
Hi, folks.
I'm conducting a series of experiments and tests with CephFS and have been 
facing a behavior over which I can't seem to have much control.
I configured a 5-node Ceph cluster running on enterprise servers. Each server 
has 10 x 6TB HDDs and 2 x 800GB SSDs. I configured the SSDs as a RAID-1 device 
for journaling and also two of the HDDs for the same purpose for the sake of 
comparison. All other 8 HDDs are configured as OSDs. The servers have 196GB of 
RAM and our private network is backed by a 40GB/s Brocade switch (frontend is 
10Gb/s).
When benchmarking the HDDs directly, here's the performance I get:
dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=10G count=1 
oflag=direct &
0+1 records in0+1 records out2147479552 bytes (2.1 GB) copied, 11.684 s, 184 
MB/s
For read performance:
dd if=/var/lib/ceph/osd/ceph-0/deleteme of=/dev/null bs=10G count=1 
iflag=direct &
0+1 records in0+1 records out2147479552 bytes (2.1 GB) copied, 8.30168 s, 259 
MB/s
Now, when I benchmark the OSDs configured with HDD-based journaling, here's 
what I get:
[root@cephnode1 ceph-cluster]# ceph tell osd.1 bench
{    "bytes_written": 1073741824,    "blocksize": 4194304,    "bytes_per_sec": 
426840870.00}
which looks coherent. If I switch to the SDD-based journal, here's the new 
figure:
[root@cephnode1 ~]# ceph tell osd.1 bench{    "bytes_written": 1073741824,    
"blocksize": 4194304,    "bytes_per_sec": 805229549.00}
which, again, looks as expected to me.
Finally, when I run the rados bench, here's what I get:
rados bench -p cephfs_data 300 write --no-cleanup && rados bench -p cephfs_data 
300 seq
Total time run:         300.345098
Total writes made:      48327
Write size:             4194304
Bandwidth (MB/sec):     643.620
Stddev Bandwidth:       114.222
Max bandwidth (MB/sec): 1196
Min bandwidth (MB/sec): 0
Average Latency:        0.0994289
Stddev Latency:         0.112926
Max latency:            1.85983
Min latency:            0.0139412

Total time run:        300.121930
Total reads made:      31990
Read size:             4194304
Bandwidth (MB/sec):    426.360
Average Latency:       0.149346
Max latency:           1.77489
Min latency:           0.00382452
I configured the cluster to replicate data twice (3 copies), so these numbers 
fall within my expectations. So far so good, but here comes the issue: I 
configured CephFS and mounted a share locally on one of my servers. When I 
write data to it, it shows abnormally high performance at the beginning for 
about 5 seconds, stalls for about 20 seconds and then picks up again. For long 
running tests, the observed write throughput is very close to what the rados 
bench provided (about 640 MB/s), but for short-lived tests, I get peak 
performance of over 5GB/s. I know that journaling is expected to cause spiky 
performance patterns like that, but not to this level, which makes me think that 
CephFS is buffering my writes and returning control to the client before 
persisting them to the journal, which looks undesirable.
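
For what it's worth, that burst-then-stall pattern is what client-side write
buffering (the page cache plus the client's own cache) typically looks like. A
minimal way to take the buffering out of the measurement, assuming the share is
mounted at /mnt/cephfs (the path and sizes are assumptions, not from this setup):

# Report only after the data has actually been flushed to the filesystem:
dd if=/dev/zero of=/mnt/cephfs/ddtest bs=4M count=2048 conv=fdatasync

# Or bypass the client page cache entirely with direct I/O:
dd if=/dev/zero of=/mnt/cephfs/ddtest bs=4M count=2048 oflag=direct

With either flag the short-lived numbers should drop back toward the rados
bench figures if buffering is what is inflating them.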
I searched the web for a couple of days looking for ways to disable this 
apparent write buffering, but couldn't find anything. So here comes my 
question: how can I disable it?
Thanks and regards,
F. Lucchese
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Storage tiering in Ceph

2016-07-19 Thread Andrey Ptashnik
Hi Team,

Is there any way to implement storage tiering in Ceph Jewel?
I’ve read about placing different pools on hardware with different performance; 
however, is there any automation possible in Ceph that will promote data from 
slow hardware to fast hardware and back?
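
A cache tier layered in front of a slower pool is probably the closest built-in
automation; a minimal sketch, assuming two pools named ssd-pool and hdd-pool
that already sit on the respective hardware via CRUSH rules (the names and the
size limit are assumptions):

# Put the fast pool in front of the slow one as a writeback cache tier:
ceph osd tier add hdd-pool ssd-pool
ceph osd tier cache-mode ssd-pool writeback
ceph osd tier set-overlay hdd-pool ssd-pool

# Hit-set tracking plus a size ceiling drive automatic promotion,
# flushing and eviction between the two tiers:
ceph osd pool set ssd-pool hit_set_type bloom
ceph osd pool set ssd-pool target_max_bytes 200000000000

Whether this is a good fit depends heavily on the workload; the cache tier
threads elsewhere in this digest cover the tuning caveats.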

Regards,

Andrey

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache Tier configuration

2016-07-19 Thread Christian Balzer

Hello,

On Tue, 19 Jul 2016 15:15:55 +0200 Mateusz Skała wrote:

> Hello,
> 
> > -Original Message-
> > From: Christian Balzer [mailto:ch...@gol.com]
> > Sent: Wednesday, July 13, 2016 4:03 AM
> > To: ceph-users@lists.ceph.com
> > Cc: Mateusz Skała 
> > Subject: Re: [ceph-users] Cache Tier configuration
> > 
> > 
> > Hello,
> > 
> > On Tue, 12 Jul 2016 11:01:30 +0200 Mateusz Skała wrote:
> > 
> > > Thank you for the reply. Answers below.
> > >
> > > > -Original Message-
> > > > From: Christian Balzer [mailto:ch...@gol.com]
> > > > Sent: Tuesday, July 12, 2016 3:37 AM
> > > > To: ceph-users@lists.ceph.com
> > > > Cc: Mateusz Skała 
> > > > Subject: Re: [ceph-users] Cache Tier configuration
> > > >
> > > >
> > > > Hello,
> > > >
> > > > On Mon, 11 Jul 2016 16:19:58 +0200 Mateusz Skała wrote:
> > > >
> > > > > Hello Cephers.
> > > > >
> > > > > Can someone help me in my cache tier configuration? I have 4 
> > > > > same SSD drives 176GB (184196208K) in SSD pool, how to determine
> > > > target_max_bytes?
> > > >
> > > > What exact SSD models are these?
> > > > What version of Ceph?
> > >
> > > Intel DC S3610 (SSDSC2BX200G401), ceph version 9.2.1
> > > (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> > >
> > 
> > Good, these are decent SSDs and at 3DWPD probably durable enough, too.
> > You will want to monitor their wear-out level anyway, though.
> > 
> > Remember, a dead cache pool means inaccessible and/or lost data.
> > 
> > Jewel has improved cache controls and a different, less aggressive 
> > default behavior, you may want to consider upgrading to it, especially 
> > if you don't want to become a cache tiering specialist. ^o^
> > 
> > Also Infernalis is no longer receiving updates.
> 
> We are planning upgrade in first week of August. 
> 
You might want to wait until the next version of Jewel is out, unless you
have a test/staging cluster to verify your upgrade procedure on.

Jewel is a better choice than Infernalis, but with still a number of bugs
and also a LOT of poorly or not at all documented massive changes it
doesn't make me all that eager to upgrade right here, right now.

> > > > > I assume
> > > > > that should be (4 drives* 188616916992 bytes )/ 3 replica =
> > > > > 251489222656 bytes *85% (because of full disk warning)
> > > >
> > > > In theory correct, but you might want to consider (like with all
> > > > pools) the impact of losing a single SSD.
> > > > In short, backfilling and then the remaining 3 getting full anyway.
> > > >
> > >
> > > OK, so better to make the max target bytes lower than the space I have? For
> > > example 170GB? Then I will have 1 OSD in reserve.
> > >
> > Something like this, though failures with these SSDs are very unlikely.
> > 
> > > > > It will be 213765839257 bytes ~200GB. I make this little bit 
> > > > > lower
> > > > > (160GB) and after some time whole cluster stops on full disk error.
> > > > > One of SSD drives are full. I see that use of space at the osd is not 
> > > > > equal:
> > > > >
> > > > > 32 0.17099  1.0   175G   127G 49514M 72.47 1.77  95
> > > > >
> > > > > 42 0.17099  1.0   175G   120G 56154M 68.78 1.68  90
> > > > >
> > > > > 37 0.17099  1.0   175G   136G 39670M 77.95 1.90 102
> > > > >
> > > > > 47 0.17099  1.0   175G   130G 46599M 74.09 1.80  97
> > > > >
> > > >
> > > > What's the exact error message?
> > > >
> > > > None of these are over 85 or 95%, how are they full?
> > >
> > > Osd.37 was full at 96%, after the error (health ERR, 1 full osd). Then I set
> > > target_max_bytes to 100GB. Flushing reduced the used space, and now the cluster
> > > is working OK, but I want to clarify my configuration.
> > >
> > Don't confuse flushing (copying dirty objects to the backing pool) with
> > eviction (deleting, really zero-ing, clean objects).
> > Eviction is what frees up space, but it needs flushed (clean) objects
> > to work with.
> > 
> 
> OK, so I understand correctly that evicting is what frees up the space?
> 
Yes, re-read the relevant documentation.

> > >
> > > >
> > > > If the above is a snapshot of when Ceph thinks something is 
> > > > "full", it may be an indication that you've reached 
> > > > target_max_bytes and Ceph simply has no clean (flushed) objects ready 
> > > > to evict.
> > > > Which means a configuration problem (all ratios, not the defaults, 
> > > > for this pool please) or your cache filling up faster than it can flush.
> > > >
> > > The above snapshot is from now, when the cluster is working OK. Filling
> > > faster than flushing is very possible; when the error happened I had the
> > > min 'promote' settings at 1 in the config, like this
> > >
> > > "osd_tier_default_cache_min_read_recency_for_promote": "1",
> > > "osd_tier_default_cache_min_write_recency_for_promote": "1",
> > >
> > > Now I have changed this to 3, and it looks like it is working: 3 days
> > > without a near full osd.
> > >
> > There are a number of other options to control things, especially with 
> > Jewel.
> > Also setting your cache mode to readforward might be a good idea 
> > depending on 

Re: [ceph-users] how to use cache tiering with proxy in ceph-10.2.2

2016-07-19 Thread Christian Balzer

Hello,

On Tue, 19 Jul 2016 12:24:01 +0200 Oliver Dzombic wrote:

> Hi,
> 
> I have in my ceph.conf under the [OSD] section:
> 
> osd_tier_promote_max_bytes_sec = 1610612736
> osd_tier_promote_max_objects_sec = 2
> 
> #ceph --show-config is showing:
> 
> osd_tier_promote_max_objects_sec = 5242880
> osd_tier_promote_max_bytes_sec = 25
> 
> But in fact it's working. Maybe some bug in showing the correct value.
> 
> I had problems too, where the IO was mostly going to the cold storage.
> 
> After I changed these values (and restarted >every< node inside the
> cluster) the problem was gone.
> 
> So I assume that it's simply showing the wrong values when you call
> show-config. Or there is some other miracle going on.
> 
> I just checked:
> 
> #ceph --show-config | grep osd_tier
> 
> shows:
> 
> osd_tier_default_cache_hit_set_count = 4
> osd_tier_default_cache_hit_set_period = 1200
> 
> while
> 
> #ceph osd pool get ssd_cache hit_set_count
> #ceph osd pool get ssd_cache hit_set_period
> 
> show
> 
> hit_set_count: 1
> hit_set_period: 120
> 
Apples and oranges.

Your first query is about the config (and thus default, as it says in the
output) options, the second one is for a specific pool.

There might be still any sorts of breakage with show-config and having to
restart OSDs to have changes take effect is inelegant at least, but the
above is not a bug.
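
A minimal way to see the three scopes side by side (osd.0 and the pool name
ssd_cache are just examples; the daemon query has to run on the host that
carries the OSD):

# What the ceph CLI itself resolves (defaults plus the sections that apply
# to the client); [osd]-only overrides will not show up here:
ceph --show-config | grep osd_tier_promote_max_bytes_sec

# The value the running daemon actually uses, via its admin socket:
ceph daemon osd.0 config show | grep osd_tier_promote_max_bytes_sec

# Per-pool settings, which are independent of the daemon defaults:
ceph osd pool get ssd_cache hit_set_count
ceph osd pool get ssd_cache hit_set_period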

Christian

> 
> So you can obviously ignore the ceph --show-config command. It's simply
> not working correctly.
> 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache Tier configuration

2016-07-19 Thread Mateusz Skała
Hello,

> -Original Message-
> From: Christian Balzer [mailto:ch...@gol.com]
> Sent: Wednesday, July 13, 2016 4:03 AM
> To: ceph-users@lists.ceph.com
> Cc: Mateusz Skała 
> Subject: Re: [ceph-users] Cache Tier configuration
> 
> 
> Hello,
> 
> On Tue, 12 Jul 2016 11:01:30 +0200 Mateusz Skała wrote:
> 
> > Thank you for the reply. Answers below.
> >
> > > -Original Message-
> > > From: Christian Balzer [mailto:ch...@gol.com]
> > > Sent: Tuesday, July 12, 2016 3:37 AM
> > > To: ceph-users@lists.ceph.com
> > > Cc: Mateusz Skała 
> > > Subject: Re: [ceph-users] Cache Tier configuration
> > >
> > >
> > > Hello,
> > >
> > > On Mon, 11 Jul 2016 16:19:58 +0200 Mateusz Skała wrote:
> > >
> > > > Hello Cephers.
> > > >
> > > > Can someone help me in my cache tier configuration? I have 4 
> > > > same SSD drives 176GB (184196208K) in SSD pool, how to determine
> > > target_max_bytes?
> > >
> > > What exact SSD models are these?
> > > What version of Ceph?
> >
> > Intel DC S3610 (SSDSC2BX200G401), ceph version 9.2.1
> > (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> >
> 
> Good, these are decent SSDs and at 3DWPD probably durable enough, too.
> You will want to monitor their wear-out level anyway, though.
> 
> Remember, a dead cache pool means inaccessible and/or lost data.
> 
> Jewel has improved cache controls and a different, less aggressive 
> default behavior, you may want to consider upgrading to it, especially 
> if you don't want to become a cache tiering specialist. ^o^
> 
> Also Infernalis is no longer receiving updates.

We are planning upgrade in first week of August. 

> > > > I assume
> > > > that should be (4 drives* 188616916992 bytes )/ 3 replica =
> > > > 251489222656 bytes *85% (because of full disk warning)
> > >
> > > In theory correct, but you might want to consider (like with all
> > > pools) the impact of losing a single SSD.
> > > In short, backfilling and then the remaining 3 getting full anyway.
> > >
> >
> > OK, so better to make the max target bytes lower than the space I have? For
> > example 170GB? Then I will have 1 OSD in reserve.
> >
> Something like this, though failures with these SSDs are very unlikely.
> 
> > > > It will be 213765839257 bytes ~200GB. I make this little bit 
> > > > lower
> > > > (160GB) and after some time whole cluster stops on full disk error.
> > > > One of SSD drives are full. I see that use of space at the osd is not 
> > > > equal:
> > > >
> > > > 32 0.17099  1.0   175G   127G 49514M 72.47 1.77  95
> > > >
> > > > 42 0.17099  1.0   175G   120G 56154M 68.78 1.68  90
> > > >
> > > > 37 0.17099  1.0   175G   136G 39670M 77.95 1.90 102
> > > >
> > > > 47 0.17099  1.0   175G   130G 46599M 74.09 1.80  97
> > > >
> > >
> > > What's the exact error message?
> > >
> > > None of these are over 85 or 95%, how are they full?
> >
> > Osd.37 was full at 96%, after the error (health ERR, 1 full osd). Then I set
> > target_max_bytes to 100GB. Flushing reduced the used space, and now the cluster
> > is working OK, but I want to clarify my configuration.
> >
> Don't confuse flushing (copying dirty objects to the backing pool) with
> eviction (deleting, really zero-ing, clean objects).
> Eviction is what frees up space, but it needs flushed (clean) objects
> to work with.
> 

OK, so I understand correctly that evicting is what frees up the space?

> >
> > >
> > > If the above is a snapshot of when Ceph thinks something is 
> > > "full", it may be an indication that you've reached 
> > > target_max_bytes and Ceph simply has no clean (flushed) objects ready to 
> > > evict.
> > > Which means a configuration problem (all ratios, not the defaults, 
> > > for this pool please) or your cache filling up faster than it can flush.
> > >
> > The above snapshot is from now, when the cluster is working OK. Filling
> > faster than flushing is very possible; when the error happened I had the
> > min 'promote' settings at 1 in the config, like this
> >
> > "osd_tier_default_cache_min_read_recency_for_promote": "1",
> > "osd_tier_default_cache_min_write_recency_for_promote": "1",
> >
> > Now I have changed this to 3, and it looks like it is working: 3 days
> > without a near full osd.
> >
> There are a number of other options to control things, especially with Jewel.
> Also setting your cache mode to readforward might be a good idea 
> depending on your use case.
> 
I'm considering this move, especially since we are also using SSD journals. Please 
confirm: can I use a cache tier in readforward mode with pool size 1? Is it safe? 
Then I will have 3 times more space for the cache tier.
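
For reference, the knobs under discussion are all per-pool settings; a minimal
sketch with ssd_cache as the cache pool and purely illustrative values (this is
not a recommendation for these particular SSDs, and the mode change should be
verified against your release first):

# The cache mode can be changed on a live tier, e.g. to readforward:
ceph osd tier cache-mode ssd_cache readforward

# Hard ceiling for the cache pool, per the sizing discussed above:
ceph osd pool set ssd_cache target_max_bytes 213765839257

# Flush dirty objects from 40% of the target and evict clean objects
# from 80%, so the pool never reaches the hard limit:
ceph osd pool set ssd_cache cache_target_dirty_ratio 0.4
ceph osd pool set ssd_cache cache_target_full_ratio 0.8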

> > > Space is never equal with Ceph, you need a high enough number of 
> > > PGs for starters and then some fine-tuning.
> > >
> > > After fiddling with the weights my cache-tier SSD OSDs are all 
> > > very close to each other:
> > > ---
> > > ID WEIGHT  REWEIGHT SIZE  USEAVAIL  %USE  VAR
> > > 18 0.64999  1.0  679G   543G   136G 79.96 4.35
> > > 19 0.67000  1.0  679G   540G   138G 79.61 4.33
> > > 20 0.64999  1.0  679G   534G   144G 78.70 4.28
> > > 21 0.64999  1.

Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread M Ranga Swami Reddy
+1 .. I agree

Thanks
Swami

On Tue, Jul 19, 2016 at 4:57 PM, Lionel Bouton  wrote:
> Hi,
>
> On 19/07/2016 13:06, Wido den Hollander wrote:
>>> Op 19 juli 2016 om 12:37 schreef M Ranga Swami Reddy :
>>>
>>>
>>> Thanks for the correction...so even one OSD reaches to 95% full, the
>>> total ceph cluster IO (R/W) will be blocked...Ideally read IO should
>>> work...
>> That should be a config option, since reading while writes still block is 
>> also a danger. Multiple clients could read the same object, perform a 
>> in-memory change and their write will block.
>>
>> Now, which client will 'win' after the full flag has been removed?
>>
>> That could lead to data corruption.
>
> If it did, the clients would be broken as normal usage (without writes
> being blocked) doesn't prevent multiple clients from reading the same
> data and trying to write at the same time. So if multiple writes (I
> suppose on the same data blocks) are possibly waiting the order in which
> they are performed *must not* matter in your system. The alternative is
> to prevent simultaneous write accesses from multiple clients (this is
> how non-cluster filesystems must be configured on top of Ceph/RBD, they
> must even be prevented from read-only accessing an already mounted fs).
>
>>
>> Just make sure you have proper monitoring on your Ceph cluster. At nearfull 
>> it goes into WARN and you should act on that.
>
>
> +1 : monitoring is not an option.
>
> Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread M Ranga Swami Reddy
>That should be a config option, since reading while writes still block is also 
>a danger. Multiple clients could read the same object, >perform a in-memory 
>change and their write will block.
>Now, which client will 'win' after the full flag has been removed?

>That could lead to data corruption.

Read ops may not cause any issue... but I agree with you that write
IO is an issue and it is blocked.

>Just make sure you have proper monitoring on your Ceph cluster. At nearfull it 
>goes into WARN and you should act on that.

Yes..

Thanks
Swami

On Tue, Jul 19, 2016 at 4:36 PM, Wido den Hollander  wrote:
>
>> Op 19 juli 2016 om 12:37 schreef M Ranga Swami Reddy :
>>
>>
>> Thanks for the correction...so even one OSD reaches to 95% full, the
>> total ceph cluster IO (R/W) will be blocked...Ideally read IO should
>> work...
>
> That should be a config option, since reading while writes still block is 
> also a danger. Multiple clients could read the same object, perform a 
> in-memory change and their write will block.
>
> Now, which client will 'win' after the full flag has been removed?
>
> That could lead to data corruption.
>
> Just make sure you have proper monitoring on your Ceph cluster. At nearfull 
> it goes into WARN and you should act on that.
>
> Wido
>
>>
>> Thanks
>> Swami
>>
>> On Tue, Jul 19, 2016 at 3:41 PM, Wido den Hollander  wrote:
>> >
>> >> Op 19 juli 2016 om 11:55 schreef M Ranga Swami Reddy 
>> >> :
>> >>
>> >>
>> >> Thanks for detail...
>> >> When an OSD is 95% full, then that specific OSD's write IO blocked.
>> >>
>> >
>> > No, the *whole* cluster will block. In the OSDMap the flag 'full' is set 
>> > which causes all I/O to stop (even read!) until you make sure the OSD 
>> > drops below 95%.
>> >
>> > Wido
>> >
>> >> Thanks
>> >> Swami
>> >>
>> >> On Tue, Jul 19, 2016 at 3:07 PM, Christian Balzer  wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> > On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:
>> >> >
>> >> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% 
>> >> >> >> data.
>> >> >> >> One of the OSD is 95% full.
>> >> >> >> If an OSD is 95% full, is it impact the any storage operation? Is 
>> >> >> >> this
>> >> >> >> impacts on VM/Instance?
>> >> >>
>> >> >> >Yes, one OSD will impact whole cluster. It will block write 
>> >> >> >operations to the cluster
>> >> >>
>> >> >> Thanks for clarification. Really?? Is this(OSD 95%) full designed to
>> >> >> block write I/O of ceph cluster?
>> >> >>
>> >> > Really.
>> >> > To be more precise, any I/O that touches any PG on that OSD will block.
>> >> > So with a sufficiently large cluster you may have some, few, I/Os still 
>> >> > go
>> >> > through as they don't use that OSD at all.
>> >> >
>> >> > That's why:
>> >> >
>> >> > 1. Ceph has the near-full warning (which of course may need to be
>> >> > adjusted to correctly reflect things, especially with smaller clusters).
>> >> > Once you get that warning, you NEED to take action immediately.
>> >> >
>> >> > 2. You want to graph the space utilization of all your OSDs with 
>> >> > something
>> >> > like graphite. That allows you to spot trends of uneven data 
>> >> > distribution
>> >> > early and thus react early to it.
>> >> > I re-weight (CRUSH re-weight, as this is permanent and my clusters 
>> >> > aren't
>> >> > growing frequently) OSDs so they are at least within 10% of each
>> >> > other.
>> >> >
>> >> > Christian
>> >> >> Because I have around 251 OSDs out which one OSD is 95% full, but
>> >> >> other 250 OSDs not in near full also...
>> >> >>
>> >> >> Thanks
>> >> >> Swami
>> >> >>
>> >> >>
>> >> >> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
>> >> >> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% 
>> >> >> >> data.
>> >> >> >> One of the OSD is 95% full.
>> >> >> >> If an OSD is 95% full, is it impact the any storage operation? Is 
>> >> >> >> this
>> >> >> >> impacts on VM/Instance?
>> >> >> >
>> >> >> > Yes, one OSD will impact whole cluster. It will block write 
>> >> >> > operations to
>> >> >> > the cluster
>> >> >> >>
>> >> >> >> Immediately I have reduced the OSD weight, which was filled with 95 
>> >> >> >> %
>> >> >> >> data. After re-weight, data rebalanaced and OSD came to normal state
>> >> >> >> (ie < 80%) with 1 hour time frame.
>> >> >> >>
>> >> >> >>
>> >> >> >> Thanks
>> >> >> >> Swami
>> >> >> >> ___
>> >> >> >> ceph-users mailing list
>> >> >> >> ceph-users@lists.ceph.com
>> >> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > ___
>> >> >> > ceph-users mailing list
>> >> >> > ceph-users@lists.ceph.com
>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> ___
>> >> >> ceph-users mailing list
>> >> >> ceph-u

Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Lionel Bouton
Hi,

On 19/07/2016 13:06, Wido den Hollander wrote:
>> Op 19 juli 2016 om 12:37 schreef M Ranga Swami Reddy :
>>
>>
>> Thanks for the correction...so even one OSD reaches to 95% full, the
>> total ceph cluster IO (R/W) will be blocked...Ideally read IO should
>> work...
> That should be a config option, since reading while writes still block is 
> also a danger. Multiple clients could read the same object, perform a 
> in-memory change and their write will block.
>
> Now, which client will 'win' after the full flag has been removed?
>
> That could lead to data corruption.

If it did, the clients would be broken as normal usage (without writes
being blocked) doesn't prevent multiple clients from reading the same
data and trying to write at the same time. So if multiple writes (I
suppose on the same data blocks) are possibly waiting, the order in which
they are performed *must not* matter in your system. The alternative is
to prevent simultaneous write accesses from multiple clients (this is
how non-cluster filesystems must be configured on top of Ceph/RBD, they
must even be prevented from read-only accessing an already mounted fs).

>
> Just make sure you have proper monitoring on your Ceph cluster. At nearfull 
> it goes into WARN and you should act on that.


+1 : monitoring is not an option.

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Wido den Hollander

> Op 19 juli 2016 om 12:37 schreef M Ranga Swami Reddy :
> 
> 
> Thanks for the correction...so even one OSD reaches to 95% full, the
> total ceph cluster IO (R/W) will be blocked...Ideally read IO should
> work...

That should be a config option, since reading while writes still block is also 
a danger. Multiple clients could read the same object, perform an in-memory 
change and their write will block.

Now, which client will 'win' after the full flag has been removed?

That could lead to data corruption.

Just make sure you have proper monitoring on your Ceph cluster. At nearfull it 
goes into WARN and you should act on that.

Wido

> 
> Thanks
> Swami
> 
> On Tue, Jul 19, 2016 at 3:41 PM, Wido den Hollander  wrote:
> >
> >> Op 19 juli 2016 om 11:55 schreef M Ranga Swami Reddy 
> >> :
> >>
> >>
> >> Thanks for detail...
> >> When an OSD is 95% full, then that specific OSD's write IO blocked.
> >>
> >
> > No, the *whole* cluster will block. In the OSDMap the flag 'full' is set 
> > which causes all I/O to stop (even read!) until you make sure the OSD drops 
> > below 95%.
> >
> > Wido
> >
> >> Thanks
> >> Swami
> >>
> >> On Tue, Jul 19, 2016 at 3:07 PM, Christian Balzer  wrote:
> >> >
> >> > Hello,
> >> >
> >> > On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:
> >> >
> >> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% 
> >> >> >> data.
> >> >> >> One of the OSD is 95% full.
> >> >> >> If an OSD is 95% full, is it impact the any storage operation? Is 
> >> >> >> this
> >> >> >> impacts on VM/Instance?
> >> >>
> >> >> >Yes, one OSD will impact whole cluster. It will block write operations 
> >> >> >to the cluster
> >> >>
> >> >> Thanks for clarification. Really?? Is this(OSD 95%) full designed to
> >> >> block write I/O of ceph cluster?
> >> >>
> >> > Really.
> >> > To be more precise, any I/O that touches any PG on that OSD will block.
> >> > So with a sufficiently large cluster you may have some, few, I/Os still 
> >> > go
> >> > through as they don't use that OSD at all.
> >> >
> >> > That's why:
> >> >
> >> > 1. Ceph has the near-full warning (which of course may need to be
> >> > adjusted to correctly reflect things, especially with smaller clusters).
> >> > Once you get that warning, you NEED to take action immediately.
> >> >
> >> > 2. You want to graph the space utilization of all your OSDs with 
> >> > something
> >> > like graphite. That allows you to spot trends of uneven data distribution
> >> > early and thus react early to it.
> >> > I re-weight (CRUSH re-weight, as this is permanent and my clusters aren't
> >> > growing frequently) OSDs so they are at least within 10% of each
> >> > other.
> >> >
> >> > Christian
> >> >> Because I have around 251 OSDs out which one OSD is 95% full, but
> >> >> other 250 OSDs not in near full also...
> >> >>
> >> >> Thanks
> >> >> Swami
> >> >>
> >> >>
> >> >> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
> >> >> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% 
> >> >> >> data.
> >> >> >> One of the OSD is 95% full.
> >> >> >> If an OSD is 95% full, is it impact the any storage operation? Is 
> >> >> >> this
> >> >> >> impacts on VM/Instance?
> >> >> >
> >> >> > Yes, one OSD will impact whole cluster. It will block write 
> >> >> > operations to
> >> >> > the cluster
> >> >> >>
> >> >> >> Immediately I have reduced the OSD weight, which was filled with 95 %
> >> >> >> data. After re-weight, data rebalanaced and OSD came to normal state
> >> >> >> (ie < 80%) with 1 hour time frame.
> >> >> >>
> >> >> >>
> >> >> >> Thanks
> >> >> >> Swami
> >> >> >> ___
> >> >> >> ceph-users mailing list
> >> >> >> ceph-users@lists.ceph.com
> >> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >
> >> >> >
> >> >> >
> >> >> > ___
> >> >> > ceph-users mailing list
> >> >> > ceph-users@lists.ceph.com
> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> ___
> >> >> ceph-users mailing list
> >> >> ceph-users@lists.ceph.com
> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >>
> >> >
> >> >
> >> > --
> >> > Christian Balzer        Network/Systems Engineer
> >> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> >> > http://www.gol.com/
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph admin socket from non root

2016-07-19 Thread Stefan Priebe - Profihost AG
Am 18.07.2016 um 20:14 schrieb Gregory Farnum:
> I'm not familiar with how it's set up but skimming and searching
> through the code I'm not seeing anything, no. We've got a chown but no
> chmod.

That's odd ;-) How do all the people do their monitoring? Running as root?

> That's a reasonable feature idea though, and presumably you
> could add a chmod to your init scripts?

Yes, I could hack that into the init script. I just had the feeling that
the feature must exist and I'm just missing something.
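
A minimal sketch of that init-script hack, assuming the monitoring agent runs
in a group called 'monitoring' (the group name is an assumption):

# Run after the daemons have created their sockets; the sockets are
# recreated on every restart, so this has to be repeated then as well:
chgrp monitoring /var/run/ceph/*.asok
chmod g+rw /var/run/ceph/*.asok

# The agent can then query a socket without root, e.g.:
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump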

Greets,
Stefan

> -Greg
> 
> On Mon, Jul 18, 2016 at 3:02 AM, Stefan Priebe - Profihost AG
>  wrote:
>>
>> Nobody? Is it at least possible with jewel to give the sockets group
>> write permissions?
>>
>> Am 10.07.2016 um 23:51 schrieb Stefan Priebe - Profihost AG:
>>> Hi,
>>>
>>> is there a proposed way how to connect from non root f.e. a monitoring
>>> system to the ceph admin socket?
>>>
>>> In the past they were created with 777 permissions but now they're 755
>>> which prevents me from connecting from our monitoring daemon. I don't
>>> like to set CAP_DAC_OVERRIDE for the monitoring agent.
>>>
>>> Greets,
>>> Stefan
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread M Ranga Swami Reddy
Thanks for the correction... so even if one OSD reaches 95% full, the
total ceph cluster IO (R/W) will be blocked... Ideally read IO should
still work...

Thanks
Swami

On Tue, Jul 19, 2016 at 3:41 PM, Wido den Hollander  wrote:
>
>> Op 19 juli 2016 om 11:55 schreef M Ranga Swami Reddy :
>>
>>
>> Thanks for detail...
>> When an OSD is 95% full, then that specific OSD's write IO blocked.
>>
>
> No, the *whole* cluster will block. In the OSDMap the flag 'full' is set 
> which causes all I/O to stop (even read!) until you make sure the OSD drops 
> below 95%.
>
> Wido
>
>> Thanks
>> Swami
>>
>> On Tue, Jul 19, 2016 at 3:07 PM, Christian Balzer  wrote:
>> >
>> > Hello,
>> >
>> > On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:
>> >
>> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
>> >> >> One of the OSD is 95% full.
>> >> >> If an OSD is 95% full, is it impact the any storage operation? Is this
>> >> >> impacts on VM/Instance?
>> >>
>> >> >Yes, one OSD will impact whole cluster. It will block write operations 
>> >> >to the cluster
>> >>
>> >> Thanks for clarification. Really?? Is this(OSD 95%) full designed to
>> >> block write I/O of ceph cluster?
>> >>
>> > Really.
>> > To be more precise, any I/O that touches any PG on that OSD will block.
>> > So with a sufficiently large cluster you may have some, few, I/Os still go
>> > through as they don't use that OSD at all.
>> >
>> > That's why:
>> >
>> > 1. Ceph has the near-full warning (which of course may need to be
>> > adjusted to correctly reflect things, especially with smaller clusters).
>> > Once you get that warning, you NEED to take action immediately.
>> >
>> > 2. You want to graph the space utilization of all your OSDs with something
>> > like graphite. That allows you to spot trends of uneven data distribution
>> > early and thus react early to it.
>> > I re-weight (CRUSH re-weight, as this is permanent and my clusters aren't
>> > growing frequently) OSDs so they are at least within 10% of each
>> > other.
>> >
>> > Christian
>> >> Because I have around 251 OSDs out which one OSD is 95% full, but
>> >> other 250 OSDs not in near full also...
>> >>
>> >> Thanks
>> >> Swami
>> >>
>> >>
>> >> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
>> >> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
>> >> >>
>> >> >> Hi,
>> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
>> >> >> One of the OSD is 95% full.
>> >> >> If an OSD is 95% full, is it impact the any storage operation? Is this
>> >> >> impacts on VM/Instance?
>> >> >
>> >> > Yes, one OSD will impact whole cluster. It will block write operations 
>> >> > to
>> >> > the cluster
>> >> >>
>> >> >> Immediately I have reduced the OSD weight, which was filled with 95 %
>> >> >> data. After re-weight, data rebalanaced and OSD came to normal state
>> >> >> (ie < 80%) with 1 hour time frame.
>> >> >>
>> >> >>
>> >> >> Thanks
>> >> >> Swami
>> >> >> ___
>> >> >> ceph-users mailing list
>> >> >> ceph-users@lists.ceph.com
>> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >
>> >> >
>> >> >
>> >> > ___
>> >> > ceph-users mailing list
>> >> > ceph-users@lists.ceph.com
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >
>> >
>> > --
>> > Christian Balzer        Network/Systems Engineer
>> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
>> > http://www.gol.com/
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to use cache tiering with proxy in ceph-10.2.2

2016-07-19 Thread Oliver Dzombic
Hi,

I have in my ceph.conf under the [OSD] section:

osd_tier_promote_max_bytes_sec = 1610612736
osd_tier_promote_max_objects_sec = 2

#ceph --show-config is showing:

osd_tier_promote_max_objects_sec = 5242880
osd_tier_promote_max_bytes_sec = 25

But in fact it's working. Maybe some bug in showing the correct value.

I had problems too, where the IO was mostly going to the cold storage.

After I changed these values (and restarted >every< node inside the
cluster) the problem was gone.

So I assume that it's simply showing the wrong values when you call
show-config. Or there is some other miracle going on.

I just checked:

#ceph --show-config | grep osd_tier

shows:

osd_tier_default_cache_hit_set_count = 4
osd_tier_default_cache_hit_set_period = 1200

while

#ceph osd pool get ssd_cache hit_set_count
#ceph osd pool get ssd_cache hit_set_period

show

hit_set_count: 1
hit_set_period: 120


So you can obviously ignore the ceph --show-config command. It's simply
not working correctly.


-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 19.07.2016 um 08:43 schrieb m13913886...@yahoo.com:
> I have configured ceph.conf with "osd_tier_promote_max_bytes_sec" in
> [osd] Attributes. But it still invalid.
> I do command --show-config discovery that it has not been modified.
> 
> [root@node01 ~]# cat /etc/ceph/ceph.conf | grep tier
> osd_tier_promote_max_objects_sec=20
> osd_tier_promote_max_bytes_sec=16106127360
> 
>  [root@node01 ~]# ceph --show-config | grep tier
> mon_debug_unsafe_allow_tier_with_nonempty_snaps = false
> osd_tier_promote_max_objects_sec = 5242880
> osd_tier_promote_max_bytes_sec = 25
> osd_tier_default_cache_mode = writeback
> osd_tier_default_cache_hit_set_count = 4
> osd_tier_default_cache_hit_set_period = 1200
> osd_tier_default_cache_hit_set_type = bloom
> osd_tier_default_cache_min_read_recency_for_promote = 1
> osd_tier_default_cache_min_write_recency_for_promote = 1
> osd_tier_default_cache_hit_set_grade_decay_rate = 20
> osd_tier_default_cache_hit_set_search_last_n = 1
> 
> and cache tiering does not work , low iops.
> 
> 
> 
> On Monday, July 18, 2016 5:33 PM, "m13913886...@yahoo.com"
>  wrote:
> 
> 
> thank you very much! 
> 
> 
> On Monday, July 18, 2016 5:31 PM, Oliver Dzombic
>  wrote:
> 
> 
> Hi,
> 
> everything is here:
> 
> http://docs.ceph.com/docs/jewel/
> 
> except
> 
> osd_tier_promote_max_bytes_sec
> 
> and other stuff, but its enough there that you can make it work.
> 
> -- 
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de 
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
> Am 18.07.2016 um 11:24 schrieb m13913886...@yahoo.com
> :
>> Where to find base docu?
>> Official website does not update the document
>>
>>
>> On Monday, July 18, 2016 5:16 PM, Oliver Dzombic
>> mailto:i...@ip-interactive.de>> wrote:
>>
>>
>> Hi
>>
>> i suggest you to read some base docu about that.
>>
>> osd_tier_promote_max_bytes_sec = how much bytes per second are going
> on tier
>>
>> ceph osd pool set ssd-pool target_max_bytes = maximum size in bytes on
>> this specific pool ( its like a quota )
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:i...@ip-interactive.de 
> >
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107
>>
>>
>> Am 18.07.2016 um 11:14 schrieb m13913886...@yahoo.com
> 
>> >:
>>> what is "osd_tier_promote_max_bytes_sec" in ceph.conf file  and command
>>> "ceph osd pool set ssd-pool target_max_bytes" are not the same ?
>>>
>>>
>>> On Monday, July 18, 2016 4:40 PM, Oliver Dzombic
>>> mailto:i...@ip-interactive.de>
> >> wrote:
>>>
>>>
>>> Hi,
>>>
>>> osd_tier_promote_max_bytes_sec
>>>
>>> is your friend.
>>>
>>> --
>>> Mit freundlichen Gruessen / Best regards
>>>
>>> Oliver Dzombic
>>> IP-Interactive
>>>
>>> mailto:i...@ip-interactive.de 
> >
>> 
> 

Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Wido den Hollander

> Op 19 juli 2016 om 11:55 schreef M Ranga Swami Reddy :
> 
> 
> Thanks for detail...
> When an OSD is 95% full, then that specific OSD's write IO blocked.
> 

No, the *whole* cluster will block. In the OSDMap the flag 'full' is set which 
causes all I/O to stop (even read!) until you make sure the OSD drops below 95%.
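
A quick way to see whether the cluster-wide flag is set and which OSD tripped
it (a minimal sketch; the exact output varies between releases):

# HEALTH_ERR will name the full OSD(s):
ceph health detail

# The osdmap flags line will include 'full' while any OSD is over the ratio:
ceph osd dump | grep -i full

# Per-OSD utilisation, to find the outlier:
ceph osd df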

Wido

> Thanks
> Swami
> 
> On Tue, Jul 19, 2016 at 3:07 PM, Christian Balzer  wrote:
> >
> > Hello,
> >
> > On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:
> >
> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
> >> >> One of the OSD is 95% full.
> >> >> If an OSD is 95% full, is it impact the any storage operation? Is this
> >> >> impacts on VM/Instance?
> >>
> >> >Yes, one OSD will impact whole cluster. It will block write operations to 
> >> >the cluster
> >>
> >> Thanks for clarification. Really?? Is this(OSD 95%) full designed to
> >> block write I/O of ceph cluster?
> >>
> > Really.
> > To be more precise, any I/O that touches any PG on that OSD will block.
> > So with a sufficiently large cluster you may have some, few, I/Os still go
> > through as they don't use that OSD at all.
> >
> > That's why:
> >
> > 1. Ceph has the near-full warning (which of course may need to be
> > adjusted to correctly reflect things, especially with smaller clusters).
> > Once you get that warning, you NEED to take action immediately.
> >
> > 2. You want to graph the space utilization of all your OSDs with something
> > like graphite. That allows you to spot trends of uneven data distribution
> > early and thus react early to it.
> > I re-weight (CRUSH re-weight, as this is permanent and my clusters aren't
> > growing frequently) OSDs so they are at least within 10% of each
> > other.
> >
> > Christian
> >> Because I have around 251 OSDs out which one OSD is 95% full, but
> >> other 250 OSDs not in near full also...
> >>
> >> Thanks
> >> Swami
> >>
> >>
> >> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
> >> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
> >> >>
> >> >> Hi,
> >> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
> >> >> One of the OSD is 95% full.
> >> >> If an OSD is 95% full, is it impact the any storage operation? Is this
> >> >> impacts on VM/Instance?
> >> >
> >> > Yes, one OSD will impact whole cluster. It will block write operations to
> >> > the cluster
> >> >>
> >> >> Immediately I have reduced the OSD weight, which was filled with 95 %
> >> >> data. After re-weight, data rebalanaced and OSD came to normal state
> >> >> (ie < 80%) with 1 hour time frame.
> >> >>
> >> >>
> >> >> Thanks
> >> >> Swami
> >> >> ___
> >> >> ceph-users mailing list
> >> >> ceph-users@lists.ceph.com
> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >> >
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread M Ranga Swami Reddy
Thanks for the detail...
When an OSD is 95% full, then that specific OSD's write IO is blocked.

Thanks
Swami

On Tue, Jul 19, 2016 at 3:07 PM, Christian Balzer  wrote:
>
> Hello,
>
> On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:
>
>> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
>> >> One of the OSD is 95% full.
>> >> If an OSD is 95% full, is it impact the any storage operation? Is this
>> >> impacts on VM/Instance?
>>
>> >Yes, one OSD will impact whole cluster. It will block write operations to 
>> >the cluster
>>
>> Thanks for clarification. Really?? Is this(OSD 95%) full designed to
>> block write I/O of ceph cluster?
>>
> Really.
> To be more precise, any I/O that touches any PG on that OSD will block.
> So with a sufficiently large cluster you may have some, few, I/Os still go
> through as they don't use that OSD at all.
>
> That's why:
>
> 1. Ceph has the near-full warning (which of course may need to be
> adjusted to correctly reflect things, especially with smaller clusters).
> Once you get that warning, you NEED to take action immediately.
>
> 2. You want to graph the space utilization of all your OSDs with something
> like graphite. That allows you to spot trends of uneven data distribution
> early and thus react early to it.
> I re-weight (CRUSH re-weight, as this is permanent and my clusters aren't
> growing frequently) OSDs so they are at least within 10% of each
> other.
>
> Christian
>> Because I have around 251 OSDs out which one OSD is 95% full, but
>> other 250 OSDs not in near full also...
>>
>> Thanks
>> Swami
>>
>>
>> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
>> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
>> >>
>> >> Hi,
>> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
>> >> One of the OSD is 95% full.
>> >> If an OSD is 95% full, is it impact the any storage operation? Is this
>> >> impacts on VM/Instance?
>> >
>> > Yes, one OSD will impact whole cluster. It will block write operations to
>> > the cluster
>> >>
>> >> Immediately I have reduced the OSD weight, which was filled with 95 %
>> >> data. After re-weight, data rebalanaced and OSD came to normal state
>> >> (ie < 80%) with 1 hour time frame.
>> >>
>> >>
>> >> Thanks
>> >> Swami
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Christian Balzer

Hello,

On Tue, 19 Jul 2016 14:23:32 +0530 M Ranga Swami Reddy wrote:

> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
> >> One of the OSD is 95% full.
> >> If an OSD is 95% full, is it impact the any storage operation? Is this
> >> impacts on VM/Instance?
> 
> >Yes, one OSD will impact whole cluster. It will block write operations to 
> >the cluster
> 
> Thanks for clarification. Really?? Is this(OSD 95%) full designed to
> block write I/O of ceph cluster?
>
Really.
To be more precise, any I/O that touches any PG on that OSD will block.
So with a sufficiently large cluster you may have some, few, I/Os still go
through as they don't use that OSD at all.

That's why:

1. Ceph has the near-full warning (which of course may need to be
adjusted to correctly reflect things, especially with smaller clusters).
Once you get that warning, you NEED to take action immediately. 

2. You want to graph the space utilization of all your OSDs with something
like graphite. That allows you to spot trends of uneven data distribution
early and thus react early to it.
I re-weight (CRUSH re-weight, as this is permanent and my clusters aren't
growing frequently) OSDs so they are at least within 10% of each
other.
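
A minimal sketch of that workflow (the OSD id and weight here are examples
only, not values from this cluster):

# Utilisation per OSD; the %USE and VAR columns show the outliers:
ceph osd df

# Permanently lower the CRUSH weight of an over-full OSD a little and
# let rebalancing finish before adjusting further:
ceph osd crush reweight osd.12 2.60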

Christian
> Because I have around 251 OSDs out which one OSD is 95% full, but
> other 250 OSDs not in near full also...
> 
> Thanks
> Swami
> 
> 
> On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
> > On 16-07-19 11:44, M Ranga Swami Reddy wrote:
> >>
> >> Hi,
> >> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
> >> One of the OSD is 95% full.
> >> If an OSD is 95% full, is it impact the any storage operation? Is this
> >> impacts on VM/Instance?
> >
> > Yes, one OSD will impact whole cluster. It will block write operations to
> > the cluster
> >>
> >> Immediately I have reduced the OSD weight, which was filled with 95 %
> >> data. After re-weight, data rebalanaced and OSD came to normal state
> >> (ie < 80%) with 1 hour time frame.
> >>
> >>
> >> Thanks
> >> Swami
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-19 Thread Yan, Zheng
On Tue, Jul 19, 2016 at 1:03 PM, Goncalo Borges
 wrote:
> Hi All...
>
> We do have some good news.
>
> As promised, I've recompiled ceph 10.2.2 (in an intel processor without
> AVX2) with and without the patch provided by Zheng. It turns out that
> Zheng's patch is the solution for the segfaults we saw in ObjectCacher when
> ceph-fuse runs in AMD 62xx processors.
>
> To convince ourselves that the problem was really solved, we executed 40
> jobs (with the user application where the ObjectCacher segfault was seen for
> the first time) in a dozen of AMD 62XX VMs, and none failed. Before,
> ceph-fuse was always segfaulting a couple of minutes after job startup.
>
> Thank you for all the help. With all the bits and pieces from everyone we
> were able to nail this one.
>
> I am a bit surprised that no other complains appeared in the mailing list
> for both of the issues we saw: first the locking issue and then the
> ObjectCacher issue. This makes me think that we are using ceph-fuse in a
> different way than others (probably exposing it to real user applications
> more heavily than other communities). If you actually need a beta tester
> next time, I think it is also in our best interest to participate.
>
> I do have a last question.
>
> While searching / googling for tips, I saw an issue claiming that
> 'fuse_disable_pagecache' should be set to true in ceph.conf. Can you briefly
> explain if this is correct and what the con of not using it is? (just for me 
> to understand it).

For ceph-fuse, there are two caches: one is in ceph-fuse, the other is the
kernel pagecache. When multiple clients read/write a file at the same
time, ceph-fuse needs to disable caching and let reads/writes go to the
OSDs directly. ceph-fuse can disable its own cache, but there is no
way to disable the kernel pagecache dynamically, so a client may read
stale data from the kernel pagecache.
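
A minimal sketch of the option being referred to, as it would be set in
ceph.conf (verify the exact behaviour against your release before relying
on it):

[client]
    # Have ceph-fuse open files with direct_io, bypassing the kernel
    # pagecache at the cost of some single-client buffered throughput:
    fuse_disable_pagecache = true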

>
> Thank you in Advance
>
> Cheers
>
> Goncalo
>
>
>
> On 07/15/2016 01:35 PM, Goncalo Borges wrote:
>
> Thanks Zheng...
>
> Now that we have identified the exact context when the segfault appears
> (only in AMD 62XX) I think it should be safe to understand in each situation
> does the crash appears.
>
> My current compilation is ongoing and I will then test it.
>
> If it fails, I will recompile including your patch.
>
> Will report here afterwards.
>
> Thanks for the feedback.
>
> Cheers
>
> Goncalo
>
>
> On 07/15/2016 01:19 PM, Yan, Zheng wrote:
>
> On Fri, Jul 15, 2016 at 9:35 AM, Goncalo Borges
>  wrote:
>
> Hi All...
>
> I've seen that Zheng, Brad, Pat and Greg already updated or made some
> comments on the bug issue. Zheng also proposes a simple patch. However, I do
> have a bit more information. We do think we have identified the source of
> the problem and that we can correct it. Therefore, I would propose that you
> hold any work on the issue until we test our hypothesis. I'll try to
> summarize it:
>
> 1./ After being convinced that the ceph-fuse segfault we saw in specific VMs
> was not memory related, I decided to run the user application in multiple
> zones of the openstack cloud we use. We scale up our resources by using a
> public funded openstack cloud which spawns machines (using always the same
> image) in multiple availability zones. In the majority of the cases we limit
> our VMs to (normally) the same availability zone because it seats in the
> same data center as our infrastructure. This experiment showed that
> ceph-fuse does not segfaults in other availability zones with multiple VMS
> of different sizes and types. So the problem was restricted to the
> availability zone we normally use as our default one.
>
> 2./ I've them created new VMs of multiple sizes and types  in our 'default'
> availability zone and rerun the user application. This new experiment,
> running in newly created VMs, showed ceph-fuse segfaults independent of the
> VM types but not in all VMs. For example, in this new test, ceph-fuse was
> segfaulting in some 4 and 8 core VMs but not in all.
>
> 3./ I've then decided to inspect the CPU types, and the breakthrough was
> that I got a 100% correlation of ceph-fuse segfaults with AMD 62xx processor
> VMs. This availability zone has only 2 types of hypervisors: an old one with
> AMD 62xx processors, and a new one with Intel processors. If my jobs run in
> a VM with Intel, everything is ok. If my jobs run in AMD 62xx, ceph-fuse
> segfaults. Actually, the segfault is almost immediate in 4 core AMD 62xx VMs
> but takes much more time in 8-core AMD62xx VMs.
>
> 4./ I've then crosschecked what processors were used in the successful jobs
> executed in the other availability zones: Several types of intel, AMD 63xx
> but not AMD 62xx processors.
>
> 5./ Talking with my awesome colleague Sean, he remembered some discussions
> about applications segfaulting in AMD processors when compiled in an Intel
> processor with AVX2 extension. Actually, I compiled ceph 10.2.2 in an intel
> processor with AVX2 but ceph 9.2.0 was compiled seve

Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread M Ranga Swami Reddy
>> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
>> One of the OSD is 95% full.
>> If an OSD is 95% full, is it impact the any storage operation? Is this
>> impacts on VM/Instance?

>Yes, one OSD will impact whole cluster. It will block write operations to the 
>cluster

Thanks for the clarification. Really?? Is this (one OSD at 95% full) designed to
block write I/O of the whole ceph cluster?
Because I have around 251 OSDs, out of which one OSD is 95% full, and the
other 250 OSDs are not even near full...

Thanks
Swami


On Tue, Jul 19, 2016 at 2:17 PM, Henrik Korkuc  wrote:
> On 16-07-19 11:44, M Ranga Swami Reddy wrote:
>>
>> Hi,
>> Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
>> One of the OSD is 95% full.
>> If an OSD is 95% full, is it impact the any storage operation? Is this
>> impacts on VM/Instance?
>
> Yes, one OSD will impact whole cluster. It will block write operations to
> the cluster
>>
>> Immediately I have reduced the OSD weight, which was filled with 95 %
>> data. After re-weight, data rebalanaced and OSD came to normal state
>> (ie < 80%) with 1 hour time frame.
>>
>>
>> Thanks
>> Swami
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Henrik Korkuc

On 16-07-19 11:44, M Ranga Swami Reddy wrote:

Hi,
Using ceph cluster with 100+ OSDs and cluster is filled with 60% data.
One of the OSD is 95% full.
If an OSD is 95% full, is it impact the any storage operation? Is this
impacts on VM/Instance?
Yes, one OSD will impact whole cluster. It will block write operations 
to the cluster

Immediately I have reduced the OSD weight, which was filled with 95 %
data. After re-weight, data rebalanaced and OSD came to normal state
(ie < 80%) with 1 hour time frame.


Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph OSD with 95% full

2016-07-19 Thread M Ranga Swami Reddy
Hi,
We are using a ceph cluster with 100+ OSDs and the cluster is filled with 60% data.
One of the OSDs is 95% full.
If an OSD is 95% full, does it impact any storage operation? Does this
impact VMs/instances?

I immediately reduced the weight of the OSD that was filled with 95%
data. After the re-weight, data rebalanced and the OSD came back to a normal
state (i.e. < 80%) within a 1 hour time frame.


Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to use cache tiering with proxy in ceph-10.2.2

2016-07-19 Thread Nick Fisk
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
m13913886...@yahoo.com
Sent: 19 July 2016 07:44
To: Oliver Dzombic ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to use cache tiering with proxy in ceph-10.2.2

 

I have configured ceph.conf with "osd_tier_promote_max_bytes_sec" in [osd] 
Attributes. But it still invalid.

I do command --show-config discovery that it has not been modified.

 

I don’t know why they are not getting picked up (did you restart the OSDs?), 
but you don’t want values that high anyway. Promoting too much will slow you down.

Note: It’s cache tiering, not just a cache. You only want stuff in the cache if 
it’s hot.

 

 

[root@node01 ~]# cat /etc/ceph/ceph.conf | grep tier
osd_tier_promote_max_objects_sec=20
osd_tier_promote_max_bytes_sec=16106127360

 

As above, way too high. Defaults are sensible, maybe 2x/4x if you need the 
cache to warm up quicker

 

 [root@node01 ~]# ceph --show-config | grep tier
mon_debug_unsafe_allow_tier_with_nonempty_snaps = false
osd_tier_promote_max_objects_sec = 5242880
osd_tier_promote_max_bytes_sec = 25
osd_tier_default_cache_mode = writeback
osd_tier_default_cache_hit_set_count = 4
osd_tier_default_cache_hit_set_period = 1200

 

Drop this to 60, unless your workload is very infrequent IO

 


osd_tier_default_cache_hit_set_type = bloom
osd_tier_default_cache_min_read_recency_for_promote = 1
osd_tier_default_cache_min_write_recency_for_promote = 1

 

Make these at least 2, otherwise you will promote on every IO
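
For reference, all of these are per-pool settings, so they can be applied to
the live cache pool rather than via ceph.conf; a minimal sketch (the pool name
ssd-pool is an assumption, values as suggested above):

# Shorter hit-set window:
ceph osd pool set ssd-pool hit_set_period 60

# Require hits in at least 2 recent hit sets before promoting, so a
# single IO no longer pulls an object into the cache:
ceph osd pool set ssd-pool min_read_recency_for_promote 2
ceph osd pool set ssd-pool min_write_recency_for_promote 2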


osd_tier_default_cache_hit_set_grade_decay_rate = 20
osd_tier_default_cache_hit_set_search_last_n = 1

 

and cache tiering does not work, low iops.

 

 

On Monday, July 18, 2016 5:33 PM, "m13913886...@yahoo.com" wrote:

 

thank you very much! 

 

On Monday, July 18, 2016 5:31 PM, Oliver Dzombic  wrote:

 

Hi,

everything is here:

http://docs.ceph.com/docs/jewel/

except

osd_tier_promote_max_bytes_sec

and other stuff, but its enough there that you can make it work.

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de  

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 18.07.2016 um 11:24 schrieb m13913886...@yahoo.com 
 :
> Where to find base docu?
> Official website does not update the document
> 
> 
> On Monday, July 18, 2016 5:16 PM, Oliver Dzombic
> mailto:i...@ip-interactive.de> > wrote:
> 
> 
> Hi
> 
> i suggest you to read some base docu about that.
> 
> osd_tier_promote_max_bytes_sec = how much bytes per second are going on tier
> 
> ceph osd pool set ssd-pool target_max_bytes = maximum size in bytes on
> this specific pool ( its like a quota )
> 
> -- 
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de   
>  >
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
> Am 18.07.2016 um 11:14 schrieb m13913886...@yahoo.com 
>  
>  >:
>> what is "osd_tier_promote_max_bytes_sec" in ceph.conf file  and command
>> "ceph osd pool set ssd-pool target_max_bytes" are not the same ?
>>
>>
>> On Monday, July 18, 2016 4:40 PM, Oliver Dzombic
>> mailto:i...@ip-interactive.de>  
>>  >> wrote:
>>
>>
>> Hi,
>>
>> osd_tier_promote_max_bytes_sec
>>
>> is your friend.
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:i...@ip-interactive.de   
>>  >
>   
>  >>
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107
>>
>>
>> Am 18.07.2016 um 10:19 schrieb m13913886...@yahoo.com 
>>  
>  >
>>   
>>  >>:
>>> hello cepher!
>>>I have a problem like this :
>>>I want to config a cache tiering to m