Re: [ceph-users] Ceph full cluster

2016-09-26 Thread Yoann Moulin
Hello,

> Yes, you are right!
> I've changed this for all pools, but not for last two!
> 
> pool 1 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 27 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 2 'default.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 29 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 3 'default.rgw.data.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 31 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 4 'default.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 5 'default.rgw.log' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 35 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 6 'default.rgw.users.uid' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 37 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 7 'default.rgw.users.keys' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 8 'default.rgw.meta' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 owner 18446744073709551615 flags hashpspool stripe_width 0
> pool 9 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 43 flags hashpspool stripe_width 0
> pool 10 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 45 flags hashpspool stripe_width 0

Be careful: if you set size 2 and min_size 2, your cluster will go into HEALTH_ERR
state (and I/O to the affected PGs will block) as soon as you lose a single OSD.
If you want to set "size 2" (which is not recommended), you should set min_size to 1.
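
For example, a minimal sketch of applying that to the size-2 pools from the dump
above (illustrative commands, not taken from the thread):

  for p in .rgw.root default.rgw.control default.rgw.data.root default.rgw.gc \
           default.rgw.log default.rgw.users.uid default.rgw.users.keys default.rgw.meta; do
      # drop min_size to 1 so a single OSD failure does not block I/O on size-2 pools
      ceph osd pool set "$p" min_size 1
  done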

Best Regards.

Yoann Moulin

> On Mon, Sep 26, 2016 at 2:05 PM, Burkhard Linke wrote:
> 
> Hi,
> 
> 
> On 09/26/2016 12:58 PM, Dmitriy Lock wrote:
>> Hello all!
>> I need some help with my Ceph cluster.
>> I've installed a Ceph cluster on two physical servers, each with a 40G OSD on /data.
>> Here is ceph.conf:
>> [global]
>> fsid = 377174ff-f11f-48ec-ad8b-ff450d43391c
>> mon_initial_members = vm35, vm36
>> mon_host = 192.168.1.35,192.168.1.36
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>> osd pool default size = 2  # Write an object 2 times.
>> osd pool default min size = 1 # Allow writing one copy in a degraded 
>> state.
>>
>> osd pool default pg num = 200
>> osd pool default pgp num = 200
>>
>> Right after creation it was HEALTH_OK, and I started filling it. I wrote 40G of data
>> to the cluster via the Rados gateway, but the cluster used up all available space and
>> kept growing even after I added two more OSDs - a 10G /data1 on each server.
>> Here is tree output:
>> # ceph osd tree
>> ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY  
>> -1 0.09756 root default 
>> -2 0.04878 host vm35
>> 0 0.03899 osd.0  up  1.0  1.0  
>> 2 0.00980 osd.2  up  1.0  1.0  
>> -3 0.04878 host vm36
>> 1 0.03899 osd.1  up  1.0  1.0  
>> 3 0.00980 osd.3  up  1.0  1.0 
>>
>> and health:
>> root@vm35:/etc# ceph health
>> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
>> root@vm35:/etc# ceph health detail
>> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
>> pg 10.5 is stuck unclean since forever, current state 
>> active+undersized+degraded, last acting [1,0]
>> pg 9.6 is stuck unclean since forever, current state 
>> 

Re: [ceph-users] Ceph full cluster

2016-09-26 Thread Dmitriy Lock
Yes, you are right!
I've changed this for all pools, but not for last two!

pool 1 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 27 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 2 'default.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 29 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 3 'default.rgw.data.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 31 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 4 'default.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 5 'default.rgw.log' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 35 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 6 'default.rgw.users.uid' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 37 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 7 'default.rgw.users.keys' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 8 'default.rgw.meta' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 9 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 43 flags hashpspool stripe_width 0
pool 10 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 45 flags hashpspool stripe_width 0

Changing right now.
Thank you very much!
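
For reference, a sketch of what that change could look like (assuming the intent is
to bring the two remaining size-3 pools in line with the others on this two-host
cluster; the exact commands are not shown in the thread):

  ceph osd pool set default.rgw.buckets.index size 2
  ceph osd pool set default.rgw.buckets.data size 2
  # with size 2, min_size usually needs to be 1 as well (see the caveat earlier in the thread)
  ceph osd pool set default.rgw.buckets.index min_size 1
  ceph osd pool set default.rgw.buckets.data min_size 1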

On Mon, Sep 26, 2016 at 2:05 PM, Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de> wrote:

> Hi,
>
> On 09/26/2016 12:58 PM, Dmitriy Lock wrote:
>
> Hello all!
> I need some help with my Ceph cluster.
> I've installed a Ceph cluster on two physical servers, each with a 40G OSD on /data.
> Here is ceph.conf:
> [global]
> fsid = 377174ff-f11f-48ec-ad8b-ff450d43391c
> mon_initial_members = vm35, vm36
> mon_host = 192.168.1.35,192.168.1.36
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd pool default size = 2  # Write an object 2 times.
> osd pool default min size = 1 # Allow writing one copy in a degraded
> state.
>
> osd pool default pg num = 200
> osd pool default pgp num = 200
>
> Right after creation it was HEALTH_OK, and I started filling it. I wrote 40G of data
> to the cluster via the Rados gateway, but the cluster used up all available space and
> kept growing even after I added two more OSDs - a 10G /data1 on each server.
> Here is tree output:
> # ceph osd tree
> ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.09756 root default
> -2 0.04878 host vm35
> 0 0.03899 osd.0  up  1.0  1.0
> 2 0.00980 osd.2  up  1.0  1.0
> -3 0.04878 host vm36
> 1 0.03899 osd.1  up  1.0  1.0
> 3 0.00980 osd.3  up  1.0  1.0
>
> and health:
> root@vm35:/etc# ceph health
> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
> root@vm35:/etc# ceph health detail
> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
> pg 10.5 is stuck unclean since forever, current state
> active+undersized+degraded, last acting [1,0]
> pg 9.6 is stuck unclean since forever, current state
> active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
> pg 10.4 is stuck unclean since forever, current state active+remapped,
> last acting [3,0,1]
> pg 9.7 is stuck unclean since forever, current state
> active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
> pg 10.7 is stuck unclean since forever, current state
> active+undersized+degraded+remapped+backfill_toofull, last acting [0,1]
> pg 9.4 is stuck unclean since forever, current state
> active+undersized+degraded, last acting [1,0]
> pg 9.1 is stuck unclean since forever, current state
> active+undersized+degraded, last acting [0,3]
> pg 10.2 is stuck unclean since forever, current state
> active+undersized+degraded, 

Re: [ceph-users] Ceph full cluster

2016-09-26 Thread Burkhard Linke

Hi,


On 09/26/2016 12:58 PM, Dmitriy Lock wrote:

Hello all!
I need some help with my Ceph cluster.
I've installed a Ceph cluster on two physical servers, each with a 40G OSD on /data.

Here is ceph.conf:
[global]
fsid = 377174ff-f11f-48ec-ad8b-ff450d43391c
mon_initial_members = vm35, vm36
mon_host = 192.168.1.35,192.168.1.36
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

osd pool default size = 2  # Write an object 2 times.
osd pool default min size = 1 # Allow writing one copy in a degraded 
state.


osd pool default pg num = 200
osd pool default pgp num = 200

Right after creation it was HEALTH_OK, and I started filling it. I wrote 40G of data
to the cluster via the Rados gateway, but the cluster used up all available space and
kept growing even after I added two more OSDs - a 10G /data1 on each server.

Here is tree output:
# ceph osd tree
ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.09756 root default
-2 0.04878 host vm35
0 0.03899 osd.0  up  1.0  1.0
2 0.00980 osd.2  up  1.0  1.0
-3 0.04878 host vm36
1 0.03899 osd.1  up  1.0  1.0
3 0.00980 osd.3  up  1.0  1.0

and health:
root@vm35:/etc# ceph health
HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)

root@vm35:/etc# ceph health detail
HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
pg 10.5 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [1,0]
pg 9.6 is stuck unclean since forever, current state 
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 10.4 is stuck unclean since forever, current state active+remapped, 
last acting [3,0,1]
pg 9.7 is stuck unclean since forever, current state 
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 10.7 is stuck unclean since forever, current state 
active+undersized+degraded+remapped+backfill_toofull, last acting [0,1]
pg 9.4 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [1,0]
pg 9.1 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [0,3]
pg 10.2 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [1,0]
pg 9.0 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [1,2]
pg 10.3 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [2,1]
pg 9.3 is stuck unclean since forever, current state 
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 10.0 is stuck unclean since forever, current state 
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 9.2 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [0,1]
pg 10.1 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [0,1]
pg 9.5 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [1,0]
pg 10.6 is stuck unclean since forever, current state 
active+undersized+degraded, last acting [0,1]

pg 9.1 is active+undersized+degraded, acting [0,3]
pg 10.2 is active+undersized+degraded, acting [1,0]
pg 9.0 is active+undersized+degraded, acting [1,2]
pg 10.3 is active+undersized+degraded, acting [2,1]
pg 9.3 is active+undersized+degraded+remapped+backfill_toofull, acting 
[1,0]
pg 10.0 is active+undersized+degraded+remapped+backfill_toofull, 
acting [1,0]

pg 9.2 is active+undersized+degraded, acting [0,1]
pg 10.1 is active+undersized+degraded, acting [0,1]
pg 9.5 is active+undersized+degraded, acting [1,0]
pg 10.6 is active+undersized+degraded, acting [0,1]
pg 9.4 is active+undersized+degraded, acting [1,0]
pg 10.7 is active+undersized+degraded+remapped+backfill_toofull, 
acting [0,1]
pg 9.7 is active+undersized+degraded+remapped+backfill_toofull, acting 
[1,0]
pg 9.6 is active+undersized+degraded+remapped+backfill_toofull, acting 
[1,0]

pg 10.5 is active+undersized+degraded, acting [1,0]
recovery 87176/300483 objects degraded (29.012%)
recovery 62272/300483 objects misplaced (20.724%)
osd.1 is full at 95%
osd.2 is near full at 91%
osd.3 is near full at 91%
pool default.rgw.buckets.data objects per pg (12438) is more than 
17.8451 times cluster average (697)
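
With osd.1 full at 95%, backfill cannot complete and the cluster stays in HEALTH_ERR.
As an illustration, the per-OSD and per-pool utilization behind these full/nearfull
flags can be inspected with the standard CLI:

  ceph osd df    # per-OSD capacity, utilization and PG count
  ceph df        # cluster-wide and per-pool usage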


In the log I see this:
2016-09-26 10:37:21.688849 mon.0 192.168.1.35:6789/0 
 

[ceph-users] Ceph full cluster

2016-09-26 Thread Dmitriy Lock
Hello all!
I need some help with my Ceph cluster.
I've installed a Ceph cluster on two physical servers, each with a 40G OSD on /data.
Here is ceph.conf:
[global]
fsid = 377174ff-f11f-48ec-ad8b-ff450d43391c
mon_initial_members = vm35, vm36
mon_host = 192.168.1.35,192.168.1.36
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

osd pool default size = 2  # Write an object 2 times.
osd pool default min size = 1 # Allow writing one copy in a degraded state.

osd pool default pg num = 200
osd pool default pgp num = 200
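
Note that these defaults only apply to pools created after they are in place; what each
existing pool actually uses can be checked with the standard CLI, for example:

  ceph osd dump | grep '^pool'                       # size, min_size, pg_num for every pool
  ceph osd pool get default.rgw.buckets.data size    # query a single setting
  ceph osd pool get default.rgw.buckets.data pg_num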

Right after creation it was HEALTH_OK, and I started filling it. I wrote 40G of data
to the cluster via the Rados gateway, but the cluster used up all available space and
kept growing even after I added two more OSDs - a 10G /data1 on each server.
Here is tree output:
# ceph osd tree
ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.09756 root default
-2 0.04878 host vm35
0 0.03899 osd.0  up  1.0  1.0
2 0.00980 osd.2  up  1.0  1.0
-3 0.04878 host vm36
1 0.03899 osd.1  up  1.0  1.0
3 0.00980 osd.3  up  1.0  1.0

and health:
root@vm35:/etc# ceph health
HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
root@vm35:/etc# ceph health detail
HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
pg 10.5 is stuck unclean since forever, current state
active+undersized+degraded, last acting [1,0]
pg 9.6 is stuck unclean since forever, current state
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 10.4 is stuck unclean since forever, current state active+remapped, last
acting [3,0,1]
pg 9.7 is stuck unclean since forever, current state
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 10.7 is stuck unclean since forever, current state
active+undersized+degraded+remapped+backfill_toofull, last acting [0,1]
pg 9.4 is stuck unclean since forever, current state
active+undersized+degraded, last acting [1,0]
pg 9.1 is stuck unclean since forever, current state
active+undersized+degraded, last acting [0,3]
pg 10.2 is stuck unclean since forever, current state
active+undersized+degraded, last acting [1,0]
pg 9.0 is stuck unclean since forever, current state
active+undersized+degraded, last acting [1,2]
pg 10.3 is stuck unclean since forever, current state
active+undersized+degraded, last acting [2,1]
pg 9.3 is stuck unclean since forever, current state
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 10.0 is stuck unclean since forever, current state
active+undersized+degraded+remapped+backfill_toofull, last acting [1,0]
pg 9.2 is stuck unclean since forever, current state
active+undersized+degraded, last acting [0,1]
pg 10.1 is stuck unclean since forever, current state
active+undersized+degraded, last acting [0,1]
pg 9.5 is stuck unclean since forever, current state
active+undersized+degraded, last acting [1,0]
pg 10.6 is stuck unclean since forever, current state
active+undersized+degraded, last acting [0,1]
pg 9.1 is active+undersized+degraded, acting [0,3]
pg 10.2 is active+undersized+degraded, acting [1,0]
pg 9.0 is active+undersized+degraded, acting [1,2]
pg 10.3 is active+undersized+degraded, acting [2,1]
pg 9.3 is active+undersized+degraded+remapped+backfill_toofull, acting
[1,0]
pg 10.0 is active+undersized+degraded+remapped+backfill_toofull, acting
[1,0]
pg 9.2 is active+undersized+degraded, acting [0,1]
pg 10.1 is active+undersized+degraded, acting [0,1]
pg 9.5 is active+undersized+degraded, acting [1,0]
pg 10.6 is active+undersized+degraded, acting [0,1]
pg 9.4 is active+undersized+degraded, acting [1,0]
pg 10.7 is active+undersized+degraded+remapped+backfill_toofull, acting
[0,1]
pg 9.7 is active+undersized+degraded+remapped+backfill_toofull, acting
[1,0]
pg 9.6 is active+undersized+degraded+remapped+backfill_toofull, acting
[1,0]
pg 10.5 is active+undersized+degraded, acting [1,0]
recovery 87176/300483 objects degraded (29.012%)
recovery 62272/300483 objects misplaced (20.724%)
osd.1 is full at 95%
osd.2 is near full at 91%
osd.3 is near full at 91%
pool default.rgw.buckets.data objects per pg (12438) is more than 17.8451
times cluster average (697)
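
The "too few pgs" warning on default.rgw.buckets.data could later be addressed by
raising its pg_num/pgp_num, for example (illustrative values; splitting PGs moves
data, so it is best done only after the full/nearfull condition is cleared):

  ceph osd pool set default.rgw.buckets.data pg_num 64
  ceph osd pool set default.rgw.buckets.data pgp_num 64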

In the log I see this:
2016-09-26 10:37:21.688849 mon.0 192.168.1.35:6789/0 4836 : cluster [INF]
pgmap v8364: 144 pgs: 5
active+undersized+degraded+remapped+backfill_toofull, 1 active+remapped,
128