[ceph-users] pg calculation question

2018-07-29 Thread Satish Patel
Folks,

I am building a new Ceph storage cluster and currently have 8 OSDs in
total (I am going to add more in the future).

Based on the official documentation, I should do the following to calculate the total PG count:

8 * 100 / 3 = 266 (rounded up to the next power of 2: 512)

Now, I have 2 pools at present in my Ceph cluster (images & vms), so the
per-pool PG count would be

8 * 100 / 3 / 2 = 133 (rounded up to the next power of 2: 256)

So my pg_num and pgp_num would both be 256, am I correct?

Let's say that after 1 week I add 8 more OSDs to the cluster. Should I then
re-run the same calculation with 16 OSDs and adjust pg_num & pgp_num for all
my pools? Is that safe to do in production?
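
For reference, here is a minimal sketch of the calculation I am describing
(assuming the usual (OSDs * 100) / (replicas * pools) rule of thumb, rounded
up to the next power of two; adjust the numbers to your own setup):

import math

def pg_per_pool(osds, replicas, pools):
    # (OSDs * 100) / (replicas * pools), rounded up to the next power of 2
    raw = osds * 100 / float(replicas * pools)
    return 2 ** int(math.ceil(math.log(raw, 2)))

print(pg_per_pool(osds=8,  replicas=3, pools=2))   # 256 with 8 OSDs
print(pg_per_pool(osds=16, replicas=3, pools=2))   # 512 after adding 8 more OSDs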
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Degraded data redundancy (low space): 1 pg backfill_toofull

2018-07-29 Thread Gregory Farnum
The backfill_toofull state means that one PG which tried to backfill
couldn’t do so because the *target* for backfilling didn’t have the amount
of free space necessary (with a large buffer so we don’t screw up!). It
doesn’t indicate anything about the overall state of the cluster, will
often resolve itself as the target OSD evacuates PGs of its own, and, since
you did a pretty large rebalance, is not very surprising. :)
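
If you want to keep an eye on which backfill targets are getting close to that
limit, a quick sketch like the one below works. It assumes the JSON layout of
"ceph osd df -f json" (a "nodes" list with "name" and "utilization" fields) and
the usual 0.90 backfillfull ratio; check "ceph osd dump | grep ratio" for what
your cluster actually uses.

import json
import subprocess

BACKFILLFULL_RATIO = 0.90   # assumed default; verify with "ceph osd dump | grep ratio"

osd_df = json.loads(subprocess.check_output(['ceph', 'osd', 'df', '-f', 'json']))
for node in osd_df['nodes']:
    used = node['utilization'] / 100.0       # reported as a percentage
    if used >= BACKFILLFULL_RATIO - 0.10:    # flag anything within 10 points of the limit
        print('%s: %.1f%% used (backfillfull at %.0f%%)'
              % (node['name'], used * 100, BACKFILLFULL_RATIO * 100))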

On Sat, Jul 28, 2018 at 5:50 AM Sebastian Igerl  wrote:

> Hi,
>
> I added 4 more OSDs to my 4-node test cluster and now I'm in HEALTH_ERR
> state. Right now it's still recovering, but still, should this happen? None
> of my OSDs are full. Maybe I need more PGs? But since my %USE is < 40%,
> shouldn't it still be able to recover without going into HEALTH_ERR?
>
>   data:
> pools:   7 pools, 484 pgs
> objects: 2.70 M objects, 10 TiB
> usage:   31 TiB used, 114 TiB / 146 TiB avail
> pgs: 2422839/8095065 objects misplaced (29.930%)
>  343 active+clean
>  101 active+remapped+backfill_wait
>  39  active+remapped+backfilling
>  1   active+remapped+backfill_wait+backfill_toofull
>
>   io:
> recovery: 315 MiB/s, 78 objects/s
>
>
>
>
>
> ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
>  0   hdd 2.72890  1.0 2.7 TiB 975 GiB 1.8 TiB 34.89 1.62  31
>  1   hdd 2.72899  1.0 2.7 TiB 643 GiB 2.1 TiB 23.00 1.07  36
>  8   hdd 7.27739  1.0 7.3 TiB 1.7 TiB 5.5 TiB 23.85 1.11  83
> 12   hdd 7.27730  1.0 7.3 TiB 1.1 TiB 6.2 TiB 14.85 0.69  81
> 16   hdd 7.27730  1.0 7.3 TiB 2.0 TiB 5.3 TiB 27.68 1.29  74
> 20   hdd 9.09569  1.0 9.1 TiB 108 GiB 9.0 TiB  1.16 0.05  43
>  2   hdd 2.72899  1.0 2.7 TiB 878 GiB 1.9 TiB 31.40 1.46  36
>  3   hdd 2.72899  1.0 2.7 TiB 783 GiB 2.0 TiB 28.02 1.30  39
>  9   hdd 7.27739  1.0 7.3 TiB 2.0 TiB 5.3 TiB 27.58 1.28  85
> 13   hdd 7.27730  1.0 7.3 TiB 2.2 TiB 5.1 TiB 30.10 1.40  78
> 17   hdd 7.27730  1.0 7.3 TiB 2.1 TiB 5.2 TiB 28.23 1.31  84
> 21   hdd 9.09569  1.0 9.1 TiB 192 GiB 8.9 TiB  2.06 0.10  41
>  4   hdd 2.72899  1.0 2.7 TiB 927 GiB 1.8 TiB 33.18 1.54  34
>  5   hdd 2.72899  1.0 2.7 TiB 1.0 TiB 1.7 TiB 37.57 1.75  28
> 10   hdd 7.27739  1.0 7.3 TiB 2.2 TiB 5.0 TiB 30.66 1.43  87
> 14   hdd 7.27730  1.0 7.3 TiB 1.8 TiB 5.5 TiB 24.23 1.13  89
> 18   hdd 7.27730  1.0 7.3 TiB 2.5 TiB 4.8 TiB 33.83 1.57  93
> 22   hdd 9.09569  1.0 9.1 TiB 210 GiB 8.9 TiB  2.26 0.10  44
>  6   hdd 2.72899  1.0 2.7 TiB 350 GiB 2.4 TiB 12.51 0.58  21
>  7   hdd 2.72899  1.0 2.7 TiB 980 GiB 1.8 TiB 35.07 1.63  35
> 11   hdd 7.27739  1.0 7.3 TiB 2.8 TiB 4.4 TiB 39.14 1.82  99
> 15   hdd 7.27730  1.0 7.3 TiB 1.6 TiB 5.6 TiB 22.49 1.05  82
> 19   hdd 7.27730  1.0 7.3 TiB 2.1 TiB 5.2 TiB 28.49 1.32  77
> 23   hdd 9.09569  1.0 9.1 TiB 285 GiB 8.8 TiB  3.06 0.14  52
> TOTAL 146 TiB  31 TiB 114 TiB 21.51
> MIN/MAX VAR: 0.05/1.82  STDDEV: 11.78
>
>
>
>
> Right after adding the OSDs it showed degraded for a few minutes. Since
> all my pools have a redundancy of 3 and I'm only adding OSDs, I'm a bit
> confused why this happens. I get why PGs are misplaced, but why undersized and degraded?
>
> pgs: 4611/8095032 objects degraded (0.057%)
>  2626460/8095032 objects misplaced (32.445%)
>  215 active+clean
>  192 active+remapped+backfill_wait
>  26  active+recovering+undersized+remapped
>  17  active+recovery_wait+undersized+degraded+remapped
>  16  active+recovering
>  11  active+recovery_wait+degraded
>  6   active+remapped+backfilling
>  1   active+remapped+backfill_toofull
>
>
> Maybe someone can give me some pointers on what I'm missing to understand
> what's happening here?
>
> Thanks!
>
> Sebastian
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-29 Thread Nathan Cutler

Strange...
- wouldn't swear, but pretty sure v13.2.0 was working ok before
- so what do others say/see?
  - no one on v13.2.1 so far (hard to believe) OR
  - just don't have this "systemctl ceph-osd.target" problem and all just works?

If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say 
v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's your 
Linux OS and version (I'm on RHEL 7.5 here)? :O


Hi ceph.novice:

I'm the one to blame for this regrettable incident. Today I reproduced 
the issue in teuthology:


2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
2018-07-29T18:20:07.796 INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call last):
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/bin/ceph-detect-init", line 9, in <module>
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 'ceph-detect-init')()
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, in run
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    print(ceph_detect_init.get(args.use_rhceph).init)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 42, in get
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    release=release)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel  7.5


Just to be sure, can you confirm? (I.e., run the command 
"ceph-detect-init" on your RHEL 7.5 system: instead of printing "systemd", 
does it give an error like the one above?)
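
If you want to poke at this outside of teuthology, here is a minimal Python
sketch of the same call chain the CLI goes through. The names are taken from
the traceback above (ceph_detect_init.get, .init, exc.UnsupportedPlatform);
the positional False stands in for args.use_rhceph, so treat the exact
argument as an assumption.

import ceph_detect_init
from ceph_detect_init import exc

try:
    # On a healthy install this should print the init system, e.g. "systemd".
    print(ceph_detect_init.get(False).init)
except exc.UnsupportedPlatform as e:
    print("platform detection failed: %s" % e)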


I'm working on a fix now at https://github.com/ceph/ceph/pull/23303

Nathan

On 07/29/2018 11:16 AM, ceph.nov...@habmalnefrage.de wrote:

Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni" 
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil" , ceph-users , "Ceph 
Development" 
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
On Sat, Jul 28, 2018 at 6:02 PM,  wrote:

Have you guys changed something with the systemctl startup of the OSDs?


I think there is some kind of systemd issue hidden in mimic,
https://tracker.ceph.com/issues/25004
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-29 Thread ceph . novice

Strange...
- wouldn't swear, but pretty sure v13.2.0 was working ok before
- so what do others say/see?
 - no one on v13.2.1 so far (hard to believe) OR
 - just don't have this "systemctl ceph-osd.target" problem and all just works?

If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say 
v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's your 
Linux OS and version (I'm on RHEL 7.5 here)? :O

 

Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni" 
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil" , ceph-users , 
"Ceph Development" 
Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")
On Sat, Jul 28, 2018 at 6:02 PM,  wrote:
> Have you guys changed something with the systemctl startup of the OSDs?

I think there is some kind of systemd issue hidden in mimic,
https://tracker.ceph.com/issues/25004
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com