Re: [ceph-users] Ceiling on number of PGs in an OSD

2015-03-20 Thread Craig Lewis
There isn't a hard limit on the number, but it's recommended that you keep
it around 100 PGs per OSD.  Smaller values cause uneven data distribution.
Larger values cause the OSD processes to use more CPU, RAM, and file
descriptors, particularly during recovery.  With that many OSDs, you're
going to want to raise your sysctls and ulimits, particularly max open file
descriptors, open sockets, FDs per process, etc.
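
If you want to see where you're starting from before raising anything,
something like this works on Linux (just a quick check; the limits you
actually need depend on how many OSDs you run per host):

    import resource

    # System-wide cap on open file handles (Linux).
    with open('/proc/sys/fs/file-max') as f:
        print('fs.file-max:', f.read().strip())

    # Per-process soft/hard limits on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print('nofile soft/hard:', soft, hard)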


You don't need the same number of placement groups for every pool.  Pools
without much data don't need as many PGs.  For example, I have a bunch of
pools for RGW zones, and they have 32 PGs each.  I have a total of 2600
PGs, 2048 are in the .rgw.buckets pool.
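
Roughly, the arithmetic looks like this (pool names besides .rgw.buckets
are just placeholders, not my exact layout):

    # Total PGs across pools with very different pg_num values.
    pools = {
        '.rgw.buckets': 2048,  # holds nearly all the data
        '.rgw': 32,
        '.rgw.control': 32,
        '.users': 32,
        # ...more small 32-PG pools
    }
    print('total PGs:', sum(pools.values()))  # 2144 here; ~2600 in my cluster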

Also keep in mind that your pg_num and pgp_num need to be multiplied by the
number of replicas to get the PG-per-OSD count.  I have 2600 PGs and
replication 3, so I really have 7800 PG copies spread over 72 OSDs.
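
As a back-of-the-envelope check:

    # PG copies per OSD = total pg_num * replicas / OSD count
    total_pgs, replicas, osds = 2600, 3, 72
    print(total_pgs * replicas / osds)  # ~108 PG copies per OSD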

Assuming you have one big pool, 750 OSDs, and replication 3, I'd go with
32k PGs on the big pool.  With the same setup but replication 2, I'd still
go with 32k, but be prepared to expand PGs with your next addition of OSDs.
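
That 32k comes from the usual rule of thumb (target ~100 PGs per OSD,
divide by the replica count, round up to a power of two); a sketch:

    # Rule-of-thumb pg_num for one big pool:
    # (OSDs * target PGs per OSD) / replicas, rounded up to a power of two.
    def suggest_pg_num(osds, replicas, target_per_osd=100):
        raw = osds * target_per_osd // replicas
        pg_num = 1
        while pg_num < raw:
            pg_num *= 2
        return pg_num

    print(suggest_pg_num(750, 3))        # 32768
    # Resulting PG copies per OSD at 32768:
    print(32768 * 3 / 750.0, 32768 * 2 / 750.0)  # ~131 (rep 3) vs ~87 (rep 2)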

If you're going to have several big pools (i.e., you're using RGW and RBD
heavily), I'd go with 16k PGs for the big pools, and adjust those over time
depending on which is used more heavily.  If RBD is consuming 2x the space,
then increase its pg_num and pgp_num during the next OSD expansion, but
don't increase RGW's pg_num and pgp_num.
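
The idea is just to split the same overall PG budget across the big pools
and shift it at later expansions; roughly (the shares here are made up):

    # Split one PG budget across the big pools by expected share of data.
    budget = 32768
    shares = {'rbd': 0.5, '.rgw.buckets': 0.5}
    for pool, share in shares.items():
        print(pool, int(budget * share))   # 16384 (16k) each to start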


The number of PGs per OSD should stay around 100 as you add OSDs.  If you
add 10x the OSDs, you'll multiply the pg_num and pgp_num by 10 too, which
gives you the same number of PGs per OSD.  My PGs-per-OSD ratio fluctuates
between 75 and 200, depending on when I do the pg_num and pgp_num increases
relative to the OSD additions.
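
A quick sanity check that the ratio stays put when both numbers scale
together:

    # PG copies per OSD before and after a 10x expansion,
    # assuming pg_num gets the same 10x bump.
    def per_osd(total_pgs, replicas, osds):
        return total_pgs * replicas / float(osds)

    print(per_osd(2600, 3, 72))      # ~108 today
    print(per_osd(26000, 3, 720))    # ~108 after 10x OSDs and 10x PGs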

When you increase pg_num and pgp_num, don't do a large jump.  Ceph will
only allow you to double the value, and even that is extreme: it causes
every OSD in the cluster to start splitting PGs at once.  When you want to
double your pg_num and pgp_num, it's recommended that you make several
passes.  I don't recall seeing any specific recommendation, but I'm planning
to break my next increase up into 10 passes.  I'm at 2048 now, so I'll
probably add about 204 PGs at a time until I get to 4096.
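
Something like this is the schedule I have in mind (just a sketch; each
pass would be a `ceph osd pool set <pool> pg_num <n>`, with pgp_num bumped
once the splits settle):

    # Break the jump from 2048 to 4096 into ~10 smaller pg_num bumps.
    start, target, passes = 2048, 4096, 10
    step = (target - start) // passes                 # ~204 PGs per pass
    values = [start + step * i for i in range(1, passes)] + [target]
    print(values)  # [2252, 2456, ..., 3884, 4096]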




On Thu, Mar 19, 2015 at 6:12 AM, Sreenath BH bhsreen...@gmail.com wrote:

 Hi,

 Is there a ceiling on the number of placement groups in an
 OSD beyond which steady-state and/or recovery performance will start
 to suffer?

 Example: I need to create a pool with 750 OSDs (25 OSDs per server, 50
 servers).
 The PG calculator gives me 65536 placement groups with 300 PGs per OSD.
 Now as the cluster expands, the number of PGs in an OSD has to increase as
 well.

 If the cluster size increases by a factor of 10, the number of PGs per
 OSD will also need to be increased.
 What would be the impact of a large number of PGs per OSD on peering and
 rebalancing?

 There is 3GB per OSD available.

 thanks,
 Sreenath

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

