Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-11 Thread David Turner
You're right that you could be fine above the warning threshold; you can also 
be in a bad state while staying within it.  There is no one-size-fits-all 
answer for how much memory you need.  The defaults and recommendations are 
there as a general guide to get you started.  If you want to push the limits 
or stray from the recommendations, you really need to test to see what you can 
do (not in production, but doing your best to duplicate the production 
environment and load).

How much memory per PG you need depends on every aspect of your hardware, 
latency, load, etc.  The only way to come up with that number is to actually 
test scenarios and find out for yourself.  We're currently in the process of 
figuring out whether we can reduce the amount of RAM in our hosts, but we're 
doing so slowly, deliberately testing everything we can think of along the 
way.



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943







From: Thomas HAMEL [hm...@t-hamel.fr]
Sent: Tuesday, October 11, 2016 10:48 AM
To: David Turner
Cc: Andrus, Brian Contractor; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL 
PGs are 256

On 11 October 2016 at 15:10:22 GMT+02:00, David Turner wrote:
>
>
>It is terrifying to think about increasing that threshold. You may
>have enough system resources while running healthy, but when
>recovering, especially during peering, your memory usage can more than
>double. If you have too many pgs per osd and/or too many osds per
>machine for the amount of memory in each host, you could very well
>end up in an eternal peering situation: peering causes more memory
>usage, the Linux OOM killer kills your osds, which causes more peering,
>and the only way to get out of the situation is running to the
>datacenter with buckets of RAM.
>

If the problem is memory, then I don't understand the point of the warning. If 
you have multiple osds sharing the same server, you can feel safe by being 
below the threshold while still being at risk of not having enough RAM if 
hundreds of pgs on multiple osds start to peer at the same time.

What am I missing here?

What is the right amount of RAM per pg, since that looks like the right 
metric (for this issue)?

Thomas Hamel


>
>
>On 10 October 2016 at 23:00:33 GMT+02:00, David Turner
><david.tur...@storagecraft.com> wrote:
>
>The default it uses can be controlled in your ceph.conf file.  The
>ceph-deploy tool is a generic ceph deployment tool which does not have
>presets for rados gateway deployments or other specific deployments.
>When creating pools you can specify the amount of pgs in them with the
>tool so that it doesn't use your defaults.
>
>You are correct that when creating a lot of pools that it is wiser to
>specify a smaller amount of pgs, but that is a manual step on your end
>and best to use the pgcalc tool to know what you should be aiming at.
>I haven't used ceph with rgw so I can't speak to optimal settings here,
>but I can only imagine that there are several tutorials and guides out
>there.
>
>
>

Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-11 Thread Thomas HAMEL


On 11 October 2016 at 15:10:22 GMT+02:00, David Turner wrote:
>
>
>It is terrifying to think about increasing that threshold. You may
>have enough system resources while running healthy, but when
>recovering, especially during peering, your memory usage can more than
>double. If you have too many pgs per osd and/or too many osds per
>machine for the amount of memory in each host, you could very well
>end up in an eternal peering situation: peering causes more memory
>usage, the Linux OOM killer kills your osds, which causes more peering,
>and the only way to get out of the situation is running to the
>datacenter with buckets of RAM.
>

If the problem is memory, then I don't understand the point of the warning. If 
you have multiple osds sharing the same server, you can feel safe by being 
below the threshold while still being at risk of not having enough RAM if 
hundreds of pgs on multiple osds start to peer at the same time.

What am I missing here?

What is the right amount of RAM per pg, since that looks like the right 
metric (for this issue)?

Thomas Hamel


>
>
>
>
>
>On 10 October 2016 at 23:00:33 GMT+02:00, David Turner
><david.tur...@storagecraft.com> wrote:
>
>The default it uses can be controlled in your ceph.conf file.  The
>ceph-deploy tool is a generic ceph deployment tool which does not have
>presets for rados gateway deployments or other specific deployments. 
>When creating pools you can specify the amount of pgs in them with the
>tool so that it doesn't use your defaults.
>
>You are correct that when creating a lot of pools that it is wiser to
>specify a smaller amount of pgs, but that is a manual step on your end
>and best to use the pgcalc tool to know what you should be aiming at. 
>I haven't used ceph with rgw so I can't speak to optimal settings here,
>but I can only imagine that there are several tutorials and guides out
>there.
>
>
>
>
>
>
>____________________
>From: Andrus, Brian Contractor
>[bdand...@nps.edu<mailto:bdand...@nps.edu>]
>Sent: Monday, October 10, 2016 12:18 PM
>To: David Turner;
>ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
>Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning
>when ALL PGs are 256
>
>David,
>Thanks for the info. I am getting an understanding of how this works.
>Now I used the ceph-deploy tool to create the rgw pools. It seems then
>that the tool isn’t the best at creating the pools necessary for an rgw
>gateway as it made all of them the default sizes for pg_num/pgp_num.
>Perhaps, then, it is wiser to have a very low default for those so the
>ceph-deploy tool doesn't assign a large value to something that will
>merely hold control or other metadata?
>
>
>Brian Andrus
>ITACS/Research Computing
>Naval Postgraduate School
>Monterey, California
>voice: 831-656-6238
>
>
>
>From: David Turner [mailto:david.tur...@storagecraft.com]
>Sent: Monday, October 10, 2016 10:33 AM
>To: Andrus, Brian Contractor
><bdand...@nps.edu>;
>ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
>Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning
>when ALL PGs are 256
>
>You have 11 pools with 256 pgs, 1 pool with 128 and 1 pool with 64...
>that's 3,008 pgs in your entire cluster.  Multiply that number by your
>replica size and divide by how many OSDs you have in your cluster and
>you'l

Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-11 Thread David Turner

On Oct 11, 2016, at 1:21 AM, Thomas HAMEL <hm...@t-hamel.fr> wrote:

Hello, I have the same problem and I wanted to make a few remarks.

One of the main pieces of advice on pg count in the docs is: if you have fewer 
than 5 osds, use 128 pgs per pool. This kind of rule of thumb is really what 
you are looking for when you begin, but this one is very misleading, especially 
if you have 3 or 4 osds. Pgcalc may be the right tool, but it's hard to 
understand when you begin. I think it would be helpful to lower the pg-per-pool 
advice to 64.

My other point is that the vast majority of advice and tutorials on this issue 
tell you to raise the warning threshold (sometimes to thousands of pgs per 
osd) or remove it entirely. That is very tempting, but the official docs never 
hint at when it's safe to do so. Do I need RAM? CPU? IOPS?

It is terrifying to think about increasing that threshold. You may have 
enough system resources while running healthy, but when recovering, especially 
during peering, your memory usage can more than double. If you have too many 
pgs per osd and/or too many osds per machine for the amount of memory in each 
host, you could very well end up in an eternal peering situation: peering 
causes more memory usage, the Linux OOM killer kills your osds, which causes 
more peering, and the only way to get out of the situation is running to the 
datacenter with buckets of RAM.
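
For reference, the threshold being discussed is the monitors' 
mon_pg_warn_max_per_osd option (default 300 in the Jewel-era releases this 
thread is about). A minimal sketch of what raising it looks like, not a 
recommendation:

    # ceph.conf on the monitor nodes
    [mon]
    mon pg warn max per osd = 400    # default 300; 0 disables the check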


Thomas Hamel




David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943






On 10 October 2016 at 23:00:33 GMT+02:00, David Turner <david.tur...@storagecraft.com> wrote:

The default it uses can be controlled in your ceph.conf file.  The ceph-deploy 
tool is a generic ceph deployment tool which does not have presets for rados 
gateway deployments or other specific deployments.  When creating pools you can 
specify the amount of pgs in them with the tool so that it doesn't use your 
defaults.

You are correct that when creating a lot of pools that it is wiser to specify a 
smaller amount of pgs, but that is a manual step on your end and best to use 
the pgcalc tool to know what you should be aiming at.  I haven't used ceph with 
rgw so I can't speak to optimal settings here, but I can only imagine that 
there are several tutorials and guides out there.







From: Andrus, Brian Contractor [bdand...@nps.edu<mailto:bdand...@nps.edu>]
Sent: Monday, October 10, 2016 12:18 PM
To: David Turner; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL 
PGs are 256

David,
Thanks for the info. I am getting an understanding of how this works.
Now I used the ceph-deploy tool to create the rgw pools. It seems then that the 
tool isn’t the best at creating the pools necessary for an rgw gateway as it 
made all of them the default sizes for pg_num/pgp_num.
Perhaps, then, it is wiser to have a very low default for those so the 
ceph-deploy tool doesn't assign a large value to something that will merely hold 
control or other metadata?


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238



From: David Turner [mailto:david.tur...@storagecraft.com]
Sent: Monday, October 10, 2016 10:33 AM
To: Andrus, Brian Contractor <bdand...@nps.edu>; 
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL 
PGs are 256

You have 11 pools with 256 pgs, 1 pool with 128 and 1 pool with 64... that's 
3,008 pgs in your entire cluster.  Multiply that number by your replica size 
and divide by how many OSDs you have in your cluster and you'll se

Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-11 Thread Thomas HAMEL
Hello, I have the same problem and I wanted to make a few remarks.

One of the main pieces of advice on pg count in the docs is: if you have fewer 
than 5 osds, use 128 pgs per pool. This kind of rule of thumb is really what 
you are looking for when you begin, but this one is very misleading, especially 
if you have 3 or 4 osds. Pgcalc may be the right tool, but it's hard to 
understand when you begin. I think it would be helpful to lower the pg-per-pool 
advice to 64.
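
As a rough illustration of why 128 pgs per pool overshoots on a tiny cluster, 
here is a pgcalc-style estimate (a sketch only; the 100-PGs-per-OSD target and 
the 4-osd, three-equal-pools layout are assumed example numbers):

    # Rough pgcalc-style estimate (sketch): aim for ~100 PGs per OSD overall,
    # split that budget across pools by expected data share, round up to a power of two.
    def suggest_pg_num(osds, replica_size, data_share, target_per_osd=100):
        raw = osds * target_per_osd * data_share / replica_size
        pg = 8                       # keep a small floor
        while pg < raw:              # round up to the next power of two
            pg *= 2
        return pg

    # Hypothetical 4-osd cluster, replica size 3, three pools of equal weight:
    for share in (1/3, 1/3, 1/3):
        print(suggest_pg_num(osds=4, replica_size=3, data_share=share))
    # prints 64 for each pool -- already below the 128-per-pool rule of thumb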

My other point is that the vast majority of advice and tutorials on this issue 
tell you to raise the warning threshold (sometimes to thousands of pgs per 
osd) or remove it entirely. That is very tempting, but the official docs never 
hint at when it's safe to do so. Do I need RAM? CPU? IOPS?


Thomas Hamel

On 10 October 2016 at 23:00:33 GMT+02:00, David Turner wrote:
>The default it uses can be controlled in your ceph.conf file.  The
>ceph-deploy tool is a generic ceph deployment tool which does not have
>presets for rados gateway deployments or other specific deployments. 
>When creating pools you can specify the amount of pgs in them with the
>tool so that it doesn't use your defaults.
>
>You are correct that when creating a lot of pools that it is wiser to
>specify a smaller amount of pgs, but that is a manual step on your end
>and best to use the pgcalc tool to know what you should be aiming at. 
>I haven't used ceph with rgw so I can't speak to optimal settings here,
>but I can only imagine that there are several tutorials and guides out
>there.
>
>
>
>
>
>
>
>From: Andrus, Brian Contractor [bdand...@nps.edu]
>Sent: Monday, October 10, 2016 12:18 PM
>To: David Turner; ceph-users@lists.ceph.com
>Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning
>when ALL PGs are 256
>
>David,
>Thanks for the info. I am getting an understanding of how this works.
>Now I used the ceph-deploy tool to create the rgw pools. It seems then
>that the tool isn’t the best at creating the pools necessary for an rgw
>gateway as it made all of them the default sizes for pg_num/pgp_num.
>Perhaps, then, it is wiser to have a very low default for those so the
>ceph-deploy tool doesn't assign a large value to something that will
>merely hold control or other metadata?
>
>
>Brian Andrus
>ITACS/Research Computing
>Naval Postgraduate School
>Monterey, California
>voice: 831-656-6238
>
>
>
>From: David Turner [mailto:david.tur...@storagecraft.com]
>Sent: Monday, October 10, 2016 10:33 AM
>To: Andrus, Brian Contractor ;
>ceph-users@lists.ceph.com
>Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning
>when ALL PGs are 256
>
>You have 11 pools with 256 pgs, 1 pool with 128 and 1 pool with 64...
>that's 3,008 pgs in your entire cluster.  Multiply that number by your
>replica size and divide by how many OSDs you have in your cluster and
>you'll see what your average PGs per osd is.  Based on the replica size
>you shared, that's a total number of 6,528 copies of PGs to be divided
>amongst the OSDs in your cluster.  Your cluster will be in warning if
>that number is greater than 300 per OSD, like you're seeing.  When
>designing your cluster and how many pools, pgs, and replica size you
>will be setting, please consult the pgcalc tool found here
>http://ceph.com/pgcalc/.  You cannot reduce the number of PGs in a
>pool, so the easiest way to resolve this issue is most likely going
>to be destroying pools and recreating them with the proper number of
>PGs.
>
>The PG number should be based on what percentage of the data in your
>cluster will be in this pool.  If I'm planning to have about 1024 PGs
>total in my cluster and I give 256 PGs to 4 different pools, then what
>I'm saying is that each of those 4 pools will have the exact same
>amount of data as each other.  On the other hand, if I believe that 1
>of those pools will have 90% of the data and the other 3 pools will
>have very little data, then I'll probably give the larger pool 1024 PGs
>and the rest of them 64 PGs (or less dep

Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-10 Thread David Turner
The default it uses can be controlled in your ceph.conf file.  The ceph-deploy 
tool is a generic ceph deployment tool which does not have presets for rados 
gateway deployments or other specific deployments.  When creating pools you can 
specify the number of pgs with the tool so that it doesn't use your defaults.

You are correct that when creating a lot of pools it is wiser to specify a 
smaller number of pgs, but that is a manual step on your end, and it's best to 
use the pgcalc tool to know what you should be aiming at.  I haven't used ceph 
with rgw so I can't speak to optimal settings here, but I can only imagine that 
there are several tutorials and guides out there.
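
For example, something along these lines keeps new pools from inheriting an 
oversized default (a sketch; the values 64 and 8 and the choice of pool name 
are only illustrations):

    # ceph.conf: defaults that new pools inherit
    [global]
    osd pool default pg num = 64
    osd pool default pgp num = 64

A small pool can also be created with an explicit pg count instead of the 
default, e.g.:

    ceph osd pool create default.rgw.control 8 8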



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943







From: Andrus, Brian Contractor [bdand...@nps.edu]
Sent: Monday, October 10, 2016 12:18 PM
To: David Turner; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL 
PGs are 256

David,
Thanks for the info. I am getting an understanding of how this works.
Now I used the ceph-deploy tool to create the rgw pools. It seems then that the 
tool isn’t the best at creating the pools necessary for an rgw gateway as it 
made all of them the default sizes for pg_num/pgp_num.
Perhaps, then, it is wiser to have a very low default for those so the 
ceph-deploy tool doesn't assign a large value to something that will merely hold 
control or other metadata?


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238



From: David Turner [mailto:david.tur...@storagecraft.com]
Sent: Monday, October 10, 2016 10:33 AM
To: Andrus, Brian Contractor ; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL 
PGs are 256

You have 11 pools with 256 pgs, 1 pool with 128 and 1 pool with 64... that's 
3,008 pgs in your entire cluster.  Multiply that number by your replica size 
and divide by how many OSDs you have in your cluster and you'll see what your 
average PGs per osd is.  Based on the replica size you shared, that's a total 
number of 6,528 copies of PGs to be divided amongst the OSDs in your cluster.  
Your cluster will be in warning if that number is greater than 300 per OSD, 
like you're seeing.  When designing your cluster and how many pools, pgs, and 
replica size you will be setting, please consult the pgcalc tool found here 
http://ceph.com/pgcalc/.  You cannot reduce the number of PGs in a pool, so the 
easiest way to resolve this issue is most likely going to be destroying pools 
and recreating them with the proper number of PGs.

The PG number should be based on what percentage of the data in your cluster 
will be in this pool.  If I'm planning to have about 1024 PGs total in my 
cluster and I give 256 PGs to 4 different pools, then what I'm saying is that 
each of those 4 pools will have the exact same amount of data as each other.  
On the other hand, if I believe that 1 of those pools will have 90% of the data 
and the other 3 pools will have very little data, then I'll probably give the 
larger pool 1024 PGs and the rest of them 64 PGs (or less depending on what I'm 
aiming for).  It is beneficial to keep the pg_num and pgp_num counts at powers of 2.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Andrus, Brian 
Contractor [bdand...@nps.edu]
Sent: Monday, October 10, 2016 11:14 AM
To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs 
are 256
Ok, this is an odd one to me…
I have several pools, ALL of them are set with pg_num an

Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-10 Thread Andrus, Brian Contractor
David,
Thanks for the info. I am getting an understanding of how this works.
Now I used the ceph-deploy tool to create the rgw pools. It seems then that the 
tool isn’t the best at creating the pools necessary for an rgw gateway as it 
made all of them the default sizes for pg_num/pgp_num.
Perhaps, then, it is wiser to have a very low default for those so the 
ceph-deploy tool doesn't assign a large value to something that will merely hold 
control or other metadata?


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238



From: David Turner [mailto:david.tur...@storagecraft.com]
Sent: Monday, October 10, 2016 10:33 AM
To: Andrus, Brian Contractor ; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL 
PGs are 256

You have 11 pools with 256 pgs, 1 pool with 128 and 1 pool with 64... that's 
3,008 pgs in your entire cluster.  Multiply that number by your replica size 
and divide by how many OSDs you have in your cluster and you'll see what your 
average PGs per osd is.  Based on the replica size you shared, that's a total 
number of 6,528 copies of PGs to be divided amongst the OSDs in your cluster.  
Your cluster will be in warning if that number is greater than 300 per OSD, 
like you're seeing.  When designing your cluster and how many pools, pgs, and 
replica size you will be setting, please consult the pgcalc tool found here 
http://ceph.com/pgcalc/.  You cannot reduce the number of PGs in a pool, so the 
easiest way to resolve this issue is most likely going to be destroying pools 
and recreating them with the proper number of PGs.

The PG number should be based on what percentage of the data in your cluster 
will be in this pool.  If I'm planning to have about 1024 PGs total in my 
cluster and I give 256 PGs to 4 different pools, then what I'm saying is that 
each of those 4 pools will have the exact same amount of data as each other.  
On the other hand, if I believe that 1 of those pools will have 90% of the data 
and the other 3 pools will have very little data, then I'll probably give the 
larger pool 1024 PGs and the rest of them 64 PGs (or less depending on what I'm 
aiming for).  It is beneficial to keep the pg_num and pgp_num counts at powers of 2.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Andrus, Brian 
Contractor [bdand...@nps.edu]
Sent: Monday, October 10, 2016 11:14 AM
To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs 
are 256
Ok, this is an odd one to me…
I have several pools, ALL of them are set with pg_num and pgp_num = 256. Yet, 
the warning about too many PGs per OSD is showing up.
Here are my pools:

pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 256 pgp_num 256 last_change 134 flags hashpspool stripe_width 0
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 203 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 196 flags hashpspool 
stripe_width 0
pool 3 'vmimages' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 213 flags hashpspool stripe_width 0
removed_snaps [1~3]
pool 25 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 6199 flags hashpspool stripe_width 0
pool 26 'default.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6202 flags hashpspool 
stripe_width 0
pool 27 'default.rgw.data.root' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6204 flags hashpspool 
stripe_width 0
pool 28 'default.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6205 flags hashpspool 
stripe_width 0
pool 29 'default.rgw.log' replicated size 2 min_size 1 crush_

Re: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs are 256

2016-10-10 Thread David Turner
You have 11 pools with 256 pgs, 1 pool with 128 and 1 pool with 64... that's 
3,008 pgs in your entire cluster.  Multiply that number by your replica size 
and divide by how many OSDs you have in your cluster and you'll see what your 
average PGs per osd is.  Based on the replica size you shared, that's a total 
you shared, that's a total number of 6,528 copies of PGs to be divided 
Your cluster will be in warning if that number is greater than 300 per OSD, 
like you're seeing.  When designing your cluster and how many pools, pgs, and 
replica size you will be setting, please consult the pgcalc tool found here 
http://ceph.com/pgcalc/.  You cannot reduce the number of PGs in a pool, so the 
easiest way to resolve this issue is most likely going to be destroying pools 
and recreating them with the proper number of PGs.
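
A minimal sketch of that arithmetic, using the pg_num and replica sizes from 
the pool dump quoted below (the 20-OSD figure is an assumption inferred from 
6528 / 326):

    # PG copies per OSD for the pool layout quoted below (sketch).
    # (pg_num, replica size) pairs: rbd and cephfs_data at size 3, the rest at size 2.
    pools = [(256, 3), (256, 3), (64, 2), (128, 2)] + [(256, 2)] * 9

    total_pgs    = sum(pg for pg, size in pools)           # 3008
    total_copies = sum(pg * size for pg, size in pools)    # 6528

    osds = 20   # assumed; 6528 / 20 = 326.4, which matches the "326 > max 300" warning
    print(total_pgs, total_copies, total_copies // osds)   # 3008 6528 326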

The PG number should be based on what percentage of the data in your cluster 
will be in this pool.  If I'm planning to have about 1024 PGs total in my 
cluster and I give 256 PGs to 4 different pools, then what I'm saying is that 
each of those 4 pools will have the exact same amount of data as each other.  
On the other hand, if I believe that 1 of those pools will have 90% of the data 
and the other 3 pools will have very little data, then I'll probably give the 
larger pool 1024 PGs and the rest of them 64 PGs (or less depending on what I'm 
aiming for).  It is beneficial to keep the pg_num and pgp_num counts at powers of 2.
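
A short sketch of that kind of share-based split (the 90/3/3/4 percentages and 
pool names are just example numbers):

    # Split a ~1024-PG budget across pools by expected share of the data (sketch),
    # rounding each pool up to a power of two as suggested above.
    def next_pow2(n):
        p = 8
        while p < n:
            p *= 2
        return p

    budget = 1024
    shares = {"big_pool": 0.90, "small_a": 0.03, "small_b": 0.03, "small_c": 0.04}
    for name, share in shares.items():
        print(name, next_pow2(budget * share))
    # big_pool lands on 1024 and the small pools on 32 or 64 -- roughly the
    # "1024 for the big pool, 64 for the rest" split described above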



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943







From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Andrus, Brian 
Contractor [bdand...@nps.edu]
Sent: Monday, October 10, 2016 11:14 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] too many PGs per OSD (326 > max 300) warning when ALL PGs 
are 256

Ok, this is an odd one to me…
I have several pools, ALL of them are set with pg_num and pgp_num = 256. Yet, 
the warning about too many PGs per OSD is showing up.
Here are my pools:

pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins 
pg_num 256 pgp_num 256 last_change 134 flags hashpspool stripe_width 0
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 203 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 196 flags hashpspool 
stripe_width 0
pool 3 'vmimages' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 213 flags hashpspool stripe_width 0
removed_snaps [1~3]
pool 25 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 6199 flags hashpspool stripe_width 0
pool 26 'default.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6202 flags hashpspool 
stripe_width 0
pool 27 'default.rgw.data.root' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6204 flags hashpspool 
stripe_width 0
pool 28 'default.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6205 flags hashpspool 
stripe_width 0
pool 29 'default.rgw.log' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6206 flags hashpspool 
stripe_width 0
pool 30 'default.rgw.users.uid' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6211 flags hashpspool 
stripe_width 0
pool 31 'default.rgw.meta' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6214 flags hashpspool 
stripe_width 0
pool 32 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 
0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 6216 flags hashpspool 
stripe_width 0
pool 33 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 256 pgp_num 256 last_change 6218 flags hashpspool 
stripe_width 0


so why would the warning show up, and how do I get it to go away and stay away?


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

___