Re: [ceph-users] How many PGs per OSD is too many?

2018-11-14 Thread Mark Nelson


On 11/14/18 1:45 PM, Vladimir Brik wrote:

Hello

I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs 
and 4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about 
400 PGs each (a lot more pools use SSDs than HDDs). Servers are fairly 
powerful: 48 HT cores, 192GB of RAM, and 2x25Gbps Ethernet.
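
As a rough back-of-envelope, the per-OSD PG count falls out of each
pool's pg_num, its replica count (or k+m for EC), and the number of
OSDs behind the pool.  A small Python sketch, with made-up pool numbers
purely for illustration:

# Rough estimate of PG copies per OSD.  The pool numbers below are
# hypothetical, not taken from the cluster described above.
pools = [
    # (pg_num, size) where size is the replica count, or k+m for EC
    (1024, 3),
    (512, 3),
    (256, 3),
]
num_osds = 20  # e.g. 5 hosts x 4 SSD OSDs

pg_copies = sum(pg_num * size for pg_num, size in pools)
print("PG copies per OSD: %.0f" % (float(pg_copies) / num_osds))
# (1024 + 512 + 256) * 3 / 20 = ~269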


The impression I got from the docs is that having more than 200 PGs 
per OSD is not a good thing, but justifications were vague (no 
concrete numbers), like increased peering time, increased resource 
consumption, and possibly decreased recovery performance. None of 
these appeared to be a significant problem in my testing, but the 
tests were very basic and done on a pretty empty cluster under minimal 
load, so I worry I'll run into trouble down the road.


Here are the questions I have:
- In practice, is it a big deal that some OSDs have ~400 PGs?
- In what situations would our cluster most likely fare significantly 
better if I went through the trouble of re-creating pools so that no 
OSD would have more than, say, ~100 PGs?
- What performance metrics could I monitor to detect possible issues 
due to having too many PGs?



It's a fuzzy sort of thing.  During normal operation: With more PGs 
you'll store more pglog info in memory, so you'll have a more bloated 
OSD process.  If you use the new bluestore option for setting an osd 
memory target, that will mean less memory for caches.  It will also 
likely mean that there's a greater chance that pglog entries won't be 
invalidated before memtable flushes in rocksdb, so you might end up with 
higher write amp and slower DB performance as those entries get 
compacted into L0+.  That could matter with RGW or if you are doing lots 
of small 4k writes with RBD.
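
To put a very rough number on the pglog part (a sketch only: the
per-entry size and steady-state entry count are assumptions that vary by
release and workload, and the 4 GiB osd_memory_target is just an example
value):

# Order-of-magnitude estimate of pglog memory held by one OSD.
# All of the inputs below are assumptions, not measured values.
pgs_per_osd = 400                # e.g. the SSD OSDs described above
entries_per_pg = 3000            # assumed steady-state pglog length per PG
bytes_per_entry = 300            # rough average size of one pglog entry
osd_memory_target = 4 * 1024**3  # example per-OSD memory target (4 GiB)

pglog_bytes = pgs_per_osd * entries_per_pg * bytes_per_entry
print("pglog: ~%.0f MiB" % (pglog_bytes / 2.0**20))
print("share of memory target: ~%.0f%%"
      % (100.0 * pglog_bytes / osd_memory_target))
# ~343 MiB, i.e. memory the autotuner cannot hand to the bluestore caches.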


I'd see what Neha/Josh think about the impact on recovery, though I 
suppose one upside is that more PGs means you get a longer log-based 
recovery window.  You could accomplish the same effect by increasing the 
number of pglog entries per PG (or keep the same overall number of 
entries by having more PGs and lowering the number of entries per PG).  
An upside to having more PGs is better data distribution quality, though 
we can now get much better distributions with the new balancer code, 
even with fewer PGs.  One bad thing about having too few PGs is that you 
can have increased lock contention.  The balancer can make the data 
distribution better, but you still can't shrink the number of PGs per 
pool too low.
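
If you want to see how even the distribution actually is (and how many
PGs each OSD carries), something like this works; it's a sketch that
assumes the JSON layout of "ceph osd df --format json", whose field
names can shift a little between releases:

#!/usr/bin/env python
# Summarize per-OSD PG counts from "ceph osd df --format json".
# The "nodes"/"pgs" field names are assumptions based on recent releases.
import json
import subprocess

out = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
nodes = json.loads(out)["nodes"]
pgs = sorted(n["pgs"] for n in nodes)

mean = float(sum(pgs)) / len(pgs)
print("OSDs: %d  PGs/OSD min/mean/max: %d / %.1f / %d"
      % (len(pgs), pgs[0], mean, pgs[-1]))
for n in nodes:
    if n["pgs"] > 300:  # arbitrary threshold for flagging busy OSDs
        print("osd.%s has %d PGs" % (n["id"], n["pgs"]))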


The gist of it is that if you decide to look into this yourself, you 
are probably going to find some contradictory evidence and trade-offs.  
There are pitfalls if you go too high and pitfalls if you go too low.  
I'm not sure we can easily define the exact per-OSD PG counts where 
those pitfalls kick in, since that depends on how much memory you have, 
how fast your hardware is, whether you are using the balancer, and what 
your expectations are.


How's that for a non-answer? ;)

Mark




Thanks,

Vlad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How many PGs per OSD is too many?

2018-11-14 Thread Kjetil Joergensen
This may be less of an issue now.  The most traumatic experience for
us, back around Hammer, was memory usage under recovery+load ending in
OOM kills of OSDs, which meant more recovery: a pretty vicious cycle.
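
If anyone wants to keep an eye on that, the per-OSD mempool stats are
worth graphing; here is a sketch via the admin socket (the exact JSON
nesting of dump_mempools differs between releases, hence the fallback):

#!/usr/bin/env python
# Report pglog and total mempool usage for one OSD via its admin socket.
# Run on the host carrying the OSD, e.g.: python osd_mempools.py 12
import json
import subprocess
import sys

osd_id = sys.argv[1]
out = subprocess.check_output(
    ["ceph", "daemon", "osd.%s" % osd_id, "dump_mempools"])
data = json.loads(out)
# Newer releases wrap the pools as {"mempool": {"by_pool": {...}}};
# older ones list the pools at the top level next to a "total" entry.
pools = data.get("mempool", {}).get("by_pool", data)

pglog = pools.get("osd_pglog", {}).get("bytes", 0)
total = sum(v.get("bytes", 0) for k, v in pools.items()
            if isinstance(v, dict) and k != "total")
print("osd.%s: osd_pglog %.1f MiB, all mempools %.1f MiB"
      % (osd_id, pglog / 2.0**20, total / 2.0**20))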

-KJ

On Wed, Nov 14, 2018 at 11:45 AM Vladimir Brik <
vladimir.b...@icecube.wisc.edu> wrote:

> Hello
>
> I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs and
> 4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about 400
> PGs each (a lot more pools use SSDs than HDDs). Servers are fairly
> powerful: 48 HT cores, 192GB of RAM, and 2x25Gbps Ethernet.
>
> The impression I got from the docs is that having more than 200 PGs per
> OSD is not a good thing, but justifications were vague (no concrete
> numbers), like increased peering time, increased resource consumption,
> and possibly decreased recovery performance. None of these appeared to
> be a significant problem in my testing, but the tests were very basic
> and done on a pretty empty cluster under minimal load, so I worry I'll
> run into trouble down the road.
>
> Here are the questions I have:
> - In practice, is it a big deal that some OSDs have ~400 PGs?
> - In what situations would our cluster most likely fare significantly
> better if I went through the trouble of re-creating pools so that no OSD
> would have more than, say, ~100 PGs?
> - What performance metrics could I monitor to detect possible issues due
> to having too many PGs?
>
> Thanks,
>
> Vlad
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Kjetil Joergensen 
SRE, Medallia Inc
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How many PGs per OSD is too many?

2018-11-14 Thread Vladimir Brik

Hello

I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs and 
4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about 400 
PGs each (a lot more pools use SSDs than HDDs). Servers are fairly 
powerful: 48 HT cores, 192GB of RAM, and 2x25Gbps Ethernet.


The impression I got from the docs is that having more than 200 PGs per 
OSD is not a good thing, but justifications were vague (no concrete 
numbers), like increased peering time, increased resource consumption, 
and possibly decreased recovery performance. None of these appeared to 
be a significant problem in my testing, but the tests were very basic 
and done on a pretty empty cluster under minimal load, so I worry I'll 
run into trouble down the road.


Here are the questions I have:
- In practice, is it a big deal that some OSDs have ~400 PGs?
- In what situations would our cluster most likely fare significantly 
better if I went through the trouble of re-creating pools so that no OSD 
would have more than, say, ~100 PGs?
- What performance metrics could I monitor to detect possible issues due 
to having too many PGs?


Thanks,

Vlad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com