Re: [ceph-users] Bluestore DB size and onode count
On 9/10/2018 11:39 PM, Nick Fisk wrote:
>> onode_count is the number of onodes in the cache, not the total number
>> of onodes at an OSD. Hence the difference...
>
> Ok, thanks, that makes sense. I assume there isn't actually a counter
> which gives you the total number of objects on an OSD then?

IIRC "bin/ceph daemon osd.1 calc_objectstore_db_histogram" might report
what you need; see the "num_onodes" field in the report.
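As a rough sketch of what that looks like against a running OSD (assuming
the admin socket is at its default location; the exact set of histogram
fields can vary between releases):

# Ask the OSD to scan its objectstore DB and report a histogram;
# "num_onodes" in the output is the total onode (object) count on that OSD.
sudo ceph daemon osd.0 calc_objectstore_db_histogram | grep num_onodes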
Re: [ceph-users] Bluestore DB size and onode count
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson
> Sent: 10 September 2018 18:27
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore DB size and onode count
>
> On 09/10/2018 12:22 PM, Igor Fedotov wrote:
>> onode_count is the number of onodes in the cache, not the total number
>> of onodes at an OSD. Hence the difference...

Ok, thanks, that makes sense. I assume there isn't actually a counter
which gives you the total number of objects on an OSD then?

>> Just in case - is slow_used_bytes equal to 0? Some DB data might
>> reside at the slow device if spillover has happened. Which doesn't
>> require a full DB volume to happen; that's by RocksDB's design.
>>
>> And the recommended numbers are a bit... speculative. So it's quite
>> possible that your numbers are absolutely adequate.
>
> FWIW, these are the numbers I came up with after examining the SST files
> generated under different workloads:
>
> https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing

Thanks for your input Mark and Igor. Mark, I can see your RBD figures
aren't too far off mine, so all looks to be as expected then.
Re: [ceph-users] Bluestore DB size and onode count
On 9/10/2018 8:26 PM, Mark Nelson wrote:
> On 09/10/2018 12:22 PM, Igor Fedotov wrote:
>> Just in case - is slow_used_bytes equal to 0? Some DB data might
>> reside at the slow device if spillover has happened. Which doesn't
>> require a full DB volume to happen; that's by RocksDB's design.
>>
>> And the recommended numbers are a bit... speculative. So it's quite
>> possible that your numbers are absolutely adequate.
>
> FWIW, these are the numbers I came up with after examining the SST files
> generated under different workloads:
>
> https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing

Sorry, Mark. "Speculative" is a bit too strong a word... I meant that a
two-parameter sizing model describing such a complex system as Ceph may
tend to produce quite inaccurate results often enough...
Re: [ceph-users] Bluestore DB size and onode count
On 09/10/2018 12:22 PM, Igor Fedotov wrote:
> onode_count is the number of onodes in the cache, not the total number
> of onodes at an OSD. Hence the difference...
>
> Just in case - is slow_used_bytes equal to 0? Some DB data might
> reside at the slow device if spillover has happened. Which doesn't
> require a full DB volume to happen; that's by RocksDB's design.
>
> And the recommended numbers are a bit... speculative. So it's quite
> possible that your numbers are absolutely adequate.

FWIW, these are the numbers I came up with after examining the SST files
generated under different workloads:

https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing
Re: [ceph-users] Bluestore DB size and onode count
Hi Nick.

On 9/10/2018 1:30 PM, Nick Fisk wrote:
> If anybody has 5 minutes could they just clarify a couple of things
> for me.
>
> 1. onode count: should this be equal to the number of objects stored
> on the OSD? Through reading several posts, there seems to be a general
> indication that this is the case, but looking at my OSDs the maths
> don't work.

onode_count is the number of onodes in the cache, not the total number
of onodes at an OSD. Hence the difference...

> 2. block.db size
>
> sudo ceph daemon osd.0 perf dump | grep db
>     "db_total_bytes": 8587829248,
>     "db_used_bytes": 2375024640,
>
> 2.3GB = 0.17% of data size. This seems a lot lower than the 1%
> recommendation (10GB for every 1TB) or the 4% given in the official
> docs. I know that different workloads will have differing overheads
> and potentially smaller objects. But am I understanding these figures
> correctly, as they seem dramatically lower?

Just in case - is slow_used_bytes equal to 0? Some DB data might
reside at the slow device if spillover has happened. Which doesn't
require a full DB volume to happen; that's by RocksDB's design.

And the recommended numbers are a bit... speculative. So it's quite
possible that your numbers are absolutely adequate.
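A quick way to check for spillover, as a sketch (assuming the OSD's admin
socket is reachable; counter names can differ slightly between releases):

# Non-zero slow_used_bytes means RocksDB has placed some of its data on
# the slow (main) device instead of the dedicated DB device.
sudo ceph daemon osd.0 perf dump bluefs | grep -E 'slow_used_bytes|db_used_bytes'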
[ceph-users] Bluestore DB size and onode count
If anybody has 5 minutes could they just clarify a couple of things for me.

1. onode count: should this be equal to the number of objects stored on
the OSD? Through reading several posts, there seems to be a general
indication that this is the case, but looking at my OSDs the maths don't
work.

Eg.

ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
 0   hdd 2.73679  1.0     2802G 1347G 1454G 48.09 0.69 115

So a 3TB OSD, roughly half full. This is a pure RBD workload (no
snapshots or anything clever), so let's assume the worst-case scenario of
4MB objects (compression is on, however, which would only mean more
objects for a given size):

1347000 / 4 = ~336750 expected objects

sudo ceph daemon osd.0 perf dump | grep blue
    "bluefs": {
    "bluestore": {
        "bluestore_allocated": 1437813964800,
        "bluestore_stored": 2326118994003,
        "bluestore_compressed": 445228558486,
        "bluestore_compressed_allocated": 547649159168,
        "bluestore_compressed_original": 1437773843456,
        "bluestore_onodes": 99022,
        "bluestore_onode_hits": 18151499,
        "bluestore_onode_misses": 4539604,
        "bluestore_onode_shard_hits": 10596780,
        "bluestore_onode_shard_misses": 4632238,
        "bluestore_extents": 896365,
        "bluestore_blobs": 861495,

99022 onodes, anyone care to enlighten me?

2. block.db size

sudo ceph daemon osd.0 perf dump | grep db
    "db_total_bytes": 8587829248,
    "db_used_bytes": 2375024640,

2.3GB = 0.17% of data size. This seems a lot lower than the 1%
recommendation (10GB for every 1TB) or the 4% given in the official docs.
I know that different workloads will have differing overheads and
potentially smaller objects. But am I understanding these figures
correctly, as they seem dramatically lower?

Regards,
Nick
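P.S. Spelling out the arithmetic above, as a rough sketch (decimal units
throughout, so treat the results as approximate):

# Expected object count: ~1347G used, assuming 4MB per RBD object.
echo $((1347000 / 4))                                          # ~336750 objects
# DB used (db_used_bytes) as a fraction of data stored on the OSD.
awk 'BEGIN { printf "%.3f%%\n", 2375024640 / 1347e9 * 100 }'   # ~0.176%, i.e. the ~0.17% above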