Re: [ceph-users] Bluestore DB size and onode count

2018-09-11 Thread Igor Fedotov



On 9/10/2018 11:39 PM, Nick Fisk wrote:

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Nelson
Sent: 10 September 2018 18:27
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore DB size and onode count

On 09/10/2018 12:22 PM, Igor Fedotov wrote:


Hi Nick.


On 9/10/2018 1:30 PM, Nick Fisk wrote:

If anybody has 5 minutes could they just clarify a couple of things
for me

1. onode count, should this be equal to the number of objects stored
on the OSD?
Through reading several posts, there seems to be a general indication
that this is the case, but looking at my OSDs the maths don't
work.

onode_count is the number of onodes in the cache, not the total number
of onodes on an OSD.
Hence the difference...

Ok, thanks, that makes sense. I assume there isn't actually a counter which 
gives you the total number of objects on an OSD then?
IIRC "bin/ceph daemon osd.1 calc_objectstore_db_histogram" might report 
what you need, see "num_onodes" field in the report..
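
A sketch of pulling both figures from one OSD (osd.0 here is just an example, 
and since the histogram command is the one Igor mentions above, the exact 
field names may vary by release):

# onodes currently held in the BlueStore cache (the counter discussed above)
sudo ceph daemon osd.0 perf dump | grep '"bluestore_onodes"'

# total onodes stored on the OSD - look for the "num_onodes" field
sudo ceph daemon osd.0 calc_objectstore_db_histogram | grep num_onodes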





Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
> Nelson
> Sent: 10 September 2018 18:27
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bluestore DB size and onode count
> 
> On 09/10/2018 12:22 PM, Igor Fedotov wrote:
> 
> > Hi Nick.
> >
> >
> > On 9/10/2018 1:30 PM, Nick Fisk wrote:
> >> If anybody has 5 minutes could they just clarify a couple of things
> >> for me
> >>
> >> 1. onode count, should this be equal to the number of objects stored
> >> on the OSD?
> >> Through reading several posts, there seems to be a general indication
> >> that this is the case, but looking at my OSDs the maths don't
> >> work.
> > onode_count is the number of onodes in the cache, not the total number
> > of onodes on an OSD.
> > Hence the difference...

Ok, thanks, that makes sense. I assume there isn't actually a counter which 
gives you the total number of objects on an OSD then?

> >>
> >> Eg.
> >> ceph osd df
> >> ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS
> >>   0   hdd 2.73679  1.0 2802G  1347G  1454G 48.09 0.69 115
> >>
> >> So 3TB OSD, roughly half full. This is pure RBD workload (no
> >> snapshots or anything clever) so let's assume the worst-case scenario of
> >> 4MB objects (Compression is on however, which would only mean more
> >> objects for a given size)
> >> 1347000/4=~336750 expected objects
> >>
> >> sudo ceph daemon osd.0 perf dump | grep blue
> >>  "bluefs": {
> >>  "bluestore": {
> >>  "bluestore_allocated": 1437813964800,
> >>  "bluestore_stored": 2326118994003,
> >>  "bluestore_compressed": 445228558486,
> >>  "bluestore_compressed_allocated": 547649159168,
> >>  "bluestore_compressed_original": 1437773843456,
> >>  "bluestore_onodes": 99022,
> >>  "bluestore_onode_hits": 18151499,
> >>  "bluestore_onode_misses": 4539604,
> >>  "bluestore_onode_shard_hits": 10596780,
> >>  "bluestore_onode_shard_misses": 4632238,
> >>  "bluestore_extents": 896365,
> >>  "bluestore_blobs": 861495,
> >>
> >> 99022 onodes, anyone care to enlighten me?
> >>
> >> 2. block.db Size
> >> sudo ceph daemon osd.0 perf dump | grep db
> >>  "db_total_bytes": 8587829248,
> >>  "db_used_bytes": 2375024640,
> >>
> >> 2.3GB=0.17% of data size. This seems a lot lower than the 1%
> >> recommendation (10GB for every 1TB) or 4% given in the official docs. I
> >> know that different workloads will have differing overheads and
> >> potentially smaller objects. But am I understanding these figures
> >> correctly, as they seem dramatically lower?
> > Just in case - is slow_used_bytes equal to 0? Some DB data might
> > reside on the slow device if spill-over has happened, which doesn't
> > require the DB volume to be full - that's by RocksDB's design.
> >
> > And the recommended numbers are a bit... speculative. So it's quite
> > possible that your numbers are absolutely adequate.
> 
> FWIW, these are the numbers I came up with after examining the SST files
> generated under different workloads:
> 
> https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing
> 

Thanks for your input, Mark and Igor. Mark, I can see your RBD figures aren't too 
far off mine, so all looks to be as expected then.

> >>
> >> Regards,
> >> Nick
> >>


Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Igor Fedotov




On 9/10/2018 8:26 PM, Mark Nelson wrote:

On 09/10/2018 12:22 PM, Igor Fedotov wrote:

Just in case - is slow_used_bytes equal to 0? Some DB data might 
reside on the slow device if spill-over has happened, which doesn't 
require the DB volume to be full - that's by RocksDB's design.


And the recommended numbers are a bit... speculative. So it's quite 
possible that your numbers are absolutely adequate.


FWIW, these are the numbers I came up with after examining the SST 
files generated under different workloads:


https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing 



Sorry, Mark. "Speculative" is a bit too strong a word... I meant that a 
two-parameter sizing model describing such a complex system as Ceph 
may often produce quite inaccurate results...




Regards,
Nick



Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Mark Nelson

On 09/10/2018 12:22 PM, Igor Fedotov wrote:


Hi Nick.


On 9/10/2018 1:30 PM, Nick Fisk wrote:
If anybody has 5 minutes could they just clarify a couple of things 
for me


1. onode count, should this be equal to the number of objects stored 
on the OSD?
Through reading several posts, there seems to be a general indication 
that this is the case, but looking at my OSDs the maths don't work.

onode_count is the number of onodes in the cache, not the total number 
of onodes on an OSD. Hence the difference...


Eg.
ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS
  0   hdd 2.73679  1.0 2802G  1347G  1454G 48.09 0.69 115

So 3TB OSD, roughly half full. This is pure RBD workload (no 
snapshots or anything clever) so let's assume the worst-case scenario of
4MB objects (Compression is on however, which would only mean more 
objects for a given size)

1347000/4=~336750 expected objects

sudo ceph daemon osd.0 perf dump | grep blue
 "bluefs": {
 "bluestore": {
 "bluestore_allocated": 1437813964800,
 "bluestore_stored": 2326118994003,
 "bluestore_compressed": 445228558486,
 "bluestore_compressed_allocated": 547649159168,
 "bluestore_compressed_original": 1437773843456,
 "bluestore_onodes": 99022,
 "bluestore_onode_hits": 18151499,
 "bluestore_onode_misses": 4539604,
 "bluestore_onode_shard_hits": 10596780,
 "bluestore_onode_shard_misses": 4632238,
 "bluestore_extents": 896365,
 "bluestore_blobs": 861495,

99022 onodes, anyone care to enlighten me?

2. block.db Size
sudo ceph daemon osd.0 perf dump | grep db
 "db_total_bytes": 8587829248,
 "db_used_bytes": 2375024640,

2.3GB=0.17% of data size. This seems a lot lower than the 1% 
recommendation (10GB for every 1TB) or 4% given in the official docs. I
know that different workloads will have differing overheads and 
potentially smaller objects. But am I understanding these figures 
correctly, as they seem dramatically lower?

Just in case - is slow_used_bytes equal to 0? Some DB data might 
reside on the slow device if spill-over has happened, which doesn't 
require the DB volume to be full - that's by RocksDB's design.


And the recommended numbers are a bit... speculative. So it's quite 
possible that your numbers are absolutely adequate.


FWIW, these are the numbers I came up with after examining the SST files 
generated under different workloads:


https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing



Regards,
Nick



Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Igor Fedotov

Hi Nick.


On 9/10/2018 1:30 PM, Nick Fisk wrote:

If anybody has 5 minutes could they just clarify a couple of things for me

1. onode count, should this be equal to the number of objects stored on the OSD?
Through reading several posts, there seems to be a general indication that this 
is the case, but looking at my OSDs the maths don't work.

onode_count is the number of onodes in the cache, not the total number 
of onodes on an OSD. Hence the difference...


Eg.
ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS
  0   hdd 2.73679  1.0 2802G  1347G  1454G 48.09 0.69 115

So 3TB OSD, roughly half full. This is pure RBD workload (no snapshots or 
anything clever) so let's assume the worst-case scenario of
4MB objects (Compression is on however, which would only mean more objects for 
a given size)
1347000/4=~336750 expected objects

sudo ceph daemon osd.0 perf dump | grep blue
 "bluefs": {
 "bluestore": {
 "bluestore_allocated": 1437813964800,
 "bluestore_stored": 2326118994003,
 "bluestore_compressed": 445228558486,
 "bluestore_compressed_allocated": 547649159168,
 "bluestore_compressed_original": 1437773843456,
 "bluestore_onodes": 99022,
 "bluestore_onode_hits": 18151499,
 "bluestore_onode_misses": 4539604,
 "bluestore_onode_shard_hits": 10596780,
 "bluestore_onode_shard_misses": 4632238,
 "bluestore_extents": 896365,
 "bluestore_blobs": 861495,

99022 onodes, anyone care to enlighten me?

2. block.db Size
sudo ceph daemon osd.0 perf dump | grep db
 "db_total_bytes": 8587829248,
 "db_used_bytes": 2375024640,

2.3GB=0.17% of data size. This seems a lot lower than the 1% recommendation 
(10GB for every 1TB) or 4% given in the official docs. I
know that different workloads will have differing overheads and potentially 
smaller objects. But am I understanding these figures 
correctly, as they seem dramatically lower?

Just in case - is slow_used_bytes equal to 0? Some DB data might reside 
on the slow device if spill-over has happened, which doesn't 
require the DB volume to be full - that's by RocksDB's design.
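
A quick way to check that on a given OSD - a sketch only, using osd.0 as an 
example and assuming jq is installed; the counters live in the bluefs section 
of the same perf dump quoted earlier:

sudo ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
# A non-zero slow_used_bytes means some RocksDB data has spilled over onto
# the slow (data) device even though block.db itself is not full.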


And the recommended numbers are a bit... speculative. So it's quite possible 
that your numbers are absolutely adequate.


Regards,
Nick



[ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
If anybody has 5 minutes could they just clarify a couple of things for me

1. onode count, should this be equal to the number of objects stored on the OSD?
Through reading several posts, there seems to be a general indication that this 
is the case, but looking at my OSDs the maths don't
work.

Eg.
ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS
 0   hdd 2.73679  1.0 2802G  1347G  1454G 48.09 0.69 115

So 3TB OSD, roughly half full. This is pure RBD workload (no snapshots or 
anything clever) so let's assume the worst-case scenario of
4MB objects (Compression is on however, which would only mean more objects for 
a given size)
1347000/4=~336750 expected objects

sudo ceph daemon osd.0 perf dump | grep blue
"bluefs": {
"bluestore": {
"bluestore_allocated": 1437813964800,
"bluestore_stored": 2326118994003,
"bluestore_compressed": 445228558486,
"bluestore_compressed_allocated": 547649159168,
"bluestore_compressed_original": 1437773843456,
"bluestore_onodes": 99022,
"bluestore_onode_hits": 18151499,
"bluestore_onode_misses": 4539604,
"bluestore_onode_shard_hits": 10596780,
"bluestore_onode_shard_misses": 4632238,
"bluestore_extents": 896365,
"bluestore_blobs": 861495,

99022 onodes, anyone care to enlighten me?
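
For reference, a worked version of that estimate with units (illustrative only; 
per the replies above, the 99022 figure is the onode cache count rather than 
the on-disk total, which "calc_objectstore_db_histogram" should report):

# 1347 GB used (per 'ceph osd df') at a worst case of 4 MB per object:
echo $(( 1347 * 1000 / 4 ))    # => 336750 expected objects
# Rough space saving on the compressed portion of the data, assuming
# bluestore_compressed_original vs bluestore_compressed_allocated is the
# right pair to compare:
echo "scale=2; 1437773843456 / 547649159168" | bc    # => ~2.62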

2. block.db Size
sudo ceph daemon osd.0 perf dump | grep db
"db_total_bytes": 8587829248,
"db_used_bytes": 2375024640,

2.3GB=0.17% of data size. This seems a lot lower than the 1% recommendation 
(10GB for every 1TB) or 4% given in the official docs. I
know that different workloads will have differing overheads and potentially 
smaller objects. But am I understanding these figures
correctly, as they seem dramatically lower?
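
For reference, a sketch of how that percentage falls out of the counters above 
(osd.0, jq and bc are just illustrative choices here):

db_used=$(sudo ceph daemon osd.0 perf dump | jq '.bluefs.db_used_bytes')
# ~1347 GB of data used on the OSD, per 'ceph osd df' above
echo "scale=2; 100 * $db_used / (1347 * 10^9)" | bc
# => ~.17 (percent), i.e. well under the 1% (10GB per 1TB) rule of thumb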

Regards,
Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com