Re: [ceph-users] BlueStore Cache Ratios

2017-10-12 Thread Jorge Pinilla López
Hey Mark,

Thanks a lot for the info. You should really turn it into a paper and
post it :)

First of all, I'm sorry if I say something wrong; I'm still learning
about this topic and speaking from a place of near-total unawareness.

Second, I understood that the ratios are a way of controlling priorities,
and that they keep bloom filters and indexes from getting paged out of
the cache, which really makes sense.

The 512MB restriction also kind of makes sense, but I don't know whether
it would make sense to give more space to the rocksdb block cache (like
1GB). I think only testing can answer that question, because it really
depends on the workload.

What I don't understand is why data isn't cached at all, even when there
is free space for it. I understand the order of importance would be bloom
filters and indexes >> metadata >> data, but if there is space left over
for data, why not use it? Maybe ratios of 0.90 k/v, 0.09 metadata and
0.01 data would make more sense.
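For anyone wanting to experiment with a split like that: the ratios are ordinary config options. A sketch of what it might look like in ceph.conf (option names as documented for Luminous; the values are just the speculative split above, not a recommendation):

```ini
[osd]
# Total BlueStore cache per OSD (defaults: 3 GB for SSD, 1 GB for HDD)
bluestore_cache_size_ssd = 3221225472
# Hard cap on the rocksdb block cache share (default 512 MB)
bluestore_cache_kv_max = 536870912
# Speculative split: 0.90 k/v, 0.09 metadata, remainder (0.01) for data
bluestore_cache_kv_ratio = 0.90
bluestore_cache_meta_ratio = 0.09
```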


On 11/10/2017 at 15:44, Mark Nelson wrote:
> Hi Jorge,
>
> I was sort of responsible for all of this. :)
>
> So basically there are different caches in different places:
>
> - rocksdb bloom filter and index cache
> - rocksdb block cache (which can be configured to include filters and
> indexes)
> - rocksdb compressed block cache
> - bluestore onode cache
>
> The bluestore onode cache is the only one that stores
> onode/extent/blob metadata before it is encoded, i.e. it's bigger but
> has lower impact on the CPU.  The next step is the regular rocksdb
> block cache where we've already encoded the data, but it's not
> compressed.  Optionally we could also compress the data and then cache
> it using rocksdb's compressed block cache.  Finally, rocksdb can set
> memory aside for bloom filters and indexes but we're configuring those
> to go into the block cache so we can get a better accounting for how
> memory is being used (otherwise it's difficult to control how much
> memory index and filters get).  The downside is that bloom filters and
> indexes can theoretically get paged out under heavy cache pressure. 
> We set these to be high priority in the block cache and also pin the
> L0 filters/index though to help avoid this.
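Putting the filters and indexes into the block cache and pinning the L0 ones maps onto RocksDB's BlockBasedTableOptions, which Ceph passes through its bluestore_rocksdb_options string. A sketch of the relevant options (the names are RocksDB's; whether your build sets exactly these values is worth verifying):

```ini
# Excerpt of a bluestore_rocksdb_options-style string, one option per line:
# charge index/filter blocks to the block cache instead of separate memory
cache_index_and_filter_blocks=true
# keep them in the cache's high-priority pool so data blocks evict first
cache_index_and_filter_blocks_with_high_priority=true
# never evict the L0 filter/index blocks
pin_l0_filter_and_index_blocks_in_cache=true
```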
>
> In the testing I did earlier this year, what I saw is that in low
> memory scenarios it's almost always best to give all of the cache to
> rocksdb's block cache.  Once you hit about the 512MB mark, we start
> seeing bigger gains by giving additional memory to bluestore's onode
> cache.  So we devised a mechanism where you can decide where to cut
> over.  It's quite possible that on very fast CPUs it might make sense
> to use rocksdb compressed cache, or possibly if you have a huge number
> of objects these ratios might change.  The values we have now were
> sort of the best jack-of-all-trades values we found.
>
> Mark
>
> On 10/11/2017 08:32 AM, Jorge Pinilla López wrote:
>> Okay, thanks for the explanation. So from the 3GB of cache (the default
>> for SSD), only 0.5GB goes to K/V and 2.5GB goes to metadata.
>>
>> Is there a way of knowing how much k/v, metadata, and data is being
>> stored, and how full the cache is, so I can adjust my ratios? I was
>> thinking of ratios like 0.9 k/v, 0.07 meta, 0.03 data, but that's only
>> speculation; I don't have any real data.
>>
>> On 11/10/2017 at 14:32, Mohamad Gebai wrote:
>>> Hi Jorge,
>>>
>>> On 10/10/2017 07:23 AM, Jorge Pinilla López wrote:
>>>> Are .99 KV, .01 MetaData and .0 Data ratios right? They seem a little
>>>> too disproportionate.
>>> Yes, this is correct.
>>>
>>>> Also, .99 KV with the 3GB cache for SSD means almost all 3GB would
>>>> be used for KV, but there is also another attribute called
>>>> bluestore_cache_kv_max, which is 512MB by default. What is the rest
>>>> of the cache used for then? Nothing? Shouldn't kv_max be higher, or
>>>> the KV ratio lower?
>>> Anything over the *cache_kv_max value goes to the metadata cache. You
>>> can look in your logs to see the final values of kv, metadata and data
>>> cache ratios. To get data cache, you need to lower the ratios of
>>> metadata and kv caches.
>>>
>>> Mohamad
>>
>> --
>>
>> *Jorge Pinilla López*
>> jorp...@unizar.es
>> Computer engineering student
>> Intern at the systems area (SICUZ)
>> Universidad de Zaragoza
>> PGP-KeyID: A34331932EBC715A
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--

*Jorge Pinilla López*
jorp...@unizar.es
Computer engineering student
Intern at the systems area (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


[ceph-users] BlueStore Cache Ratios

2017-10-10 Thread Jorge Pinilla López
I've been reading about BlueStore and came across the BlueStore cache
and its ratios, and I couldn't fully understand them.

Are .99 KV, .01 MetaData and .0 Data ratios right? They seem a little
too disproportionate.
Also, .99 KV with the 3GB cache for SSD means almost all 3GB would be
used for KV, but there is also another attribute called
bluestore_cache_kv_max, which is 512MB by default. What is the rest of
the cache used for then? Nothing? Shouldn't kv_max be higher, or the
KV ratio lower?

I know it really depends on the environment (size, number of IOs,
files...), but these numbers seem a little odd and unreasonable to me.

Is there any way to estimate roughly how much KV and metadata is
generated per GB of actual data?

Does it make any sense to leave some cache for the data itself, or is
it better to just store onodes and metadata?

Another little question: I don't really understand where BlueStore gets
its speed from, since it writes data directly to the end device (unlike
FileStore, where you had a journal). Shouldn't speed then be limited by
that device's write speed, even with an SSD for RocksDB?

Thanks a lot.


*Jorge Pinilla López*
jorp...@unizar.es
