Re: [ceph-users] bucket indices: ssd-only or is a large fast block.db sufficient?

2018-11-20 Thread Mark Nelson
One consideration is that you may not be able to fit the higher DB levels on 
the db partition and end up with a lot of waste (Nick Fisk recently saw 
this on his test cluster).  We've talked about potentially trying to 
pre-compute the level hierarchy sizing so that we can align a level boundary 
to fit within the db partition size. I'm concerned there could be some 
unintended consequences (i.e., having a media transition and a write-amp 
jump hit at the same time).  I tend to wonder if we should instead focus on 
either DB or column-family sharding and just get some fraction of the 
high-level SSTs from different shards onto the db partition.
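To make the waste concrete: with RocksDB's defaults, each level is roughly 10x the size of the one below it, and (as a simplified rule of thumb) only whole levels that fit on the block.db partition stay on fast media. The sketch below is a back-of-the-envelope model, assuming the stock max_bytes_for_level_base of 256 MiB and a 10x multiplier; real clusters may tune these, and actual BlueFS spillover behavior is more nuanced than "whole levels only."

```python
# Simplified model (assumed RocksDB defaults): levels are 256 MiB, 2.5 GiB,
# 25 GiB, 250 GiB, ... and only complete levels that fit on the block.db
# partition are counted as usable; the rest of the partition is "wasted"
# because the next whole level spills to the slow (HDD) device.

GIB = 1024 ** 3

def usable_db_bytes(partition_bytes, base=256 * 1024 ** 2,
                    multiplier=10, max_levels=7):
    """Return (bytes covered by whole levels, leftover partition bytes)."""
    total = 0
    level_size = base
    for _ in range(max_levels):
        if total + level_size > partition_bytes:
            break  # this level no longer fits; it would live on the HDD
        total += level_size
        level_size *= multiplier
    return total, partition_bytes - total

used, wasted = usable_db_bytes(40 * GIB)
print(f"usable: {used / GIB:.2f} GiB, wasted: {wasted / GIB:.2f} GiB")
```

Under these assumptions a 40 GB partition covers L1 through L3 (about 27.75 GiB) and leaves the remaining ~12 GiB of SSD idle, which is the kind of waste described above.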


I think you could probably make either configuration work, and there are 
advantages and disadvantages to each approach (sizing, complexity, 
write-amp, etc.).  If you go for the 2nd option, you probably still want 
some portion of the SSDs carved out for DB/WAL for the data pool, which 
would shrink how much you'd have available for the flash-only OSDs.  One 
point I do want to bring up is that we're considering experimenting with 
layering bucket index pools on top of objects rather than using OMAP.  
No idea if that will pan out (or even how far we'll get), but if it 
ends up being a win, you might prefer the second approach, as the objects 
would end up on flash.  The 2nd approach is also the only option as far 
as filestore goes, though I'm not sure whether that really matters to you guys.



Mark

On 11/20/18 8:48 AM, Gregory Farnum wrote:
Looks like you’ve considered the essential points for bluestore OSDs, 
yep. :)
My concern would just be the surprisingly large block.db requirements 
for rgw workloads that have been brought up (300+ GB per OSD, I think 
someone saw/worked out?).

-Greg

On Tue, Nov 20, 2018 at 1:35 AM Dan van der Ster wrote:


Hi ceph-users,

Most of our servers have 24 hdds plus 4 ssds.
Any experience with how these should be configured to get the best rgw
performance?

We have two options:
   1) All osds the same, with data on the hdd and block.db on a 40GB
ssd partition
   2) Two osd device types: hdd-only for the rgw data pool and
ssd-only for bucket index pool
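As a back-of-the-envelope comparison (a sketch using only the numbers in this thread, ignoring WAL space and any SSD pool overhead), option 1 on a 24-HDD / 4-SSD box works out as follows:

```python
# Option 1 layout arithmetic, using the numbers from this thread:
# 24 HDD OSDs per server, 4 SSDs, one 40 GB block.db partition per HDD OSD.
hdds_per_server = 24
ssds_per_server = 4
db_partition_gb = 40

osds_per_ssd = hdds_per_server // ssds_per_server   # HDD OSDs sharing each SSD
db_per_ssd_gb = osds_per_ssd * db_partition_gb      # block.db space per SSD

# If the 300+ GB-per-OSD figure mentioned for rgw-heavy workloads holds,
# option 1 would need far more SSD capacity per server:
required_ssd_gb = hdds_per_server * 300

print(f"{osds_per_ssd} OSDs per SSD, {db_per_ssd_gb} GB of block.db per SSD; "
      f"300 GB/OSD would need {required_ssd_gb} GB of SSD per server")
```

So each SSD carries six 40 GB block.db partitions (240 GB), but if the larger per-OSD figures are right, the 40 GB partitions fall well short and much of the index metadata would spill to HDD anyway.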

But all of the bucket index data is in omap, right?
And all of the omap is stored in RocksDB, right?

After reading the recent threads about bluefs slow_used_bytes, I had
the thought that as long as we have a large enough block.db, then
slow_used_bytes will be 0 and all of the bucket indexes will be on
ssd-only, regardless of option (1) or (2) above.
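One way to check that "slow_used_bytes will be 0" assumption on a running OSD is to look at the bluefs counters exposed by the admin socket (`ceph daemon osd.N perf dump`). The sketch below parses a trimmed sample of that JSON; the field names match the bluefs counters discussed in the threads, but the values here are made up for illustration, not real output:

```python
import json

# Trimmed, illustrative sample of the "bluefs" section of
# `ceph daemon osd.0 perf dump` -- values are invented for this example.
sample = """
{
  "bluefs": {
    "db_total_bytes": 42949672960,
    "db_used_bytes": 9663676416,
    "slow_total_bytes": 0,
    "slow_used_bytes": 0
  }
}
"""

bluefs = json.loads(sample)["bluefs"]
spilled = bluefs["slow_used_bytes"]

if spilled == 0:
    print("OK: all of RocksDB fits on the fast (block.db) device")
else:
    print(f"WARNING: {spilled} bytes of RocksDB spilled to the slow device")
```

On a real cluster you would feed the live perf-dump output through the same check (e.g. via `subprocess` or a monitoring agent) for every OSD backing the bucket index pool.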

Any thoughts?

Thanks!

Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



