Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread vitalif
I bet you'd see better memstore results with my vector based object implementation instead of bufferlists. Where can I find it? Nick Fisk noticed the same thing you did.  One interesting observation he made was that disabling CPU C/P states helped bluestore immensely in the iodepth=1 case. T
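
(For anyone wanting to try the C/P-state trick: it is done at the OS level, not in Ceph. A rough sketch, assuming a Linux host with the cpupower tool installed; exact flags depend on CPU and distribution:

  # lock the frequency governor and keep cores out of deep C-states
  cpupower frequency-set -g performance
  cpupower idle-set -D 0
  # or persistently via the kernel command line on Intel systems, e.g.
  #   intel_idle.max_cstate=0 processor.max_cstate=1 idle=poll

idle=poll burns a lot of power, so it is usually only worth it on latency-critical OSD nodes.)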

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread Mark Nelson
On 3/12/19 8:40 AM, vita...@yourcmc.ru wrote: One way or another we can only have a single thread sending writes to rocksdb.  A lot of the prior optimization work on the write side was to get as much processing out of the kv_sync_thread as possible. That's still a worthwhile goal as it's typical
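
A quick sanity check for whether that single thread is what saturates first is to watch per-thread CPU on a busy OSD; a minimal sketch (thread names vary between releases, but the BlueStore kv sync thread usually shows up as bstore_kv_sync):

  # pick one OSD pid if several are running on the host
  top -H -p $(pidof ceph-osd | awk '{print $1}')
  # or sampled once per second:
  pidstat -t -p $(pidof ceph-osd | awk '{print $1}') 1

If bstore_kv_sync sits near 100% of a core while the rest of the OSD threads are mostly idle, the kv side is the bottleneck being discussed here.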

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread vitalif
One way or another we can only have a single thread sending writes to rocksdb.  A lot of the prior optimization work on the write side was to get as much processing out of the kv_sync_thread as possible.  That's still a worthwhile goal as it's typically what bottlenecks with high amounts of concur

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread Mark Nelson
On 3/12/19 7:31 AM, vita...@yourcmc.ru wrote: Decreasing the min_alloc size isn't always a win, but it can be in some cases.  Originally bluestore_min_alloc_size_ssd was set to 4096 but we increased it to 16384 because at the time our metadata path was slow and increasing it resulted in a pretty s
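
For anyone who wants to try a smaller min_alloc_size: it is applied when the OSD is created (mkfs), not at runtime, so the option has to be set before the OSD is (re)deployed. A rough sketch of what that looks like in ceph.conf (4096 is just the value being discussed here, not a general recommendation):

  [osd]
  bluestore_min_alloc_size_ssd = 4096
  # bluestore_min_alloc_size_hdd = 65536
  # bluestore_min_alloc_size = 0 means "use the hdd/ssd-specific value"

Existing OSDs keep the value they were created with, so they have to be destroyed and recreated for the change to take effect.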

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-12 Thread vitalif
Decreasing the min_alloc size isn't always a win, but it can be in some cases.  Originally bluestore_min_alloc_size_ssd was set to 4096 but we increased it to 16384 because at the time our metadata path was slow and increasing it resulted in a pretty significant performance win (along with increasin

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Stefan Priebe - Profihost AG
On 06.03.19 at 14:08, Mark Nelson wrote: > > On 3/6/19 5:12 AM, Stefan Priebe - Profihost AG wrote: >> Hi Mark, >> On 05.03.19 at 23:12, Mark Nelson wrote: >>> Hi Stefan, >>> >>> >>> Could you try running your random write workload against bluestore and >>> then take a wallclock profile of an O

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Mark Nelson
On 3/6/19 5:12 AM, Stefan Priebe - Profihost AG wrote: Hi Mark, On 05.03.19 at 23:12, Mark Nelson wrote: Hi Stefan, Could you try running your random write workload against bluestore and then take a wallclock profile of an OSD using gdbpmp? It's available here: https://github.com/markhpc/g

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Mark Nelson
On 3/5/19 4:23 PM, Vitaliy Filippov wrote: Testing -rw=write without -sync=1 or -fsync=1 (or -fsync=32 for batch IO, or just fio -ioengine=rbd from outside a VM) is rather pointless - you're benchmarking the RBD cache, not Ceph itself. RBD cache is coalescing your writes into big sequential wr

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Stefan Priebe - Profihost AG
Hi Mark, On 05.03.19 at 23:12, Mark Nelson wrote: > Hi Stefan, > > > Could you try running your random write workload against bluestore and > then take a wallclock profile of an OSD using gdbpmp? It's available here: > > > https://github.com/markhpc/gdbpmp Sure, but it does not work: # ./gdb

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-05 Thread Vitaliy Filippov
Testing -rw=write without -sync=1 or -fsync=1 (or -fsync=32 for batch IO, or just fio -ioengine=rbd from outside a VM) is rather pointless - you're benchmarking the RBD cache, not Ceph itself. RBD cache is coalescing your writes into big sequential writes. Of course bluestore is faster in thi
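
To make that concrete, a 4k random write test that actually hits the OSDs would look roughly like this (file path, pool and image names are placeholders):

  # inside the VM: bypass the page cache and fsync every write
  fio --name=test --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --ioengine=libaio --direct=1 --fsync=1 \
      --filename=/tmp/test --size=10G --runtime=60 --time_based
  # or from outside the VM, straight against the RBD image
  fio --name=test --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --ioengine=rbd --pool=rbd --rbdname=testimg --clientname=admin \
      --runtime=60 --time_based

With -fsync=1 and iodepth=1 you measure single-request commit latency; raising iodepth/numjobs (or using -fsync=32 as mentioned above) measures throughput instead.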

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-05 Thread Mark Nelson
Hi Stefan, Could you try running your random write workload against bluestore and then take a wallclock profile of an OSD using gdbpmp? It's available here: https://github.com/markhpc/gdbpmp Thanks, Mark On 3/5/19 2:29 AM, Stefan Priebe - Profihost AG wrote: Hello list, while the perf
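
For reference, gdbpmp is a gdb-based sampling profiler (a single Python script); going by its README, an invocation would look roughly like this (option names may differ in newer versions, and the OSD needs gdb plus ceph debug symbols installed):

  # collect ~1000 samples from a running OSD
  ./gdbpmp.py -p $(pidof ceph-osd | awk '{print $1}') -n 1000 -o osd.gdbpmp
  # print the merged call tree afterwards
  ./gdbpmp.py -i osd.gdbpmp -t 1

Sampling briefly pauses the process on every sample, so expect a small performance dip while the profile is being taken.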

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-05 Thread Stefan Priebe - Profihost AG
On 05.03.19 at 10:05, Paul Emmerich wrote: > This workload is probably bottlenecked by rocksdb (since the small > writes are buffered there), so that's probably what needs tuning here. While reading: https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180807_INVT-101A-1_Mered

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-05 Thread Paul Emmerich
This workload is probably bottlenecked by rocksdb (since the small writes are buffered there), so that's probably what needs tuning here. Paul
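
If anyone wants to experiment on the rocksdb side, the knob in Ceph is bluestore_rocksdb_options, which takes the usual comma-separated rocksdb option string. A sketch of the kind of thing one might try (values are purely illustrative, and the defaults differ between releases):

  [osd]
  bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,write_buffer_size=268435456,max_background_compactions=2

The option is only read at OSD start, and badly chosen values can easily make things worse, so it is best tested on a single OSD first.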

[ceph-users] optimize bluestore for random write i/o

2019-03-05 Thread Stefan Priebe - Profihost AG
Hello list, while the performance of 4k sequential writes on bluestore is very high, and even higher than on filestore, I was wondering what I can do to optimize the random pattern as well. While using: fio --rw=write --iodepth=32 --ioengine=libaio --bs=4k --numjobs=4 --filename=/tmp/test --size=10G --run
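
(The random-write variant of that test is the same invocation with --rw=randwrite; a sketch, with the runtime arguments filled in only as an example:

  fio --name=test --rw=randwrite --iodepth=32 --ioengine=libaio --bs=4k --numjobs=4 \
      --filename=/tmp/test --size=10G --runtime=60 --time_based

As the replies above point out, adding --direct=1 and --fsync=1 keeps the RBD cache and page cache from absorbing the writes, which matters a lot for the random case.)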