Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-23 Thread Vitaliy Filippov

Hello,

> What IO size are you testing? Bluestore will only defer writes under
> 32kb in size by default. Unless you are writing sequentially, only a
> limited amount of buffering via SSD is going to help; you will
> eventually hit the limits of the disk. Could you share some more
> details, as I'm interested in this topic as well.


I'm testing 4kb random writes, mostly with iodepth=1 (a single-thread
latency test). This is the main case that is expected to be sped up by
the SSD journal, and it's also the worst case for SDS's :).
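
For reference, a minimal fio invocation for this kind of test looks
roughly like this (client, pool and image names are placeholders, and
the test writes into the image, so don't point it at anything you care
about):

  fio -ioengine=rbd -clientname=admin -pool=rbd -rbdname=testimg \
      -direct=1 -rw=randwrite -bs=4k -iodepth=1 \
      -runtime=60 -time_based -name=lat-test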


> Interesting, will have to investigate this further!!! I wish there
> were more details around this technology from HGST.


It's simple to test yourself - a similar thing is currently common in
SMR drives. Pick a random cheap 2.5" 1TB Seagate SMR HDD and test it
with fio with one of the `sync` or `fsync` options and iodepth=32 -
you'll see it handles more than 1000 random 4Kb write iops. Of course it
only handles that much until its buffer is full. When I tested one of
these I found that the buffer was 8 GB. After writing 8 GB the
performance drops to ~30-50 iops, and when the drive is idle it starts
to flush the buffer. That process takes a lot of time if the buffer is
full (several hours).
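
A sketch of such a test, assuming the SMR drive shows up as /dev/sdX
(this overwrites the drive's contents, so only run it against a disk
you can wipe):

  fio -ioengine=libaio -direct=1 -fsync=1 -rw=randwrite -bs=4k \
      -iodepth=32 -runtime=60 -time_based -name=smr-test \
      -filename=/dev/sdX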


The difference between the 2.5" SMR Seagates and the HGSTs is that the
HGSTs only enable "media cache" when the volatile cache is disabled
(which was a real surprise to me), while the SMR drives keep it enabled
all the time.
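
The switch in question is the drive's volatile write cache; a sketch of
how to check and flip it (the device path is a placeholder, and the
media-cache behaviour is the observation described in this thread):

  hdparm -W /dev/sdX      # show the current write-caching setting
  hdparm -W 0 /dev/sdX    # disable the volatile write cache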


But the thing that really confused me was that Bluestore random write
performance - even single-threaded write performance (the latency test) -
changed when I altered the write cache setting of the DATA device (not
the journal)! WHY was it affected? Based on common sense and bluestore's
documentation, the commit time of a random deferred write when the
system is not under load (and with iodepth=1 it isn't) should only
depend on the WAL device performance! But it's also affected by the data
device, which suggests there is some problem in bluestore's
implementation.



>> At the same time, deferred writes slightly help performance when you
>> don't have an SSD. But the difference we're talking about is like
>> tens of iops (30 vs 40), so it's not noticeable in the SSD era :).


> What size IOs are you testing with? I see a difference going from
> around 50 IOPs up to over a thousand for a single-threaded 4kb
> sequential test.


4Kb random writes. The 30-40 iops numbers are from small HDD-only
clusters (one with 12 OSDs across 3 hosts, one with 4 OSDs on ONE host -
"scrap-ceph", the home version :)). I've tried to play with
bluestore_prefer_deferred_size_hdd there and discovered that it had very
little impact on random 4kb iodepth=128 iops, which I think is slightly
counter-intuitive, because the expectation is that deferred writes
should increase random iops.
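
For anyone who wants to repeat the experiment, the knob lives in the
OSD section of ceph.conf; the value below is purely illustrative (the
stock cutoff for HDD-backed OSDs is the 32kb mentioned earlier in the
thread):

  [osd]
  # defer writes smaller than this to the WAL and flush them to the
  # data device in the background; example value, not a recommendation
  bluestore_prefer_deferred_size_hdd = 131072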


> Careful here, Bluestore will only migrate the next level of its DB if
> it can fit the entire DB on the flash device. These cutoffs are around
> 3GB, 30GB and 300GB by default, so anything in between will not be
> used. In your example a 20GB flash partition will mean that a large
> amount of RocksDB will end up on the spinning disk (slowusedBytes).


Thanks, I didn't know that... I rechecked - all my 8TB osds with 20GB
partitions migrated their DBs back to the slow devices again. Previously
I had moved them to the SSDs with a rebased build of Igor Fedotov's
ceph-bluestool ... oops :) ceph-bluestore-tool. Although I still don't
understand where the number 3 comes from: Ceph's default
bluestore_rocksdb_options states there are 4*256MB memtables, which is
1GB, not 3...
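
A quick way to check whether a given OSD has spilled over (the
"slowusedBytes" mentioned above shows up as the slow_used_bytes counter
in the bluefs section of the perf counters; osd.0 is a placeholder and
the command has to run on the host where that OSD lives):

  ceph daemon osd.0 perf dump | grep -E 'slow_used_bytes|db_used_bytes'

A non-zero slow_used_bytes means part of RocksDB is sitting on the
spinning disk.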


--
With best regards,
  Vitaliy Filippov


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-22 Thread Nick Fisk
>Yes and no... bluestore doesn't seem to work really optimally. For
>example, it has no filestore-like journal watermarking, and it flushes
>the deferred write queue every 32 writes (deferred_batch_ops). When it
>does that it basically waits for the HDD to commit, slowing down all
>further writes. Even worse, I found it to be basically untunable:
>when I tried to increase that limit to 1024, the OSDs waited for 1024
>writes to accumulate and then started to flush them in one batch,
>which led to a HUGE write stall (tens of seconds). Committing every 32
>writes is probably good for the thing they gently call "tail latency"
>(sudden latency spikes!), but it has the downside that the latency is
>just consistently high :-P (ok, consistently average).

What IO size are you testing? Bluestore will only defer writes under
32kb in size by default. Unless you are writing sequentially, only a
limited amount of buffering via SSD is going to help; you will
eventually hit the limits of the disk. Could you share some more
details, as I'm interested in this topic as well.

>
>In my small cluster with HGST drives and Intel SSDs for WAL+DB I've
>found the single-thread write latency (fio -iodepth=1 -ioengine=rbd) to
>be similar to a cluster without SSDs at all - it gave me only ~40-60
>iops. As I understand it, this is exactly because bluestore is flushing
>data every 32 writes and waiting for the HDDs to commit all the time.
>One thing that helped me a lot was to disable the drives' volatile
>write cache (`hdparm -W 0 /dev/sdXX`). After doing that I get ~500-600
>iops for the single-thread load! Which looks like it's finally
>committing data using the WAL correctly. My guess is that this is
>because HGST drives, in addition to the normal volatile write cache,
>have a thing called "Media Cache" which allows the HDD to acknowledge
>random writes by writing them to a temporary area on the platters
>without doing many seeks, and this feature only gets enabled when you
>disable the volatile cache.

Interesting, will have to investigate this further!!! I wish there were more
details around this technology from HGST.

>
>At the same time, deferred writes slightly help performance when you
>don't have an SSD. But the difference we're talking about is like tens
>of iops (30 vs 40), so it's not noticeable in the SSD era :).

What size IOs are you testing with? I see a difference going from
around 50 IOPs up to over a thousand for a single-threaded 4kb
sequential test.

>
>So - in theory, yes, deferred writes should be acknowledged by the WAL.
>In practice, bluestore is a big mess of threads, locks and extra
>writes, so this is not always the case. In fact, I would recommend
>trying bcache as an option; it may work better, although I've not
>tested it myself yet :-)
>
>What about the size of WAL/DB: 
>
>1) you don't need to put them on separate partitions, bluestore
>automatically allocates the available space 
>
>2) 8TB disks only take 16-17 GB for WAL+DB in my case. SSD partitions I
>have allocated for OSDs are just 20GB and it's also OK because bluestore
>can move parts of its DB to the main data device when it runs out of
>space on SSD partition.

Careful here, Bluestore will only migrate the next level of its DB if it
can fit the entire DB on the flash device. These cutoffs are around
3GB, 30GB and 300GB by default, so anything in between will not be used.
In your example a 20GB flash partition will mean that a large amount of
RocksDB will end up on the spinning disk (slowusedBytes).

>
>On 14 February 2019 at 06:40:35 GMT+03:00, John Petrini wrote:
>
>> Okay that makes more sense, I didn't realize the WAL functioned in a
>> similar manner to filestore journals (though now that I've had
>> another read of Sage's blog post, New in Luminous: BlueStore, I
>> notice he does cover this). Is this to say that writes are
>> acknowledged as soon as they hit the WAL?
>>
>> Also this raises another question regarding sizing. The Ceph
>> documentation suggests allocating as much available space as
>> possible to block.db, but what about the WAL? We'll have 120GB per
>> OSD available on each SSD. Any suggestion on how we might divvy
>> that between the WAL and DB?




Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-14 Thread vitalif
Yes and no... bluestore doesn't seem to work really optimally. For
example, it has no filestore-like journal watermarking, and it flushes
the deferred write queue every 32 writes (deferred_batch_ops). When it
does that it basically waits for the HDD to commit, slowing down all
further writes. Even worse, I found it to be basically untunable:
when I tried to increase that limit to 1024, the OSDs waited for 1024
writes to accumulate and then started to flush them in one batch,
which led to a HUGE write stall (tens of seconds). Committing every 32
writes is probably good for the thing they gently call "tail latency"
(sudden latency spikes!), but it has the downside that the latency is
just consistently high :-P (ok, consistently average).
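
As a concrete reference, the knob described above can be set in
ceph.conf; a sketch (32 is the batch size reported here, and there are
also _hdd and _ssd variants of the option):

  [osd]
  # flush the deferred write queue after this many writes; raising it
  # to 1024 produced the long stalls described above
  bluestore_deferred_batch_ops = 32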

In my small cluster with HGST drives and Intel SSDs for WAL+DB I've
found the single-thread write latency (fio -iodepth=1 -ioengine=rbd) to
be similar to a cluster without SSDs at all - it gave me only ~40-60
iops. As I understand it, this is exactly because bluestore is flushing
data every 32 writes and waiting for the HDDs to commit all the time.
One thing that helped me a lot was to disable the drives' volatile
write cache (`hdparm -W 0 /dev/sdXX`). After doing that I get ~500-600
iops for the single-thread load! Which looks like it's finally
committing data using the WAL correctly. My guess is that this is
because HGST drives, in addition to the normal volatile write cache,
have a thing called "Media Cache" which allows the HDD to acknowledge
random writes by writing them to a temporary area on the platters
without doing many seeks, and this feature only gets enabled when you
disable the volatile cache.

At the same time, deferred writes slightly help performance when you
don't have an SSD. But the difference we're talking about is like tens
of iops (30 vs 40), so it's not noticeable in the SSD era :).

So - in theory, yes, deferred writes should be acknowledged by the WAL.
In practice, bluestore is a big mess of threads, locks and extra writes,
so this is not always the case. In fact, I would recommend trying bcache
as an option; it may work better, although I've not tested it myself yet
:-)

What about the size of WAL/DB: 

1) you don't need to put them on separate partitions, bluestore
automatically allocates the available space 

2) 8TB disks only take 16-17 GB for WAL+DB in my case. The SSD partitions
I have allocated for the OSDs are just 20GB, and that's also OK because
bluestore can move parts of its DB to the main data device when it runs
out of space on the SSD partition.

On 14 February 2019 at 06:40:35 GMT+03:00, John Petrini wrote:

> Okay that makes more sense, I didn't realize the WAL functioned in a similar 
> manner to filestore journals (though now that I've had another read of Sage's 
> blog post, New in Luminous: BlueStore, I notice he does cover this). Is this 
> to say that writes are acknowledged as soon as they hit the WAL?
> 
> Also this raises another question regarding sizing. The Ceph documentation 
> suggests allocating as much available space as possible to blocks.db but what 
> about WAL? We'll have 120GB per OSD available on each SSD. Any suggestion on 
> how we might divvy that between the WAL and DB?


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread Wido den Hollander



On 2/14/19 4:40 AM, John Petrini wrote:
> Okay that makes more sense, I didn't realize the WAL functioned in a
> similar manner to filestore journals (though now that I've had another
> read of Sage's blog post, New in Luminous: BlueStore, I notice he does
> cover this). Is this to say that writes are acknowledged as soon as they
> hit the WAL?
> 

Depends on the size of the write. This is different for SSD and HDD, but
only small writes will be ACK'ed by the WAL; larger writes need to wait
on the backing device.
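
The small/large cutoff is controlled by bluestore_prefer_deferred_size_hdd
and _ssd; if in doubt, you can ask a running OSD what it is actually using
via the admin socket (osd.0 is a placeholder):

  ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd
  ceph daemon osd.0 config get bluestore_prefer_deferred_size_ssd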

> Also this raises another question regarding sizing. The Ceph
> documentation suggests allocating as much available space as possible to
> blocks.db but what about WAL? We'll have 120GB per OSD available on each
> SSD. Any suggestion on how we might divvy that between the WAL and DB?
> 

1GB for the WAL is sufficient. Only RocksDB writes go there; there is no
need to increase it.

You want to give as much to the DB as possible in your situation.
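
If you do carve out explicit partitions, the split is expressed at OSD
creation time; a rough sketch with placeholder device paths (if you only
pass --block.db, the WAL simply lives on the DB device):

  ceph-volume lvm create --bluestore --data /dev/sdb \
      --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2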



Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread John Petrini
Okay that makes more sense, I didn't realize the WAL functioned in a
similar manner to filestore journals (though now that I've had another read
of Sage's blog post, New in Luminous: BlueStore, I notice he does cover
this). Is this to say that writes are acknowledged as soon as they hit the
WAL?

Also this raises another question regarding sizing. The Ceph documentation
suggests allocating as much available space as possible to block.db but
what about the WAL? We'll have 120GB per OSD available on each SSD. Any
suggestion on how we might divvy that between the WAL and DB?


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread Vitaliy Filippov

Hello,

> We'll soon be building out four new luminous clusters with Bluestore.
> Our current clusters are running filestore so we're not very familiar
> with Bluestore yet and I'd like to have an idea of what to expect.
>
> Here are the OSD hardware specs (5x per cluster):
> 2x 3.0GHz 18c/36t
> 22x 1.8TB 10K SAS (RAID1 OS + 20 OSD's)
> 5x 480GB Intel S4610 SSD's (WAL and DB)
> 192 GB RAM
> 4X Mellanox 25GB NIC
> PERC H730p
>
> With filestore we've found that we can achieve sub-millisecond write
> latency by running very fast journals (currently Intel S4610's). My
> main concern is that Bluestore doesn't use journals and instead writes
> directly to the higher latency HDD; in theory resulting in slower acks
> and higher write latency. How does Bluestore handle this? Can we
> expect similar or better performance than our current filestore
> clusters?
>
> I've heard it repeated that Bluestore performs better than Filestore
> but I've also heard some people claiming this is not always the case
> with HDD's. Is there any truth to that and if so is there a
> configuration we can use to achieve this same type of performance with
> Bluestore?


Bluestore does use a journal for small writes and doesn't for big ones.
You can try to push bigger writes through the journal as well by
increasing bluestore_prefer_deferred_size, but it's generally pointless,
because in Bluestore the "journal" is RocksDB's journal (the WAL), which
creates way too much extra write amplification when big data chunks are
put into it. This creates extra load on the SSDs, and write performance
does not increase compared to the default.


Bluestore is always better in terms of linear write throughput because it
has no double-write for big data chunks. But in terms of 4K random writes
it's roughly on par with filestore, and sometimes may even be slightly
worse.


--
With best regards,
  Vitaliy Filippov


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-13 Thread John Petrini
Anyone have any insight to offer here? Also I'm now curious to hear
about experiences with 512e vs 4kn drives.


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-02 Thread John Petrini
Hi Martin,

Hardware has already been acquired and was spec'd to mostly match our
current clusters, which perform very well for us. I'm really just hoping
to hear from anyone who may have experience moving from filestore =>
bluestore with an HDD cluster. Obviously we'll be doing testing but it's
always helpful to hear firsthand experience.

That said there is reasoning behind our choices.

CPU: Buys us some additional horsepower for colocating RGW. We run 12c
currently and they stay very busy. Since we're adding an additional
workload it seemed warranted.
Memory: The Intel Procs in the R740's are 6 channel instead of 4 so the
bump to 192GB was the result of that change. We run 128 today.
NIC's: A few reasons

   - Microbursts: our workload seems to generate them pretty regularly and
   we've had a tough time taming them using buffers; 25G should eliminate
   that even though we won't ever use the sustained bandwidth.
   - Port waste: We're running large compute nodes so the choice came down
   to 4x10G or 2x25G per compute. The switches are more expensive (though not
   terribly) but we get the benefit of using fewer ports.
   - Features: The 25G switches support some features we were looking for
   such as EVPN, VxLAN etc.

Disk: 512e. I'm interested to hear about the performance difference here.
Does Ceph not recognize the physical sector size as being 4k?
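
For what it's worth, the sector sizes the kernel (and therefore Ceph) sees
can be checked with standard tools; a 512e drive reports a 4096-byte
physical and a 512-byte logical sector (the device path is a placeholder):

  lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdX
  blockdev --getpbsz --getss /dev/sdX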

Thanks,


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-02 Thread Alan Johnson
If this is Skylake, the 6-channel memory architecture lends itself better to
configs such as 192GB (6 x 32GB), so yes, even though 128GB is most likely
sufficient, using 6 x 16GB (96GB) might be too small.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Martin 
Verges
Sent: Saturday, February 2, 2019 2:19 AM
To: John Petrini 
Cc: ceph-users 
Subject: Re: [ceph-users] Bluestore HDD Cluster Advice

Hello John,

you don't need such a big CPU; save yourself some money with a 12c/24t part
and invest it in better / more disks. The same goes for memory: 128G would be
enough. Why do you want to install 4x 25G NICs? Hard disks won't be able to
use that capacity.

In addition, you can use the 2 OS disks for OSDs instead if you choose croit
for system management, meaning 10 more OSDs in your small cluster for better
performance, and it's a lot easier to manage. The best part is that this
feature comes with our completely free version, so it is just a gain on your
side! Try it out.

Please make sure to buy the right disks: there is a huge performance gap
between 512e and 4Kn drives but next to no price difference. Bluestore does
perform better than filestore in most environments, but as always it depends
on your specific workload. I would not recommend even considering a filestore
OSD anymore; instead buy the correct hardware for your use case and configure
the cluster accordingly.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Fri, 1 Feb 2019 at 18:26, John Petrini <jpetr...@coredial.com> wrote:
Hello,

We'll soon be building out four new luminous clusters with Bluestore.
Our current clusters are running filestore so we're not very familiar
with Bluestore yet and I'd like to have an idea of what to expect.

Here are the OSD hardware specs (5x per cluster):
2x 3.0GHz 18c/36t
22x 1.8TB 10K SAS (RAID1 OS + 20 OSD's)
5x 480GB Intel S4610 SSD's (WAL and DB)
192 GB RAM
4X Mellanox 25GB NIC
PERC H730p

With filestore we've found that we can achieve sub-millisecond write
latency by running very fast journals (currently Intel S4610's). My
main concern is that Bluestore doesn't use journals and instead writes
directly to the higher latency HDD; in theory resulting in slower acks
and higher write latency. How does Bluestore handle this? Can we
expect similar or better performance than our current filestore
clusters?

I've heard it repeated that Bluestore performs better than Filestore
but I've also heard some people claiming this is not always the case
with HDD's. Is there any truth to that and if so is there a
configuration we can use to achieve this same type of performance with
Bluestore?

Thanks all.


Re: [ceph-users] Bluestore HDD Cluster Advice

2019-02-02 Thread Martin Verges
Hello John,

you don't need such a big CPU; save yourself some money with a 12c/24t
part and invest it in better / more disks. The same goes for memory:
128G would be enough. Why do you want to install 4x 25G NICs? Hard disks
won't be able to use that capacity.

In addition, you can use the 2 OS disks for OSDs instead if you choose
croit for system management, meaning 10 more OSDs in your small cluster
for better performance, and it's a lot easier to manage. The best part
is that this feature comes with our completely free version, so it is
just a gain on your side! Try it out.

Please make sure to buy the right disks: there is a huge performance gap
between 512e and 4Kn drives but next to no price difference. Bluestore
does perform better than filestore in most environments, but as always
it depends on your specific workload. I would not recommend even
considering a filestore OSD anymore; instead buy the correct hardware
for your use case and configure the cluster accordingly.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Fri, 1 Feb 2019 at 18:26, John Petrini <jpetr...@coredial.com> wrote:

> Hello,
>
> We'll soon be building out four new luminous clusters with Bluestore.
> Our current clusters are running filestore so we're not very familiar
> with Bluestore yet and I'd like to have an idea of what to expect.
>
> Here are the OSD hardware specs (5x per cluster):
> 2x 3.0GHz 18c/36t
> 22x 1.8TB 10K SAS (RAID1 OS + 20 OSD's)
> 5x 480GB Intel S4610 SSD's (WAL and DB)
> 192 GB RAM
> 4X Mellanox 25GB NIC
> PERC H730p
>
> With filestore we've found that we can achieve sub-millisecond write
> latency by running very fast journals (currently Intel S4610's). My
> main concern is that Bluestore doesn't use journals and instead writes
> directly to the higher latency HDD; in theory resulting in slower acks
> and higher write latency. How does Bluestore handle this? Can we
> expect similar or better performance than our current filestore
> clusters?
>
> I've heard it repeated that Bluestore performs better than Filestore
> but I've also heard some people claiming this is not always the case
> with HDD's. Is there any truth to that and if so is there a
> configuration we can use to achieve this same type of performance with
> Bluestore?
>
> Thanks all.


[ceph-users] Bluestore HDD Cluster Advice

2019-02-01 Thread John Petrini
Hello,

We'll soon be building out four new luminous clusters with Bluestore.
Our current clusters are running filestore so we're not very familiar
with Bluestore yet and I'd like to have an idea of what to expect.

Here are the OSD hardware specs (5x per cluster):
2x 3.0GHz 18c/36t
22x 1.8TB 10K SAS (RAID1 OS + 20 OSD's)
5x 480GB Intel S4610 SSD's (WAL and DB)
192 GB RAM
4X Mellanox 25GB NIC
PERC H730p

With filestore we've found that we can achieve sub-millisecond write
latency by running very fast journals (currently Intel S4610's). My
main concern is that Bluestore doesn't use journals and instead writes
directly to the higher latency HDD; in theory resulting in slower acks
and higher write latency. How does Bluestore handle this? Can we
expect similar or better performance than our current filestore
clusters?

I've heard it repeated that Bluestore performs better than Filestore
but I've also heard some people claiming this is not always the case
with HDD's. Is there any truth to that and if so is there a
configuration we can use to achieve this same type of performance with
Bluestore?

Thanks all.