The way I try to look at this is:
1) How much more do the enterprise grade drives cost?
2) What are the benefits? (Faster performance, longer life, etc)
3) How much does it cost to deal with downtime, diagnose issues, and
replace malfunctioning hardware?
My personal take is that enterprise
Ok, assuming my math is right you've got ~14GB of data in the mempools:
~6.5GB bluestore data
~1.8GB bluestore onode
~5GB bluestore other
The rest is other misc stuff. That seems to be pretty in line with the
numbers you posted in your screenshot. IE this doesn't appear to be a
leak, but rather
Hi Philippe,
Have you looked at the mempool stats yet?
ceph daemon osd.NNN dump_mempools
You may also want to look at the heap stats, and potentially enable
debug 5 for bluestore to see what the priority cache manager is doing.
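For reference, a minimal set of commands for that kind of investigation might look like the following (osd.NNN is a placeholder for your OSD id; run these on the host where the daemon lives):
ceph daemon osd.NNN dump_mempools                    # per-pool memory accounting
ceph tell osd.NNN heap stats                         # what tcmalloc thinks it allocated vs RSS
ceph daemon osd.NNN config set debug_bluestore 5/5   # log priority cache manager decisions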
Typically in these cases we end up seeing a ton of memory use
On 8/26/19 7:39 AM, Wido den Hollander wrote:
On 8/26/19 1:35 PM, Simon Oosthoek wrote:
On 26-08-19 13:25, Simon Oosthoek wrote:
On 26-08-19 13:11, Wido den Hollander wrote:
The reweight might actually cause even more confusion for the balancer.
The balancer uses upmap mode and that re-allo
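For reference, enabling the balancer in upmap mode looks roughly like this (a sketch; upmap requires all clients to be Luminous or newer):
ceph osd set-require-min-compat-client luminous   # upmap needs Luminous+ clients
ceph balancer mode upmap
ceph balancer on
ceph balancer status                              # check what the balancer is doing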
Hi Folks,
I've updated hsbench (new S3 benchmark) to 0.2
Notable changes since 0.1:
- Can now output CSV results
- Can now output JSON results
- Fix for poor read performance with low thread counts
- New bucket listing benchmark with a new "mk" flag that lets you
control the number of ke
Hi Vladimir,
On 8/21/19 8:54 AM, Vladimir Brik wrote:
Hello
I am running a Ceph 14.2.1 cluster with 3 rados gateways.
Periodically, radosgw process on those machines starts consuming 100%
of 5 CPU cores for days at a time, even though the machine is not
being used for data transfers (nothin
Mark
On 8/15/19 6:41 PM, David Byte wrote:
Mark, did the S3 engine for fio not work?
Sent from my iPhone. Typos are Apple's fault.
On Aug 15, 2019, at 6:37 PM, Mark Nelson wrote:
Hi Guys,
Earlier this week I was working on investigating the impact of OMAP performance
on RGW and w
Hi Folks,
The basic idea behind the WAL is that for every DB write transaction you
first write it into an in-memory buffer and to a region on disk.
RocksDB is typically set up to have multiple WAL buffers, and when one or
more fills up, it will start flushing the data to L0 while new writes
On 8/14/19 1:06 PM, solarflow99 wrote:
Actually a standalone WAL is required when you have either a very small fast
device (and don't want the db to use it) or three devices of different
performance behind the OSD (e.g. hdd, ssd, nvme). So the WAL is to be
located on the fastest one.
On 8/13/19 3:51 PM, Paul Emmerich wrote:
On Tue, Aug 13, 2019 at 10:04 PM Wido den Hollander wrote:
I just checked an RGW-only setup. 6TB drive, 58% full, 11.2GB of DB in
use. No slow db in use.
random rgw-only setup here: 12TB drive, 77% full, 48GB metadata and
10GB omap for index and whatev
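One way to check how much DB space an OSD actually uses, and whether metadata has spilled onto the slow device, is the bluefs section of the perf counters; a sketch (osd id is a placeholder; if your release doesn't accept the section argument, dump everything and look for the bluefs block):
ceph daemon osd.NNN perf dump bluefs
# db_total_bytes / db_used_bytes -> DB partition size and usage
# slow_used_bytes > 0            -> metadata has spilled onto the slow device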
Hi Jaime,
we only use the cache size parameters now if you've disabled
autotuning. With autotuning we adjust the cache size on the fly to try
and keep the mapped process memory under the osd_memory_target. You can
set a lower memory target than default, though you will have far less
cache
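In ceph.conf terms, the two modes look roughly like this (a sketch; the byte values are examples only):
[osd]
# autotuned (default): caches grow/shrink to keep mapped memory near the target
osd_memory_target = 3221225472
# or disable autotuning and size the bluestore cache by hand
bluestore_cache_autotune = false
bluestore_cache_size_hdd = 1073741824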
You may be interested in using my wallclock profiler to look at lock
contention:
https://github.com/markhpc/gdbpmp
It will greatly slow down the OSD but will show you where time is being
spent and so far the results appear to at least be relatively
informative. I used it recently when refa
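Typical usage looks something like this (flags quoted from memory, so treat them as assumptions and check the README; this also assumes a single ceph-osd process on the host):
./gdbpmp.py -p $(pidof ceph-osd) -n 1000 -o osd.gdbpmp   # attach and collect 1000 samples
./gdbpmp.py -i osd.gdbpmp                                # print the aggregated call tree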
On 8/4/19 7:36 PM, Christian Balzer wrote:
Hello,
On Sun, 4 Aug 2019 06:34:46 -0500 Mark Nelson wrote:
On 8/4/19 6:09 AM, Paul Emmerich wrote:
On Sun, Aug 4, 2019 at 3:47 AM Christian Balzer wrote:
2. Bluestore caching still broken
When writing data with the fios below, it isn
On 8/4/19 6:09 AM, Paul Emmerich wrote:
On Sun, Aug 4, 2019 at 3:47 AM Christian Balzer wrote:
2. Bluestore caching still broken
When writing data with the fios below, it isn't cached on the OSDs.
Worse, existing cached data that gets overwritten is removed from the
cache, which while of cour
Hi Danny,
Are your arm binaries built using tcmalloc? At least on x86 we saw
significantly higher memory fragmentation and memory usage with glibc
malloc.
First, you can look at the mempool stats which may provide a hint:
ceph daemon osd.NNN dump_mempools
Assuming you are using tcmallo
On 7/25/19 9:27 PM, Anthony D'Atri wrote:
We run a few hundred HDD OSDs for our backup cluster; we set one RAID 0 per
HDD in order to be able to use the battery-protected write cache from the
RAID controller. It really improves performance, for both bluestore and
filestore OSDs.
Having run someth
FWIW, the DB and WAL don't really do the same thing that the cache tier
does. The WAL is similar to filestore's journal, and the DB is
primarily for storing metadata (onodes, blobs, extents, and OMAP data).
Offloading these things to an SSD will definitely help, but you won't
see the same kin
Hi Wei Zhao,
I've used ycsb for mongodb on rbd testing before. It worked fine and
was pretty straightforward to run. The only real concern I had was that
many of the default workloads used a Zipfian distribution for reads.
This basically meant reads were entirely coming from cache and didn
Hi Brett,
Can you enable debug_bluestore = 5 and debug_prioritycache = 5 on one of
the OSDs that's showing the behavior? You'll want to look in the logs
for lines that look like this:
2019-07-18T19:34:42.587-0400 7f4048b8d700 5 prioritycache tune_memory
target: 4294967296 mapped: 4260962
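A sketch of turning that on and pulling the relevant lines back out (osd id and log path are placeholders):
ceph daemon osd.NNN config set debug_bluestore 5/5
ceph daemon osd.NNN config set debug_prioritycache 5/5
grep 'prioritycache tune_memory' /var/log/ceph/ceph-osd.NNN.log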
Some of the first performance studies we did back at Inktank were
looking at RAID-0 vs JBOD setups! :) You are absolutely right that the
controller cache (especially write-back with a battery or supercap) can
help with HDD-only configurations. Where we typically saw problems was
when you load
Earlier in bluestore's life, we couldn't handle a 4K min_alloc size on
NVMe without incurring pretty significant slowdowns (and also generally
higher amounts of metadata in the DB). Lately I've been seeing some
indications that we've improved the stack to the point where 4K
min_alloc no longer
On 6/12/19 5:51 PM, Jorge Garcia wrote:
I'm following the bluestore config reference guide and trying to
change the value for osd_memory_target. I added the following entry in
the /etc/ceph/ceph.conf file:
[osd]
osd_memory_target = 2147483648
and restarted the osd daemons doing "systemct
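On releases with the centralized config store you can also change this at runtime without touching ceph.conf or restarting; a sketch:
ceph config set osd osd_memory_target 2147483648              # persists in the mon config db
ceph tell osd.* injectargs '--osd_memory_target 2147483648'   # applies immediately to running daemons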
The truth of the matter is that folks try to boil this down to some kind
of hard and fast rule but it's often not that simple. With our current
default settings for pglog, rocksdb WAL buffers, etc, the OSD basically
needs about 1GB of RAM for bare-bones operation (not under recovery or
extreme
On 5/3/19 1:38 AM, Denny Fuchs wrote:
hi,
I never noticed the Debian /etc/default/ceph file :-)
=
# Increase tcmalloc cache size
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
that is, what is active now.
Yep, if you profile the OSD under a small write workload you can see
On 5/2/19 1:51 PM, Igor Podlesny wrote:
On Fri, 3 May 2019 at 01:29, Mark Nelson wrote:
On 5/2/19 11:46 AM, Igor Podlesny wrote:
On Thu, 2 May 2019 at 05:02, Mark Nelson wrote:
[...]
FWIW, if you still have an OSD up with tcmalloc, it's probably worth
looking at the heap stats to se
On 5/2/19 11:46 AM, Igor Podlesny wrote:
On Thu, 2 May 2019 at 05:02, Mark Nelson wrote:
[...]
FWIW, if you still have an OSD up with tcmalloc, it's probably worth
looking at the heap stats to see how much memory tcmalloc thinks it's
allocated vs how much RSS memory is being u
On 5/1/19 12:59 AM, Igor Podlesny wrote:
On Tue, 30 Apr 2019 at 20:56, Igor Podlesny wrote:
On Tue, 30 Apr 2019 at 19:10, Denny Fuchs wrote:
[..]
Any suggestions ?
-- Try different allocator.
Ah, BTW, except memory allocator there's another option: recently
backported bitmap allocator.
Igor
Fri, Apr 12, 2019, 9:01 PM Mark Nelson wrote:
Hi Charles,
Basically the goal is to reduce write-amplification as much as
possible. The deeper that the rocksdb hierarchy gets, the worse the
write-amplification for compaction is going to be
Hi Charles,
Basically the goal is to reduce write-amplification as much as
possible. The deeper that the rocksdb hierarchy gets, the worse the
write-amplification for compaction is going to be. If you look at the
OSD logs you'll see the write-amp factors for compaction in the rocksdb
compac
false positive.
Thanks, I continue to read your resources.
On Tuesday, April 9, 2019 at 09:30 -0500, Mark Nelson wrote:
My understanding is that basically the kernel is either unable or
uninterested (maybe due to lack of memory pressure?) in reclaiming the
memory. It's possible you might
d release that ?
Thanks,
Olivier
On Monday, April 8, 2019 at 16:09 -0500, Mark Nelson wrote:
One of the difficulties with the osd_memory_target work is that we can't
tune based on the RSS memory usage of the process. Ultimately it's up to
the kernel to decide to reclaim memory and esp
One of the difficulties with the osd_memory_target work is that we can't
tune based on the RSS memory usage of the process. Ultimately it's up to
the kernel to decide to reclaim memory and especially with transparent
huge pages it's tough to judge what the kernel is going to do even if
memory h
On 3/20/19 3:12 AM, Vitaliy Filippov wrote:
`cpupower idle-set -D 0` will help you a lot, yes.
However it seems that not only the bluestore makes it slow. >= 50% of
the latency is introduced by the OSD itself. I'm just trying to
understand WHAT parts of it are doing so much work. For example i
On 3/12/19 8:40 AM, vita...@yourcmc.ru wrote:
One way or another we can only have a single thread sending writes to
rocksdb. A lot of the prior optimization work on the write side was
to get as much processing out of the kv_sync_thread as possible.
That's still a worthwhile goal as it's typical
Our default of 4 256MB WAL buffers is arguably already too big. On one
hand we are making these buffers large to hopefully avoid short lived
data going into the DB (pglog writes). IE if a pglog write comes in and
later a tombstone invalidating it comes in, we really want those to land
in the s
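Those defaults live in the bluestore_rocksdb_options string; a hedged sketch of the two relevant knobs (the stock default string contains several more settings, elided here):
[osd]
# four WAL buffers of 256MB each
bluestore_rocksdb_options = max_write_buffer_number=4,write_buffer_size=268435456,...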
On 3/12/19 7:31 AM, vita...@yourcmc.ru wrote:
Decreasing the min_alloc size isn't always a win, but it can be in some
cases. Originally bluestore_min_alloc_size_ssd was set to 4096 but we
increased it to 16384 because at the time our metadata path was slow
and increasing it resulted in a pretty s
On 3/12/19 7:24 AM, Benjamin Zapiec wrote:
Hello,
I was wondering why ceph's block.db is nearly empty, so I started
to investigate.
The recommendations from ceph are that block.db should be at least
4% the size of block. So my OSD configuration looks like this:
wal.db - not explicit spec
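As a worked example of that guideline (illustrative numbers): a 4TB block device calls for roughly 4% of 4000GB, i.e. about 160GB of block.db:
# 4% guideline, 4TB block device
echo $((4000 * 4 / 100))   # -> 160 (GB of block.db)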
On 3/8/19 8:12 AM, Steffen Winther Sørensen wrote:
On 8 Mar 2019, at 14.30, Mark Nelson wrote:
On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:
On 5 Mar 2019, at 10.02, Paul Emmerich wrote:
Yeah, there
On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:
On 5 Mar 2019, at 10.02, Paul Emmerich wrote:
Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.
Yeap thanks, setting it at 1G+256M worked :)
Hope this won’t bloat memory during the coming weekend's VM backups through CephFS
/
On 3/6/19 5:12 AM, Stefan Priebe - Profihost AG wrote:
Hi Mark,
On 05.03.19 at 23:12, Mark Nelson wrote:
Hi Stefan,
Could you try running your random write workload against bluestore and
then take a wallclock profile of an OSD using gdbpmp? It's available here:
https://github.com/ma
On 3/5/19 4:23 PM, Vitaliy Filippov wrote:
Testing -rw=write without -sync=1 or -fsync=1 (or -fsync=32 for batch
IO, or just fio -ioengine=rbd from outside a VM) is rather pointless -
you're benchmarking the RBD cache, not Ceph itself. RBD cache is
coalescing your writes into big sequential wr
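To take the RBD cache out of the picture, a sync write test along these lines is the usual approach (a sketch; pool and image names are placeholders):
fio --name=synctest --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --fsync=1 --runtime=60 --time_based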
Hi Stefan,
Could you try running your random write workload against bluestore and
then take a wallclock profile of an OSD using gdbpmp? It's available here:
https://github.com/markhpc/gdbpmp
Thanks,
Mark
On 3/5/19 2:29 AM, Stefan Priebe - Profihost AG wrote:
Hello list,
while the perf
y the OSD, but we've observed
higher write-amplification on our test nodes. I suspect that might be a
worthwhile trade-off for nvdimms or optane, but I'm not sure it's a good
idea for typical NVMe drives.
Mark
On Tue, Mar 5, 2019 at 5:35 PM Mark Nelson wrote:
Hi,
Hi,
I've got a Ryzen 7 1700 box that I regularly run tests on along with the
upstream community performance test nodes that have Intel Xeon E5-2650v3
processors in them. The Ryzen is 3.0GHz/3.7GHz turbo while the Xeons
are 2.3GHz/3.0GHz. The Xeons are quite a bit faster clock/clock in the
t
FWIW, I've got recent tests of a fairly recent master build
(14.0.1-3118-gd239c2a) showing a single OSD hitting ~33-38K 4k randwrite
IOPS with 3 client nodes running fio (io_depth = 32) both with RBD and
with CephFS. The OSD node had older gen CPUs (Xeon E5-2650 v3) and NVMe
drives (Intel P370
On 1/30/19 7:45 AM, Alexandre DERUMIER wrote:
I don't see any smoking gun here... :/
I need to run a comparison when latencies go very high, but I need to wait
a few more days/weeks.
The main difference between a warm OSD and a cold one is that on startup
the bluestore cache is empty. You mig
On 1/18/19 9:22 AM, Nils Fahldieck - Profihost AG wrote:
Hello Mark,
I'm answering on behalf of Stefan.
On 18.01.19 at 00:22, Mark Nelson wrote:
On 1/17/19 4:06 PM, Stefan Priebe - Profihost AG wrote:
Hello Mark,
after reading
http://docs.ceph.com/docs/master/rados/configuration/blue
On 1/17/19 4:06 PM, Stefan Priebe - Profihost AG wrote:
Hello Mark,
after reading
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
again I'm really confused about how exactly the behaviour differs under
12.2.8 and 12.2.10 regarding memory.
Also I stumbled upon "When tcmalloc and
Hi Stefan,
I'm taking a stab at reproducing this in-house. Any details you can
give me that might help would be much appreciated. I'll let you know
what I find.
Thanks,
Mark
On 1/16/19 1:56 PM, Stefan Priebe - Profihost AG wrote:
I reverted the whole cluster back to 12.2.8 - recovery
Hi Stefan,
12.2.9 included the pg hard limit patches and the osd_memory_autotuning
patches. While at first I was wondering if this was autotuning, it
sounds like it may be more related to the pg hard limit. I'm not
terribly familiar with those patches though so some of the other members
fr
On 1/15/19 9:02 AM, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(osd's are idle)
it turns out this was a kernel bug. Updating to a newer kernel has
solved this issue.
Greets,
Stefan
Hi
Hi Stefan,
Any idea if the reads are constant or bursty? One cause of heavy reads
is when rocksdb is compacting and has to read SST files from disk. It's
also possible you could see heavy read traffic during writes if data has
to be read from SST files rather than cache. It's possible this
Hi Florian,
On 12/13/18 7:52 AM, Florian Haas wrote:
On 02/12/2018 19:48, Florian Haas wrote:
Hi Mark,
just taking the liberty to follow up on this one, as I'd really like to
get to the bottom of this.
On 28/11/2018 16:53, Florian Haas wrote:
On 28/11/2018 15:52, Mark Nelson wrote:
O
Hi Tyler,
I think we had a user a while back that reported they had background
deletion work going on after upgrading their OSDs from filestore to
bluestore due to PGs having been moved around. Is it possible that your
cluster is doing a bunch of work (deletion or otherwise) beyond the
regul
On 11/28/18 8:36 AM, Florian Haas wrote:
On 14/08/2018 15:57, Emmanuel Lacour wrote:
On 13/08/2018 at 16:58, Jason Dillaman wrote:
See [1] for ways to tweak the bluestore cache sizes. I believe that by
default, bluestore will not cache any data but instead will only
attempt to cache its key/
Hi Robert,
Solved is probably a strong word. I'd say that things have improved.
Bluestore in general tends to handle large numbers of objects better
than filestore does for several reasons including that it doesn't suffer
from pg directory splitting (though RocksDB compaction can become a
One consideration is that you may not be able to fit higher DB levels on
the db partition and end up with a lot of waste (Nick Fisk recently saw
this on his test cluster). We've talked about potentially trying to
pre-compute the hierarchy sizing so that we can align a level boundary
to fit wit
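The underlying arithmetic, assuming the RocksDB defaults Ceph ships (max_bytes_for_level_base = 256MB, level multiplier = 10), is roughly:
L1 = 256MB
L2 = 256MB * 10 = 2.56GB
L3 = 2.56GB * 10 = 25.6GB
L4 = 25.6GB * 10 = 256GB
A level is only useful on the fast partition if it fits completely, so a ~30GB partition holds everything through L3, and anything between roughly 30GB and 286GB leaves the remainder mostly idle. This is a rough sketch, not an exact accounting.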
On 11/14/18 1:45 PM, Vladimir Brik wrote:
Hello
I have a ceph 13.2.2 cluster comprised of 5 hosts, each with 16 HDDs
and 4 SSDs. HDD OSDs have about 50 PGs each, while SSD OSDs have about
400 PGs each (a lot more pools use SSDs than HDDs). Servers are fairly
powerful: 48 HT cores, 192GB of R
FWIW, here are values I measured directly from the RocksDB SST files
under different small write workloads (ie the ones where you'd expect a
larger DB footprint):
https://drive.google.com/file/d/1Ews2WR-y5k3TMToAm0ZDsm7Gf_fwvyFw/view?usp=sharing
These tests were only with 256GB of data written
On 09/10/2018 12:22 PM, Igor Fedotov wrote:
Hi Nick.
On 9/10/2018 1:30 PM, Nick Fisk wrote:
If anybody has 5 minutes could they just clarify a couple of things
for me
1. onode count, should this be equal to the number of objects stored
on the OSD?
Through reading several posts, there seems
I believe that the standard mechanisms for launching OSDs already sets
the thread cache higher than default. It's possible we might be able to
relax that now as async messenger doesn't thrash the cache as badly as
simple messenger did. I suspect there's probably still some value to
increasing
Hi Uwe,
As luck would have it we were just looking at memory allocators again
and ran some quick RBD and RGW tests that stress memory allocation:
https://drive.google.com/uc?export=download&id=1VlWvEDSzaG7fE4tnYfxYtzeJ8mwx4DFg
The gist of it is that tcmalloc looks like it's doing pretty we
On 04/01/2018 07:59 PM, Christian Balzer wrote:
Hello,
firstly, Jack pretty much correctly correlated my issues to Mark's points,
more below.
On Sat, 31 Mar 2018 08:24:45 -0500 Mark Nelson wrote:
On 03/29/2018 08:59 PM, Christian Balzer wrote:
Hello,
my crappy test cluster was ren
On 03/29/2018 08:59 PM, Christian Balzer wrote:
Hello,
my crappy test cluster was rendered inoperational by an IP renumbering
that wasn't planned and forced on me during a DC move, so I decided to
start from scratch and explore the fascinating world of Luminous/bluestore
and all the assorted bu
Personally I usually use a modified version of Mark Seger's getput tool
here:
https://github.com/markhpc/getput/tree/wip-fix-timing
The difference between this version and upstream is primarily to make
getput more accurate/useful when using something like CBT for
orchestration instead of the
On 11/20/2017 10:06 AM, Moreno, Orlando wrote:
Hi all,
I’ve been experiencing weird performance behavior when using FIO RBD
engine directly to an RBD volume with numjobs > 1. For a 4KB random
write test at 32 QD and 1 numjob, I can get about 40K IOPS, but when I
increase the numjobs to 4, it p
17 10:11 AM, Milanov, Radoslav Nikiforov wrote:
No,
What test parameters (iodepth/file size/numjobs) would make sense for 3
node/27OSD@4TB ?
- Rado
-----Original Message-----
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov
PM
*To:* Milanov, Radoslav Nikiforov
*Cc:* Mark Nelson ; ceph-users@lists.ceph.com
*Subject:* Re: [ceph-users] Bluestore performance 50% of filestore
I'd probably say 50GB to leave some extra space over-provisioned. 50GB
should definitely prevent any DB operations from spilling over to th
.com] On Behalf Of Mark
Nelson
Sent: Tuesday, November 14, 2017 4:04 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bluestore performance 50% of filestore
Hi Radoslav,
Is RBD cache enabled and in writeback mode? Do you have client side readahead?
Both are doing better for writes th
Hi Radoslav,
Is RBD cache enabled and in writeback mode? Do you have client side
readahead?
Both are doing better for writes than you'd expect from the native
performance of the disks assuming they are typical 7200RPM drives and
you are using 3X replication (~150IOPS * 27 / 3 = ~1350 IOPS).
ning 1 OSD per physical drive or multiple..any recommendations ?
In those tests 1 OSD per NVMe. You can do better if you put multiple
OSDs on the same drive, both for filestore and bluestore.
Mark
Cheers /Maged
On 2017-11-10 18:51, Mark Nelson wrote:
FWIW, on very fast drives you can achiev
FWIW, on very fast drives you can achieve at least 1.4GB/s and 30K+
write IOPS per OSD (before replication). It's quite possible to do
better but those are recent numbers on a mostly default bluestore
configuration that I'm fairly confident to share. It takes a lot of
CPU, but it's possible.
icit WAL partition on
the same SSD?
Original message
From: Nick Fisk
Date: 8/11/17 10:16 p.m. (GMT+01:00)
To: 'Mark Nelson', 'Wolfgang Lendl'
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] bluestore - wal,db on faster devices?
-----Original Message-----
Fro
On 11/08/2017 03:16 PM, Nick Fisk wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Mark Nelson
Sent: 08 November 2017 19:46
To: Wolfgang Lendl
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] bluestore - wal,db on faster devices
ore metadata) -
and less sense in rbd environments - correct?
br
wolfgang
On 11/08/2017 02:21 PM, Mark Nelson wrote:
Hi Wolfgang,
In bluestore the WAL serves sort of a similar purpose to filestore's
journal, but bluestore isn't dependent on it for guaranteeing
durability of large write
Hi Wolfgang,
In bluestore the WAL serves sort of a similar purpose to filestore's
journal, but bluestore isn't dependent on it for guaranteeing durability
of large writes. With bluestore you can often get higher large-write
throughput than with filestore when using HDD-only or flash-only OSDs
On 11/03/2017 08:25 AM, Wido den Hollander wrote:
On 3 November 2017 at 13:33, Mark Nelson wrote:
On 11/03/2017 02:44 AM, Wido den Hollander wrote:
On 3 November 2017 at 0:09, Nigel Williams wrote:
On 3 November 2017 at 07:45, Martin Overgaard Hansen wrote:
I want to bring
On 11/03/2017 04:08 AM, Jorge Pinilla López wrote:
well I haven't found any recommendation either, but I think that
sometimes the SSD space is being wasted.
If someone wanted to write it, you could have bluefs share some of the
space on the drive for hot object data and release space as neede
On 11/03/2017 02:44 AM, Wido den Hollander wrote:
On 3 November 2017 at 0:09, Nigel Williams wrote:
On 3 November 2017 at 07:45, Martin Overgaard Hansen wrote:
I want to bring this subject back in the light and hope someone can provide
insight regarding the issue, thanks.
Thanks Marti
On 10/25/2017 03:51 AM, Caspar Smit wrote:
Hi,
I asked the exact same question a few days ago, same answer:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021708.html
I guess we'll have to bite the bullet on this one and take this into
account when designing.
This is one
Memory usage is still quite high here even with a large onode cache!
Are you using erasure coding? I recently was able to reproduce a bug in
bluestore causing excessive memory usage during large writes with EC,
but have not tracked down exactly what's going on yet.
Mark
On 10/18/2017 06:48 A
On 10/17/2017 01:54 AM, Wido den Hollander wrote:
On 16 October 2017 at 18:14, Richard Hesketh wrote:
On 16/10/17 13:45, Wido den Hollander wrote:
On 26 September 2017 at 16:39, Mark Nelson wrote:
On 09/26/2017 01:10 AM, Dietmar Rieder wrote:
thanks David,
that's confirming w
Hi Jorge,
I was sort of responsible for all of this. :)
So basically there are different caches in different places:
- rocksdb bloom filter and index cache
- rocksdb block cache (which can be configured to include filters and
indexes)
- rocksdb compressed block cache
- bluestore onode cache
On 10/03/2017 07:59 AM, Alex Gorbachev wrote:
Hi Sam,
On Mon, Oct 2, 2017 at 6:01 PM Sam Huracan wrote:
Can anyone help me?
On Oct 2, 2017 17:56, "Sam Huracan" wrote:
Hi,
I'm reading this document:
or either
partition yet.
On Mon, Sep 25, 2017 at 10:53 AM Dietmar Rieder wrote:
On 09/25/2017 02:59 PM, Mark Nelson wrote:
> On 09/25/2017 03:31 AM, TYLin wrote:
>> Hi,
>>
>> To my understand, the bluestore write
On 09/25/2017 05:02 PM, Nigel Williams wrote:
On 26 September 2017 at 01:10, David Turner wrote:
If they are on separate
devices, then you need to make it as big as you need to to ensure that it
won't spill over (or if it does that you're ok with the degraded performance
while the db partitio
e. Specifying one
large DB partition per OSD will cover both uses.
thanks,
Ben
On Thu, Sep 21, 2017 at 12:15 PM, Dietmar Rieder
wrote:
On 09/21/2017 05:03 PM, Mark Nelson wrote:
On 09/21/2017 03:17 AM, Dietmar Rieder wrote:
On 09/21/2017 09:45 AM, Maged Mokhtar wrote:
On 2017-09-21 07:56, Lazu
On 09/21/2017 03:17 AM, Dietmar Rieder wrote:
On 09/21/2017 09:45 AM, Maged Mokhtar wrote:
On 2017-09-21 07:56, Lazuardi Nasution wrote:
Hi,
I'm still looking for the answers to these questions. Maybe someone can
share their thoughts on these. Any comment will be helpful too.
Best regards,
Hi Rafael,
In the original email you mentioned 4M block size, seq read, but here it
looks like you are doing 4k writes? Can you clarify? If you are doing
4k direct sequential writes with iodepth=1 and are also using librbd
cache, please make sure that librbd is set to writeback mode in both
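The client-side settings involved look roughly like this (a sketch; these are the usual knobs to verify in the client's ceph.conf):
[client]
rbd cache = true
# writes are treated as writethrough until the guest issues its first flush;
# for fio run outside a VM you may need this false to actually test writeback
rbd cache writethrough until flush = false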
On 09/21/2017 03:19 AM, Maged Mokhtar wrote:
On 2017-09-21 10:01, Dietmar Rieder wrote:
Hi,
I'm in the same situation (NVMEs, SSDs, SAS HDDs). I asked the same
questions to myself.
For now I decided to use the NVMEs as wal and db devices for the SAS
HDDs and on the SSDs I colocate wal and d
ues (in my particular case client load on the cluster is
very low and I don't have to honour any guarantees about client performance -
getting back into HEALTH_OK asap is preferable).
Rich
On 13/09/17 21:14, Mark Nelson wrote:
Hi Richard,
Regarding recovery speed, have you looked throug
Hi Richard,
Regarding recovery speed, have you looked through any of Neha's results
on recovery sleep testing earlier this summer?
https://www.spinics.net/lists/ceph-devel/msg37665.html
She tested bluestore and filestore under a couple of different
scenarios. The gist of it is that time to
Hi Bryan,
Check out your SCSI device failures, but if that doesn't pan out, Sage
and I have been tracking this:
http://tracker.ceph.com/issues/21171
There's a fix in place being tested now!
Mark
On 08/29/2017 05:41 PM, Bryan Banister wrote:
Found some bad stuff in the messages file about S
On 08/23/2017 07:17 PM, Mark Nelson wrote:
On 08/23/2017 06:18 PM, Xavier Trilla wrote:
Oh man, what do you know!... I'm quite amazed. I've been reviewing
more documentation about min_replica_size and it seems like it doesn't
work as I thought (Although I remember specific
On 08/23/2017 06:18 PM, Xavier Trilla wrote:
Oh man, what do you know!... I'm quite amazed. I've been reviewing more
documentation about min_replica_size and it seems like it doesn't work as I
thought (Although I remember specifically reading it somewhere some years ago
:/ ).
And, as all repli
Hi Mehmet!
On 08/16/2017 11:12 AM, Mehmet wrote:
:( no suggestions or recommendations on this?
On 14 August 2017 at 16:50:15 MESZ, Mehmet wrote:
Hi friends,
my actual hardware setup per OSD-node is as follow:
# 3 OSD-Nodes with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz =
On 08/14/2017 02:42 PM, Nick Fisk wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Ronny Aasen
Sent: 14 August 2017 18:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements
On 10.08.2017 17:
On 08/14/2017 12:52 PM, Ashley Merrick wrote:
Hello,
Hi Ashley!
Currently running 10x4TB, 2x SSD for journal, planning to move fully to BS,
looking at adding extra servers.
With the removal of the double write on BS and from the testing so far
of BS (having WAL & DB on SSD Seeing very minimal S
On 06/27/2017 06:24 AM, Wido den Hollander wrote:
On 27 June 2017 at 13:05, Christian Balzer wrote:
On Tue, 27 Jun 2017 11:24:54 +0200 (CEST) Wido den Hollander wrote:
Hi,
I've been looking in the docs and the source code of BlueStore to figure out if
it issues TRIM/Discard [0] on SSD
Hello Massimiliano,
Based on the configuration below, it appears you have 8 SSDs total (2
nodes with 4 SSDs each)?
I'm going to assume you have 3x replication and are using filestore,
so in reality you are writing 3 copies and doing full data journaling
for each copy, for 6x writes per c