ating-bluestores-block-db/
>
> 3. Follow the documentation
>
> https://swamireddy.wordpress.com/2016/02/19/ceph-how-to-add-the-ssd-journal/
>
> Thanks for the help
>
> On Sun, 7 Jul 2019 at 14:39, Christian Wuerdig (<
> christian.wuer...@gmail.com>) wrote:
>
One thing to keep in mind is that the blockdb/wal becomes a Single Point Of
Failure for all OSDs using it. So if that SSD dies essentially you have to
consider all OSDs using it as lost. I think most go with something like 4-8
OSDs per blockdb/wal drive, but it really depends on how risk-averse you are.
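To make the sharing concrete, it typically looks something like this with
ceph-volume (device names are hypothetical; Luminous or later):

  # two OSDs sharing one NVMe for their blockdb - if nvme0n1 dies, both are lost
  ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1
  ceph-volume lvm create --data /dev/sdc --block.db /dev/nvme0n1p2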
The additional improvement is Snappy compression.
> We rebuilt ceph with support for it. I can create a PR for it, if you want :)
>
>
> Best Regards,
>
> Rafał Wądołowski
> Cloud & Security Engineer
>
> On 25.06.2019 22:16, Christian Wuerdig wrote:
>
>
The sizes are determined by rocksdb settings - some details can be found
here: https://tracker.ceph.com/issues/24361
One thing to note, in this thread
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html
it's noted that rocksdb could use up to 100% extra space during compaction.
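To put rough numbers on it (assuming the default rocksdb tuning of a 256MB
level base growing 10x per level, as far as I know): the levels come out at
roughly 256MB, 2.5GB, 25GB and 250GB, and only whole levels fit on the fast
device, so the useful blockdb sizes cluster around ~3GB, ~30GB and ~300GB -
with compaction potentially doubling the requirement on top of that.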
The simple answer is because k+1 is the default min_size for EC pools.
min_size is the minimum number of failure domains that must still be
available for the pool to keep accepting writes. If you set min_size to k
then you have entered dangerous territory: if you lose another failure
domain (OSD or
host
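For reference, checking and adjusting this per pool looks like the
following (pool name is a placeholder; assume a k=4, m=2 profile):

  ceph osd pool get my-ec-pool min_size    # defaults to k+1, here 5
  # dropping it to k keeps I/O going with zero redundancy left - see above
  ceph osd pool set my-ec-pool min_size 4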
On Sun, 28 Apr 2019 at 21:45, Igor Podlesny wrote:
> On Sun, 28 Apr 2019 at 16:14, Paul Emmerich
> wrote:
> > Use k+m for PG calculation, that value also shows up as "erasure size"
> > in ceph osd pool ls detail
>
> So does it mean that for PG calculation those 2 pools are equivalent:
>
> 1) EC(
If you use librados directly it's up to you to ensure you can identify your
objects. Generally RADOS stores objects and not files so when you provide
your object ids you need to come up with a convention so you can correctly
identify them. If you need to provide metadata (i.e. a list of all
existing
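A sketch of such a convention with the rados CLI (pool and ids are made up):

  # encode application, date and a sequence number in the object id
  rados -p mypool put myapp/2019-07-08/report-0001 ./report.bin
  # listing your objects then becomes a prefix scan
  rados -p mypool ls | grep '^myapp/2019-07-08/'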
tions when you really don't want to have to deal
with under-resourced hardware.
On Wed, 8 Aug 2018 at 12:26, Cheyenne Forbes
wrote:
>
> Next time I will ask there; any recommendation on the number of cores?
>
> Regards,
>
> Cheyenne O. Forbes
>
>
> On Tue, Aug 7, 2018 at
It should be added though that you're running at only 1/3 of the
recommended RAM usage for the OSD setup alone - not to mention that
you also co-host MON, MGR and MDS daemons on there. The next time you
run into an issue - in particular with OSD recovery - you may be in a
pickle again and then it m
ceph-users is a better place to ask this kind of question.
Anyway the 1GB RAM per TB storage recommendation still stands as far as I
know, plus you want some for the OS and some safety margin, so in your case
64GB seems sensible.
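Worked through (assuming the 1GB-per-TB rule of thumb): 28TB x 1GB/TB = 28GB
for the OSDs alone, plus a few GB for the OS and headroom for recovery
spikes, which is why 64GB rather than a tight 32GB is the comfortable choice.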
On Wed, 8 Aug 2018, 01:51 Cheyenne Forbes,
wrote:
> The case is 28TB
Generally the recommendation is: if your redundancy is X you should have at
least X+1 entities in your failure domain to allow ceph to automatically
self-heal
Given your setup of 6 servers and a failure domain of host, you should
select k+m=5 at most. So 3+2 should make for a good profile in your case.
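A sketch of that profile (profile and pool names are placeholders,
Luminous-era syntax):

  ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
  ceph osd pool create my-ec-pool 128 128 erasure ec32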
The general recommendation is to target around 100 PG/OSD. Have you tried
the https://ceph.com/pgcalc/ tool?
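The formula behind the tool is roughly:

  pg_num = (num_osds * target_pgs_per_osd) / pool_size, rounded to a power of 2
  e.g. 20 OSDs, size=3, target 100: 20 * 100 / 3 = ~667 -> 512 or 1024

where pool_size is the replica count for replicated pools, or k+m for EC pools.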
On Wed, 4 Apr 2018 at 21:38, Osama Hasebou wrote:
> Hi Everyone,
>
> I would like to know what kind of setup had the Ceph community been using
> for their Openstack's Ceph configuration w
I think the primary area where people are concerned about latency are rbd
and 4k block size access. OTOH 2.3us latency seems to be 2 orders of
magnitude below what seems to be realistically achievable on a real
world cluster anyway (
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/
Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of
storage? Literally everything posted on this list in relation to HW
requirements and related problems will tell you that this simply isn't
going to work. The slightest hint of a problem will simply kill the OSD
nodes with OOM. Hav
In case of bluestore if your blockdb is on a different drive to the
OSD and that's included in your hardware loss then I think you're
pretty much toast. Not sure if you can re-build the blockdb from the
OSD data somehow. In case of filestore if you lose your journal drive
you also risk data corruption.
Depends on what you mean by "your pool overloads". What's your
hardware setup (CPU, RAM, how many nodes, network etc.)? What can you
see when you monitor the system resources with atop or the likes?
On Sat, Jan 13, 2018 at 8:59 PM, Mike O'Connor wrote:
> I followed the announcement of Luminous
You should do your reference test with dd using oflag=direct,dsync
direct will only bypass the cache while dsync will fsync on every
block, which is much closer to the reality of what ceph is doing, afaik.
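For example (target path is hypothetical):

  # bypass the page cache and fsync every 4k block, similar to an OSD's write path
  dd if=/dev/zero of=/mnt/test/ddfile bs=4k count=10000 oflag=direct,dsync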
On Thu, Jan 4, 2018 at 9:54 PM, Rafał Wądołowski
wrote:
> Hi folks,
>
> I am currently benchmarki
A while back there was a thread on the ML where someone posted a bash
script to slowly increase the number of PGs in steps of 256 AFAIR; the
script would monitor the cluster activity and once all data shuffling
had finished it would do another round until the target is hit.
That was on filestore t
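Not the original script, but a rough sketch of the idea (pool name and
target are placeholders; on filestore-era clusters pgp_num had to be bumped
along with pg_num):

  #!/bin/bash
  pool=mypool
  target=4096
  while true; do
      cur=$(ceph osd pool get "$pool" pg_num | awk '{print $2}')
      [ "$cur" -ge "$target" ] && break
      ceph osd pool set "$pool" pg_num  $((cur + 256))
      ceph osd pool set "$pool" pgp_num $((cur + 256))
      # wait for the data shuffling to finish before the next round
      until ceph health | grep -q HEALTH_OK; do sleep 60; done
  done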
ify when necessary
>
>
>
>
>
> -Original Message-
> From: Christian Wuerdig [mailto:christian.wuer...@gmail.com]
> Sent: Tuesday, 2 January 2018 19:40
> To: 于相洋
> Cc: Ceph-User
> Subject: Re: [ceph-users] Questions about pg num setting
>
> Have you had a look at http:/
The main difference is that rados bench uses 4MB objects while your dd
test uses 4k block size
rados bench shows an average of 283 IOPS which at 4k blocksize would
be around 1.1MB/s, so it's somewhat consistent with the dd result.
Monitor your CPU usage, network latency with something like atop on
the
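To make the two comparable you can force rados bench down to a 4k block
size and a queue depth of 1 (pool name is a placeholder):

  rados bench -p testpool 60 write -b 4096 -t 1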
Have you had a look at http://ceph.com/pgcalc/?
Generally if you have too many PGs per OSD you can get yourself into
trouble during recovery and backfilling operations consuming a lot
more RAM than you have and eventually making your cluster unusable
(some more info can be found here for example:
On Thu, Dec 7, 2017 at 10:24 PM, Marcus Priesch wrote:
> Hello Alwin, Dear All,
[snip]
>> Mixing of spinners with SSDs is not recommended, as spinners will slow
>> down the pools residing on that root.
>
> why should this happen? I would assume that OSDs are separate parts
> running on hosts -
In filestore the journal is crucial for the operation of the OSD to
ensure consistency. If it's toast then so is the associated OSD in
most cases. I think people often overlook this fact when they share a
single journal drive across many OSDs to save cost.
On Sun, Nov 26, 2017 at 5:23 AM, Hauke Hombur
As per the documentation: http://docs.ceph.com/docs/luminous/radosgw/
"The S3 and Swift APIs share a common namespace, so you may write data
with one API and retrieve it with the other."
So you can access one pool through both APIs and the data will be
available via both.
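For example, with the usual client tools (bucket name and credentials
assumed to be configured):

  s3cmd put ./file.bin s3://mybucket/file.bin   # write via the S3 API
  swift download mybucket file.bin              # read it back via Swift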
On Wed, Nov 15, 2017 at 7:52
think an osd should 'crash' in such a situation.
> 2. How else should I 'rados put' an 8GB file?
>
>
>
>
>
>
> -Original Message-
> From: Christian Wuerdig [mailto:christian.wuer...@gmail.com]
> Sent: Monday, 13 November 2017 0:12
> To: Marc
in more hosts.
>
> Thanks for the help!
>
> Tim Gipson
>
>
> On 11/12/17, 5:14 PM, "Christian Wuerdig" wrote:
>
> I might be wrong, but from memory I think you can use
> http://ceph.com/pgcalc/ and use k+m for the size
>
> On Sun, Nov 12, 20
I might be wrong, but from memory I think you can use
http://ceph.com/pgcalc/ and use k+m for the size
On Sun, Nov 12, 2017 at 5:41 AM, Ashley Merrick wrote:
> Hello,
>
> Are you having any issues with getting the pool working or just around the
> PG num you should use?
>
> ,Ashley
>
As per: https://www.spinics.net/lists/ceph-devel/msg38686.html
Bluestore has a hard 4GB object size limit.
On Sat, Nov 11, 2017 at 9:27 AM, Marc Roos wrote:
>
> osd's are crashing when putting a (8GB) file in a erasure coded pool,
> just before finishing. The same osd's are used for replicated poo
The default failure domain is host and you will need 5 (=k+m) nodes
for this config. If you have 4 nodes you can run k=3,m=1 or k=2,m=2
otherwise you'd have to change the failure domain to OSD.
On Fri, Nov 10, 2017 at 10:52 AM, Marc Roos wrote:
>
> I added an erasure k=3,m=2 coded pool on a 3 node tes
It should be noted that the general advice is to not use such large
objects since cluster performance will suffer, see also this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021051.html
libradosstriper might be an option which will automatically break the
object into smaller chunks spread over multiple RADOS objects.
I'm not a big expert but the OP said he's suspecting bitrot is at
least part of the issue, in which case you can have the situation where the
drive has ACK'ed the write but a later scrub discovered checksum
errors
Plus you don't need to actually lose a drive to get inconsistent pgs
with size=2, min_size=1.
Hm, not necessarily directly related to your performance problem,
however: These SSDs have a listed endurance of 72TB total data written
- over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
that you run the journal for each OSD on the same disk, that's
effectively at most 0.02 DWPD (a
class
>1/ 3 filestore
>1/ 3 journal
>0/ 5 ms
>1/ 5 mon
>0/10 monc
>1/ 5 paxos
>0/ 5 tp
>1/ 5 auth
>1/ 5 crypto
>1/ 1 finisher
>1/ 5 heartbeatmap
>1/ 5 perfcounter
>1/ 5 rgw
>1/10 civetweb
>1/ 5 j
From which version of ceph to which other version of ceph did you
upgrade? Can you provide logs from crashing OSDs? The degraded object
percentage being larger than 100% has been reported before
(https://www.spinics.net/lists/ceph-users/msg39519.html) and looks
like it's been fixed a week or so ago.
What version of Ceph are you using? There were a few bugs leaving
behind orphaned objects (e.g. http://tracker.ceph.com/issues/18331 and
http://tracker.ceph.com/issues/10295). If that's your problem then
there is a tool for finding these objects so you can then manually
delete them - have a google
Maybe an additional example where the numbers don't line up quite so
nicely would be good as well. For example, it's not immediately obvious
to me what would happen with the stripe settings given in your example
if you write 97M of data.
Would it be 4 objects of 24M and 4 objects of 250KB? Or will the
You're not the only one, happens to me too. I found some old ML thread
from a couple years back where someone mentioned the same thing.
I do notice from time to time spam coming through (not much though and
it seems to come in waves) although I'm not sure how much gmail is
bouncing but nobody else
See also this ML thread regarding removing the cluster name option:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018520.html
On Mon, Oct 16, 2017 at 11:42 AM, Erik McCormick
wrote:
> Do not, under any circumstances, make a custom named cluster. There be pain
> and suffering (and
The default filesize limit for CephFS is 1TB, see also here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/018208.html
(also includes a pointer on how to increase it)
On Fri, Oct 6, 2017 at 12:45 PM, Shawfeng Dong wrote:
> Dear all,
>
> We just set up a Ceph cluster, running the la
yes, at least that's how I'd interpret the information given in this
thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html
On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
wrote:
> Hey Christian,
>
>> On 29 Sep 2017 12:32 a.m., "C
I'm pretty sure the orphan find command does exactly that -
finding orphans. I remember some emails on the dev list where Yehuda
said he wasn't 100% comfortable with automating the delete just yet.
So the purpose is to run the orphan find tool and then delete the
orphaned objects once you're happy with the result.
There is a ceph command "reweight-by-utilization" you can run to
adjust the OSD weights automatically based on their utilization:
http://docs.ceph.com/docs/master/rados/operations/control/#osd-subsystem
Some people run this on a periodic basis (cron script)
Check the mailing list archives, for exa
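For example (the threshold argument is optional; 120 is the default afaik):

  ceph osd test-reweight-by-utilization    # dry run, shows what would change
  ceph osd reweight-by-utilization 110     # reweight OSDs >10% above average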
Assuming you're using Bluestore you could experiment with the cache
settings
(http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/)
In your case setting bluestore_cache_size_hdd lower than the default
1GB might help with the RAM usage
various people have reported solving OOM problems this way.
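For example, in ceph.conf (the value is just an illustration):

  [osd]
  bluestore_cache_size_hdd = 536870912    # 512MB instead of the 1GB default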
What type of EC config (k+m) was used if I may ask?
On Fri, Sep 8, 2017 at 1:34 AM, Mohamad Gebai wrote:
> Hi,
>
> These numbers are probably not as detailed as you'd like, but it's
> something. They show the overhead of reading and/or writing to EC pools as
> compared to 3x replicated pools usin
Judging by the github repo, development on it has all but stalled; the last
commit was more than 3 months ago (
https://github.com/ceph/calamari/commits/master)
Also there is the new dashboard in the new ceph mgr daemon in Luminous -
so my guess is that Calamari is pretty much dead.
On Thu, Jul 2
probably why it was run using QD=1. It also makes sense that cpu freq
> will be more important than cores.
>
>
> But then it is not generic enough to be used as advice!
> It is just a line in 3D-space.
> As there are so many
>
> --WjW
>
> On 2017-06-24 12:52, Wi
The general advice floating around is that you want CPUs with high clock
speeds rather than more cores to reduce latency and increase IOPS for SSD
setups (see also
http://www.sys-pro.co.uk/ceph-storage-fast-cpus-ssd-performance/) So
something like a E5-2667V4 might bring better results in that sit
Yet another option is to change the failure domain to OSD instead of host
(this avoids having to move disks around and will probably meet your initial
expectations).
Means your cluster will become unavailable when you lose a host until you
fix it though. OTOH you probably don't have too much leeway an
Well, what's "best" really depends on your needs and use-case. The general
advice which has been floated several times now is to have at least N+2
entities of your failure domain in your cluster.
So for example if you run with size=3 then you should have at least 5 OSDs
if your failure domain is OSD.
On Thu, May 4, 2017 at 7:53 PM, Fuxion Cloud wrote:
> Hi all,
>
> I'm a newbie in ceph technology. We have ceph deployed by a vendor 2 years
> ago with Ubuntu 14.04LTS, without fine-tuning the performance. I noticed that
> the performance of storage is very slow. Can someone please help to advise how
>
On Tue, Mar 21, 2017 at 8:57 AM, Karol Babioch wrote:
> Hi,
>
> Am 20.03.2017 um 05:34 schrieb Christian Balzer:
> > you do realize that you very much have a corner case setup there, right?
>
> Yes, I know that this is not exactly a recommendation, but I hoped it
> would be good enough for the st
According to:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009485.html it
seems not entirely safe to copy an RBD pool this way.
This thread mentions doing a rados ls and the get/put the objects but Greg
mentioned that this may also have issues with snapshots.
Maybe cppool has been
ere is some data
> lost, since ceph did not do any backfill or other operation. That’s the
> problem...
>
>
Ok that output is indeed a bit different. However as you should note the
actual data stored in the cluster goes from 4809 to 4830 GB. 4830 * 3 is
actually only 14490 GB so cur
On Tue, Jan 10, 2017 at 8:23 AM, Marcus Müller
wrote:
> Hi all,
>
> Recently I added a new node with new osds to my cluster, which, of course
> resulted in backfilling. At the end, there are 4 pgs left in the state
> active+remapped and I don’t know what to do.
>
> Here is how my cluster looks
No official documentation but here is how I got it to work on Ubuntu
16.04.01 (in this case I'm using a self-signed certificate):
assuming you're running rgw on a computer called rgwnode:
1. create self-signed certificate
ssh rgwnode
openssl req -x509 -nodes -newkey rsa:4096 -keyout key.pem -out
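A standard completion of that command plus the remaining steps, as far as I
remember them (file names and the port are assumptions):

  openssl req -x509 -nodes -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365
2. combine key and certificate into one file and point civetweb at it:
  cat key.pem cert.pem > /etc/ceph/rgw.pem
3. in ceph.conf on the rgw node ('s' after the port number enables SSL):
  rgw frontends = civetweb port=443s ssl_certificate=/etc/ceph/rgw.pem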
Hi,
it's useful to generally provide some detail around the setup, like:
What are your pool settings - size and min_size?
What is your failure domain - osd or host?
What version of ceph are you running on which OS?
You can check which specific PGs are problematic by running "ceph health
detail" a
What are your pool size and min_size settings? An object with less than
min_size replicas will not receive I/O (
http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas).
So if size=2 and min_size=1 then an OSD failure means blocked operations to
all objects locate
On Wed, Nov 2, 2016 at 5:19 PM, Ashley Merrick
wrote:
> Hello,
>
> Thanks for your reply, when you say latest version do you mean .6 and not .5?
>
> The use case is large scale storage VMs, which may have a burst of high
> writes during new storage being loaded onto the environment, looking to
> p