Re: [ceph-users] Ceph performance IOPS

2019-07-15 Thread Christian Wuerdig
-bluestores-block-db/ > > 3.- Follow the documentation > > https://swamireddy.wordpress.com/2016/02/19/ceph-how-to-add-the-ssd-journal/ > > Thanks for the help > > On Sun, 7 Jul 2019 at 14:39, Christian Wuerdig (<christian.wuer...@gmail.com>) wrote: > >

Re: [ceph-users] Ceph performance IOPS

2019-07-07 Thread Christian Wuerdig
One thing to keep in mind is that the blockdb/wal becomes a Single Point Of Failure for all OSDs using it. So if that SSD dies essentially you have to consider all OSDs using it as lost. I think most go with something like 4-8 OSDs per blockdb/wal drive but it really depends how risk-averse you

Re: [ceph-users] Thoughts on rocksdb and erasurecode

2019-06-26 Thread Christian Wuerdig
on is triggered. The additional improvement is Snappy compression. > We rebuilt ceph with support for it. I can create a PR for it, if you want :) > > > Best Regards, > > Rafał Wądołowski > Cloud & Security Engineer > > On 25.06.2019 22:16, Christian Wuerdig wrote: >

Re: [ceph-users] Thoughts on rocksdb and erasurecode

2019-06-25 Thread Christian Wuerdig
The sizes are determined by rocksdb settings - some details can be found here: https://tracker.ceph.com/issues/24361 One thing to note, in this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html it's noted that rocksdb could use up to 100% extra space during

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Christian Wuerdig
The simple answer is that k+1 is the default min_size for EC pools. min_size is the number of shards (i.e. failure domains) that must still be available for the pool to keep accepting writes. If you set min_size to k then you have entered dangerous territory: if you lose another failure domain (OSD or
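
For reference, checking and (carefully) changing this on an EC pool looks roughly like the following sketch - the pool name "ecpool" and k=4,m=2 are made-up examples:

    # a k=4,m=2 pool gets size=6 and a default min_size of k+1=5
    ceph osd pool get ecpool min_size
    # dropping it to k keeps I/O flowing with m failures, but carries the risk described above
    # ceph osd pool set ecpool min_size 4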

Re: [ceph-users] How does CEPH calculates PGs per OSD for erasure coded (EC) pools?

2019-04-29 Thread Christian Wuerdig
On Sun, 28 Apr 2019 at 21:45, Igor Podlesny wrote: > On Sun, 28 Apr 2019 at 16:14, Paul Emmerich > wrote: > > Use k+m for PG calculation, that value also shows up as "erasure size" > > in ceph osd pool ls detail > > So does it mean that for PG calculation those 2 pools are equivalent: > > 1)

Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools

2019-01-22 Thread Christian Wuerdig
If you use librados directly it's up to you to ensure you can identify your objects. Generally RADOS stores objects, not files, so when you provide your object ids you need to come up with a convention so you can correctly identify them. If you need to provide metadata (i.e. a list of all
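
A minimal sketch of such a convention with the rados CLI - the pool name and the <client>/<date>/<path> scheme are purely illustrative:

    # encode the identification scheme directly in the object id
    rados -p backups put "db01/2019-01-22/etc/passwd" ./passwd.bak
    # listing everything for one client/backup run then becomes a prefix match
    rados -p backups ls | grep '^db01/2019-01-22/'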

Re: [ceph-users] [Ceph-community] How much RAM and CPU cores would you recommend when using ceph only as block storage for KVM?

2018-08-19 Thread Christian Wuerdig
when you really don't want to have to deal with under-resourced hardware. On Wed, 8 Aug 2018 at 12:26, Cheyenne Forbes wrote: > > Next time I will ask there, any number of core recommendation? > > Regards, > > Cheyenne O. Forbes > > > On Tue, Aug 7, 2018 at 2:49 PM,

Re: [ceph-users] MDS stuck in 'rejoin' after network fragmentation caused OSD flapping

2018-08-19 Thread Christian Wuerdig
It should be added though that you're running with only 1/3 of the recommended RAM for the OSD setup alone - not to mention that you also co-host MON, MGR and MDS daemons on there. The next time you run into an issue - in particular with OSD recovery - you may be in a pickle again and then it

Re: [ceph-users] [Ceph-community] How much RAM and CPU cores would you recommend when using ceph only as block storage for KVM?

2018-08-07 Thread Christian Wuerdig
ceph-users is a better place to ask this kind of question. Anyway, the 1GB RAM per TB of storage recommendation still stands as far as I know, plus you want some for the OS and some safety margin, so in your case 64GB seems sensible. On Wed, 8 Aug 2018, 01:51 Cheyenne Forbes, wrote: > The case is

Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-22 Thread Christian Wuerdig
Generally the recommendation is: if your redundancy is X you should have at least X+1 entities in your failure domain to allow ceph to automatically self-heal. Given your setup of 6 servers and a failure domain of host, you should select k+m=5 at most. So 3+2 should make for a good profile in your
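
For illustration, creating such a profile and pool could look roughly like this (profile/pool names and pg_num are placeholders):

    ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 128 128 erasure ec32
    ceph osd pool ls detail    # the reported size of 5 corresponds to k+m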

Re: [ceph-users] amount of PGs/pools/OSDs for your openstack / Ceph

2018-04-07 Thread Christian Wuerdig
The general recommendation is to target around 100 PG/OSD. Have you tried the https://ceph.com/pgcalc/ tool? On Wed, 4 Apr 2018 at 21:38, Osama Hasebou wrote: > Hi Everyone, > > I would like to know what kind of setup had the Ceph community been using > for their
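
The back-of-the-envelope version of what pgcalc does, with made-up numbers (60 OSDs, one dominant pool with size=3):

    # PGs per OSD ~= pg_num * size / #OSDs, so for a target of ~100:
    echo $((100 * 60 / 3))    # 2000 -> round to a power of two, e.g. 2048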

Re: [ceph-users] Difference in speed on Copper of Fiber ports on switches

2018-03-22 Thread Christian Wuerdig
I think the primary areas where people are concerned about latency are RBD and 4k block-size access. OTOH, 2.3us latency seems to be 2 orders of magnitude below what seems to be realistically achievable on a real-world cluster anyway (

Re: [ceph-users] XFS Metadata corruption while activating OSD

2018-03-11 Thread Christian Wuerdig
Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of storage? Literally everything posted on this list in relation to HW requirements and related problems will tell you that this simply isn't going to work. The slightest hint of a problem will simply kill the OSD nodes with OOM.

Re: [ceph-users] Disaster Backups

2018-02-01 Thread Christian Wuerdig
In the case of bluestore, if your blockdb is on a different drive from the OSD data and that drive is part of your hardware loss, then I think you're pretty much toast. I'm not sure if you can re-build the blockdb from the OSD data somehow. In the case of filestore, if you lose your journal drive you also risk data

Re: [ceph-users] Have I configured erasure coding wrong ?

2018-01-14 Thread Christian Wuerdig
Depends on what you mean by "your pool overloads". What's your hardware setup (CPU, RAM, how many nodes, network etc.)? What can you see when you monitor the system resources with atop or the like? On Sat, Jan 13, 2018 at 8:59 PM, Mike O'Connor wrote: > I followed the

Re: [ceph-users] Performance issues on Luminous

2018-01-05 Thread Christian Wuerdig
You should do your reference test with dd using oflag=direct,dsync. direct will only bypass the cache, while dsync will fsync on every block, which is much closer to the reality of what ceph is doing AFAIK. On Thu, Jan 4, 2018 at 9:54 PM, Rafał Wądołowski wrote: > Hi folks,
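
Something along these lines (the path and sizes are only an example):

    # bypass the page cache and sync every block, closer to what ceph does per write
    dd if=/dev/zero of=/mnt/test/ddtest bs=4k count=10000 oflag=direct,dsync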

Re: [ceph-users] Increasing PG number

2018-01-03 Thread Christian Wuerdig
A while back there was a thread on the ML where someone posted a bash script to slowly increase the number of PGs in steps of 256, AFAIR. The script would monitor the cluster activity and, once all data shuffling had finished, do another round until the target was hit. That was on filestore
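
I don't have the original script at hand, but a rough sketch of the idea looks like this (pool name, target and the crude health check are assumptions):

    POOL=rbd
    TARGET=4096
    STEP=256
    while true; do
      CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
      [ "$CUR" -ge "$TARGET" ] && break
      NEXT=$((CUR + STEP)); [ "$NEXT" -gt "$TARGET" ] && NEXT=$TARGET
      ceph osd pool set "$POOL" pg_num "$NEXT"
      ceph osd pool set "$POOL" pgp_num "$NEXT"
      # wait for the data shuffling from this step to finish before the next one
      while ceph health | grep -qE 'backfill|recover|peering'; do sleep 60; done
    done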

Re: [ceph-users] Questions about pg num setting

2018-01-03 Thread Christian Wuerdig
is then how to identify when necessary > > > > > > -----Original Message----- > From: Christian Wuerdig [mailto:christian.wuer...@gmail.com] > Sent: Tuesday, 2 January 2018 19:40 > To: 于相洋 > Cc: Ceph-User > Subject: Re: [ceph-users] Questions about pg num

Re: [ceph-users] slow 4k writes, Luminous with bluestore backend

2018-01-02 Thread Christian Wuerdig
The main difference is that rados bench uses 4MB objects while your dd test uses a 4k block size. rados bench shows an average of 283 IOPS, which at a 4k block size would be around 1.1MB/s, so it's somewhat consistent with the dd result. Monitor your CPU usage and network latency with something like atop on the

Re: [ceph-users] Questions about pg num setting

2018-01-02 Thread Christian Wuerdig
Have you had a look at http://ceph.com/pgcalc/? Generally, if you have too many PGs per OSD you can get yourself into trouble during recovery and backfill operations, which can consume a lot more RAM than you have and eventually make your cluster unusable (some more info can be found here for example:

Re: [ceph-users] Hangs with qemu/libvirt/rbd when one host disappears

2017-12-07 Thread Christian Wuerdig
On Thu, Dec 7, 2017 at 10:24 PM, Marcus Priesch wrote: > Hello Alwin, Dear All, [snip] >> Mixing of spinners with SSDs is not recommended, as spinners will slow >> down the pools residing on that root. > > why should this happen ? i would assume that osd's are seperate

Re: [ceph-users] ceph osd after xfs repair only 50 percent data and osd won't start

2017-11-26 Thread Christian Wuerdig
In filestore the journal is crucial for the operation of the OSD to ensure consistency. If it's toast then so is the associated OSD in most cases. I think people often overlook this fact when they put many OSDs on a single journal drive to save cost. On Sun, Nov 26, 2017 at 5:23 AM, Hauke

Re: [ceph-users] S3/Swift :: Pools Ceph

2017-11-14 Thread Christian Wuerdig
As per documentation: http://docs.ceph.com/docs/luminous/radosgw/ "The S3 and Swift APIs share a common namespace, so you may write data with one API and retrieve it with the other." So you can access one pool through both APIs and the data will be available via both. On Wed, Nov 15, 2017 at

Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-13 Thread Christian Wuerdig
> > 1. I don’t think an osd should 'crash' in such a situation. > 2. How else should I 'rados put' an 8GB file? > > > > > > -----Original Message----- > From: Christian Wuerdig [mailto:christian.wuer...@gmail.com] > Sent: Monday, 13 November 2017 0:12 > To: Marc

Re: [ceph-users] Erasure Coding Pools and PG calculation - documentation

2017-11-12 Thread Christian Wuerdig
le without > bringing in more hosts. > > Thanks for the help! > > Tim Gipson > > > On 11/12/17, 5:14 PM, "Christian Wuerdig" <christian.wuer...@gmail.com> wrote: > > I might be wrong, but from memory I think you can use > http://ceph.com/pgcalc/

Re: [ceph-users] Erasure Coding Pools and PG calculation - documentation

2017-11-12 Thread Christian Wuerdig
I might be wrong, but from memory I think you can use http://ceph.com/pgcalc/ and use k+m for the size On Sun, Nov 12, 2017 at 5:41 AM, Ashley Merrick wrote: > Hello, > > Are you having any issues with getting the pool working or just around the > PG num you should use? >

Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-12 Thread Christian Wuerdig
As per: https://www.spinics.net/lists/ceph-devel/msg38686.html Bluestore has a hard 4GB object size limit. On Sat, Nov 11, 2017 at 9:27 AM, Marc Roos wrote: > > osd's are crashing when putting a (8GB) file in an erasure coded pool, > just before finishing. The same osd's

Re: [ceph-users] Undersized fix for small cluster, other than adding a 4th node?

2017-11-12 Thread Christian Wuerdig
The default failure domain is host and you will need 5 (=k+m) nodes for this config. If you have 4 nodes you can run k=3,m=1 or k=2,m=2; otherwise you'd have to change the failure domain to OSD. On Fri, Nov 10, 2017 at 10:52 AM, Marc Roos wrote: > > I added an erasure k=3,m=2
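
For example, with 4 nodes a k=2,m=2 profile works with the default host failure domain, while keeping k=3,m=2 means dropping down to the osd failure domain (profile names are made up):

    # fits 4 hosts with failure domain host
    ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host
    # keeps k=3,m=2 but allows multiple shards of a PG on the same host
    ceph osd erasure-code-profile set ec32-osd k=3 m=2 crush-failure-domain=osd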

Re: [ceph-users] Pool shard/stripe settings for file too large files?

2017-11-09 Thread Christian Wuerdig
It should be noted that the general advice is to not use such large objects since cluster performance will suffer, see also this thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021051.html libradosstriper might be an option which will automatically break the object into
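
If I remember correctly the rados CLI exposes this via --striper, roughly like so (pool and object names are made up):

    # libradosstriper splits the logical object into many smaller RADOS objects
    rados --striper -p mypool put bigobject ./bigfile.bin
    rados --striper -p mypool get bigobject ./bigfile.copy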

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Christian Wuerdig
I'm not a big expert but the OP said he suspects bitrot is at least part of the issue, in which case you can have the situation where the drive has ACK'ed the write but a later scrub discovers checksum errors. Plus you don't need to actually lose a drive to get inconsistent pgs with size=2

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-26 Thread Christian Wuerdig
Hm, not necessarily directly related to your performance problem, however: these SSDs have a listed endurance of 72TB total data written - over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given that you run the journal for each OSD on the same disk, that's effectively at most 0.02 DWPD
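
The back-of-the-envelope math, assuming roughly 1TB of usable capacity per drive (not stated in the thread):

    echo "scale=1; 72*1000/(5*365)" | bc    # ~39 GB written per day within warranty
    echo "scale=3; 39.4/1000" | bc          # ~0.04 drive writes per day (DWPD)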

Re: [ceph-users] Infinite degraded objects

2017-10-25 Thread Christian Wuerdig
[quoted debug log-level settings from the original report snipped: objclass, filestore, journal, ms, mon, monc, paxos, tp, auth, crypto, finisher, heartbeatmap, perfcounter, rgw, civetweb, ...]

Re: [ceph-users] Infinite degraded objects

2017-10-24 Thread Christian Wuerdig
From which version of ceph to which other version of ceph did you upgrade? Can you provide logs from crashing OSDs? The degraded object percentage being larger than 100% has been reported before (https://www.spinics.net/lists/ceph-users/msg39519.html) and looks like it's been fixed a week or so

Re: [ceph-users] Reported bucket size incorrect (Luminous)

2017-10-24 Thread Christian Wuerdig
What version of Ceph are you using? There were a few bugs leaving behind orphaned objects (e.g. http://tracker.ceph.com/issues/18331 and http://tracker.ceph.com/issues/10295). If that's your problem then there is a tool for finding these objects so you can then manually delete them - have a google

Re: [ceph-users] Bareos and libradosstriper works only for 4M sripe_unit size

2017-10-16 Thread Christian Wuerdig
Maybe an additional example where the numbers don't line up quite so nicely would be good as well. For example, it's not immediately obvious to me what would happen with the stripe settings given in your example but you write 97M of data. Would it be 4 objects of 24M and 4 objects of 250KB? Or will

Re: [ceph-users] list admin issues

2017-10-15 Thread Christian Wuerdig
You're not the only one, happens to me too. I found some old ML thread from a couple of years back where someone mentioned the same thing. I do notice spam coming through from time to time (not much though, and it seems to come in waves), although I'm not sure how much gmail is bouncing, but nobody else

Re: [ceph-users] Creating a custom cluster name using ceph-deploy

2017-10-15 Thread Christian Wuerdig
See also this ML thread regarding removing the cluster name option: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018520.html On Mon, Oct 16, 2017 at 11:42 AM, Erik McCormick wrote: > Do not, under any circumstances, make a custom named cluster. There

Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread Christian Wuerdig
The default filesize limit for CephFS is 1TB, see also here: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/018208.html (also includes a pointer on how to increase it) On Fri, Oct 6, 2017 at 12:45 PM, Shawfeng Dong wrote: > Dear all, > > We just set up a Ceph
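
The short version of that pointer - the fs name "cephfs" and the 4TiB value are just examples:

    # max_file_size is a per-filesystem setting, given in bytes
    ceph fs set cephfs max_file_size 4398046511104    # 4 TiB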

Re: [ceph-users] RGW how to delete orphans

2017-10-02 Thread Christian Wuerdig
Sep 2017 12:32 a.m., "Christian Wuerdig" >> <christian.wuer...@gmail.com> wrote: >>> >>> I'm pretty sure the orphan find command does exactly that - >>> finding orphans. I remember some emails on the dev list where Yehuda >>> said he wasn't 100%

Re: [ceph-users] RGW how to delete orphans

2017-09-28 Thread Christian Wuerdig
I'm pretty sure the orphan find command does exactly that - finding orphans. I remember some emails on the dev list where Yehuda said he wasn't 100% comfortable with automating the delete just yet. So the purpose is to run the orphan find tool and then delete the orphaned objects once you're
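
Roughly the workflow, with a typical Luminous-era data pool name used as an example:

    radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans1
    radosgw-admin orphans list-jobs
    # review the rados object names it reports, then remove them yourself, e.g.
    #   rados -p default.rgw.buckets.data rm <object>
    radosgw-admin orphans finish --job-id=orphans1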

Re: [ceph-users] Usage not balanced over OSDs

2017-09-17 Thread Christian Wuerdig
There is a ceph command "reweight-by-utilization" you can run to adjust the OSD weights automatically based on their utilization: http://docs.ceph.com/docs/master/rados/operations/control/#osd-subsystem Some people run this on a periodic basis (cron script) Check the mailing list archives, for
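
For example (110 means only OSDs above 110% of the mean utilization get adjusted):

    ceph osd test-reweight-by-utilization 110    # dry run, shows what would change
    ceph osd reweight-by-utilization 110         # actually apply the reweights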

Re: [ceph-users] OSD memory usage

2017-09-15 Thread Christian Wuerdig
Assuming you're using Bluestore you could experiments with the cache settings (http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/) In your case setting bluestore_cache_size_hdd lower than the default 1GB might help with the RAM usage various people have reported solving
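
E.g. in ceph.conf on the OSD nodes, followed by an OSD restart (the 512MB value is just an example):

    [osd]
    # default is 1073741824 (1GB) for HDD-backed OSDs
    bluestore_cache_size_hdd = 536870912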

Re: [ceph-users] Luminous BlueStore EC performance

2017-09-07 Thread Christian Wuerdig
What type of EC config (k+m) was used if I may ask? On Fri, Sep 8, 2017 at 1:34 AM, Mohamad Gebai wrote: > Hi, > > These numbers are probably not as detailed as you'd like, but it's > something. They show the overhead of reading and/or writing to EC pools as > compared to 3x

Re: [ceph-users] Ceph kraken: Calamari Centos7

2017-07-20 Thread Christian Wuerdig
Judging by the github repo, development on it has all but stalled; the last commit was more than 3 months ago (https://github.com/ceph/calamari/commits/master). Also, there is the new dashboard in the new ceph mgr daemon in Luminous - so my guess is that Calamari is pretty much dead. On Thu, Jul

Re: [ceph-users] Ceph random read IOPS

2017-06-26 Thread Christian Wuerdig
--WjW > > On 2017-06-24 12:52, Willem Jan Withagen wrote: > > On 24-6-2017 05:30, Christian Wuerdig wrote: > > The general advice floating around is that you want CPUs with high > clock speeds rather than more cores to reduce latency and increase IOPS > for SSD setups (see also

Re: [ceph-users] Ceph random read IOPS

2017-06-23 Thread Christian Wuerdig
The general advice floating around is that you want CPUs with high clock speeds rather than more cores to reduce latency and increase IOPS for SSD setups (see also http://www.sys-pro.co.uk/ceph-storage-fast-cpus-ssd-performance/). So something like an E5-2667V4 might bring better results in that

Re: [ceph-users] handling different disk sizes

2017-06-05 Thread Christian Wuerdig
Yet another option is to change the failure domain to OSD instead of host (this avoids having to move disks around and will probably meet your initial expectations). It means your cluster will become unavailable when you lose a host until you fix it, though. OTOH you probably don't have too much leeway

Re: [ceph-users] Recovery stuck in active+undersized+degraded

2017-06-02 Thread Christian Wuerdig
Well, what's "best" really depends on your needs and use-case. The general advice which has been floated several times now is to have at least N+2 entities of your failure domain in your cluster. So for example if you run with size=3 then you should have at least 5 OSDs if your failure domain is

Re: [ceph-users] Ceph Performance

2017-05-04 Thread Christian Wuerdig
On Thu, May 4, 2017 at 7:53 PM, Fuxion Cloud wrote: > Hi all, > > Im newbie in ceph technology. We have ceph deployed by vendor 2 years ago > with Ubuntu 14.04LTS without fine tuned the performance. I noticed that the > performance of storage is very slow. Can someone

Re: [ceph-users] Understanding Ceph in case of a failure

2017-03-20 Thread Christian Wuerdig
On Tue, Mar 21, 2017 at 8:57 AM, Karol Babioch wrote: > Hi, > > Am 20.03.2017 um 05:34 schrieb Christian Balzer: > > you do realize that you very much have a corner case setup there, right? > > Yes, I know that this is not exactly a recommendation, but I hoped it > would be

Re: [ceph-users] Migrating data from a Ceph clusters to another

2017-02-11 Thread Christian Wuerdig
According to: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009485.html it seems not entirely safe to copy an RBD pool this way. The thread mentions doing a rados ls and then get/put of the objects, but Greg mentioned that this may also have issues with snapshots. Maybe cppool has
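
For completeness, the rados-level copy mentioned above looks roughly like this - and it carries exactly the snapshot caveat Greg raised:

    # naive object-by-object copy, no snapshot handling; for a second cluster
    # point the put side at its config, e.g. rados -c /etc/ceph/dest.conf ...
    rados -p srcpool ls | while IFS= read -r obj; do
      rados -p srcpool get "$obj" /tmp/obj.tmp && rados -p dstpool put "$obj" /tmp/obj.tmp
    done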

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Christian Wuerdig
this looks like there is some data > lost, since ceph did not do any backfill or other operation. That’s the > problem... > > Ok that output is indeed a bit different. However as you should note the actual data stored in the cluster goes from 4809 to 4830 GB. 4830 * 3 is actually on

Re: [ceph-users] PGs stuck active+remapped and osds lose data?!

2017-01-09 Thread Christian Wuerdig
On Tue, Jan 10, 2017 at 8:23 AM, Marcus Müller wrote: > Hi all, > > Recently I added a new node with new osds to my cluster, which, of course > resulted in backfilling. At the end, there are 4 pgs left in the state 4 > active+remapped and I don’t know what to do. > >

Re: [ceph-users] rgw civetweb ssl official documentation?

2016-12-19 Thread Christian Wuerdig
No official documentation but here is how I got it to work on Ubuntu 16.04.01 (in this case I'm using a self-signed certificate): assuming you're running rgw on a computer called rgwnode: 1. create self-signed certificate ssh rgwnode openssl req -x509 -nodes -newkey rsa:4096 -keyout key.pem
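
The remaining steps, from memory (paths, CN and port are examples):

    # 1. self-signed cert, then concatenate key + cert for civetweb
    openssl req -x509 -nodes -newkey rsa:4096 -days 365 \
        -keyout key.pem -out cert.pem -subj "/CN=rgwnode"
    cat key.pem cert.pem > /etc/ceph/rgw-combined.pem
    # 2. in ceph.conf for the rgw instance, the trailing 's' on the port enables SSL:
    #    [client.rgw.rgwnode]
    #    rgw frontends = civetweb port=443s ssl_certificate=/etc/ceph/rgw-combined.pem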

Re: [ceph-users] Pgs stuck on undersized+degraded+peered

2016-12-09 Thread Christian Wuerdig
Hi, it's useful to generally provide some detail around the setup, like: What are your pool settings - size and min_size? What is your failure domain - osd or host? What version of ceph are you running on which OS? You can check which specific PGs are problematic by running "ceph health detail"
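
For the diagnostics part, something like the following (pool name and PG id are placeholders):

    ceph health detail                # lists the stuck PGs and why
    ceph pg 1.2f query                # per-PG view: up/acting sets, blocked-by, recovery state
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size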

Re: [ceph-users] VM disk operation blocked during OSDs failures

2016-11-04 Thread Christian Wuerdig
What are your pool size and min_size settings? An object with less than min_size replicas will not receive I/O ( http://docs.ceph.com/docs/jewel/rados/operations/pools/#set-the-number-of-object-replicas). So if size=2 and min_size=1 then an OSD failure means blocked operations to all objects

Re: [ceph-users] Hammer Cache Tiering

2016-11-01 Thread Christian Wuerdig
On Wed, Nov 2, 2016 at 5:19 PM, Ashley Merrick wrote: > Hello, > > Thanks for your reply; when you say latest version, do you mean .6 and not .5? > > The use case is large scale storage VMs, which may have a burst of high > writes during new storage being loaded onto the