[ceph-users] Is it possible to recover from block.db failure?

2017-10-19 Thread Caspar Smit
will the --mkjournal command re-initialize that or is the --mkjournal command only used for FileStore? Kind regards and thanks in advance for any reply, Caspar Smit

Re: [ceph-users] how to upgrade CEPH journal?

2017-11-10 Thread Caspar Smit
the ceph.conf (so if you want a 20GB DB, use that as value for bluestore_block_db_size) Caspar > On Thu, Nov 9, 2017 at 6:26 PM, Caspar Smit <caspars...@supernas.eu> > wrote: > >> Rudi, >> >> You can set the size of block.db and block.wal partitions in the >> c

Re: [ceph-users] No ops on some OSD

2017-11-13 Thread Caspar Smit
Weird # ceph --version ceph version 12.2.1 (fc129ad90a65dc0b419412e77cb85ac230da42a6) luminous (stable) # ceph osd status | id | host | used | avail | wr ops | wr data | rd ops | rd data |

Re: [ceph-users] Getting errors on erasure pool writes k=2, m=1

2017-11-13 Thread Caspar Smit
Hi, Why would Ceph 12.2.1 give you this message: 2017-11-10 20:39:31.296101 7f840ad45e40 -1 WARNING: the following dangerous and experimental features are enabled: bluestore Or is that a leftover warning message from an old client? Kind regards, Caspar 2017-11-10 21:27 GMT+01:00 Marc Roos

Re: [ceph-users] Journal / WAL drive size?

2017-11-23 Thread Caspar Smit
and not always a usable amount. Caspar Met vriendelijke groet, Caspar Smit Systemengineer SuperNAS Dorsvlegelstraat 13 1445 PA Purmerend t: (+31) 299 410 414 e: caspars...@supernas.eu w: www.supernas.eu 2017-11-23 10:27 GMT+01:00 Rudi Ahlers <rudiahl...@gmail.com>: > Hi, > > Can

[ceph-users] Copy RBD image from replicated to erasure pool possible?

2017-12-18 Thread Caspar Smit
Hi all, http://ceph.com/community/new-luminous-erasure-coding-rbd-cephfs/ Since it is possible in Luminous to use RBD directly on erasure coded pools the question arises how i can migrate an RBD image from a replicated pool to an erasure coded pool. I've got two pools configured, one replicated

Re: [ceph-users] Copy RBD image from replicated to erasure pool possible?

2017-12-18 Thread Caspar Smit
Hi all, Although undocumented, i just tried: "rbd -p rbd copy disk1 disk1ec --data-pool ecpool" And it worked! :) The copy is now on the erasure coded pool. Kind regards, Caspar 2017-12-18 22:32 GMT+01:00 Caspar Smit <caspars...@supernas.eu>: > Hi all, > > http
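For reference, a minimal sketch of the full migration built around that command; the pool names (rbd, ecpool) and image name (disk1) are placeholders and the image must not be in use while copying:
rbd -p rbd copy disk1 disk1ec --data-pool ecpool   # data objects land in the EC pool, metadata stays replicated
rbd -p rbd rm disk1                                # remove the original only after verifying the copy
rbd -p rbd rename disk1ec disk1                    # give the copy the original name so clients keep working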

Re: [ceph-users] Disk Down Emergency

2017-11-16 Thread Caspar Smit
2017-11-16 14:05 GMT+01:00 Georgios Dimitrakakis : > Dear cephers, > > I have an emergency on a rather small ceph cluster. > > My cluster consists of 2 OSD nodes with 10 disks x4TB each and 3 monitor > nodes. > > The version of ceph running is Firefly v.0.80.9 >

Re: [ceph-users] Disk Down Emergency

2017-11-16 Thread Caspar Smit
2017-11-16 14:43 GMT+01:00 Wido den Hollander <w...@42on.com>: > > > Op 16 november 2017 om 14:40 schreef Georgios Dimitrakakis < > gior...@acmac.uoc.gr>: > > > > > > @Sean Redmond: No I don't have any unfound objects. I only have "stuck > >

Re: [ceph-users] Luminous LTS: `ceph osd crush class create` is gone?

2017-11-03 Thread Caspar Smit
2017-11-03 7:59 GMT+01:00 Brad Hubbard : > On Fri, Nov 3, 2017 at 4:04 PM, Linh Vu wrote: > > Hi all, > > > > > > Back in Luminous Dev and RC, I was able to do this: > > > > > > `ceph osd crush class create myclass` > > This was removed as part of

Re: [ceph-users] Reply: Re: Luminous LTS: `ceph osd crush class create` is gone?

2017-11-06 Thread Caspar Smit
"That is not true. "ceph osd crush set-device-class" will fail if the input OSD has already been assigned a class. Instead you should do "ceph osd crush rm-device-class" before proceeding." You are absolutely right, sorry for the confusion! Caspar 2017-11-04 2:22 GMT+01:00

Re: [ceph-users] Erasure pool

2017-11-09 Thread Caspar Smit
2017-11-08 22:05 GMT+01:00 Marc Roos : > > Can anyone advice on a erasure pool config to store > > - files between 500MB and 8GB, total 8TB > - just for archiving, not much reading (few files a week) > - hdd pool > - now 3 node cluster (4th coming) > - would like to save

Re: [ceph-users] how to upgrade CEPH journal?

2017-11-09 Thread Caspar Smit
2017-11-09 17:02 GMT+01:00 Alwin Antreich : > Hi Rudi, > On Thu, Nov 09, 2017 at 04:09:04PM +0200, Rudi Ahlers wrote: > > Hi, > > > > Can someone please tell me what the correct procedure is to upgrade a > CEPH > > journal? > > > > I'm running ceph: 12.2.1 on Proxmox 5.1,

Re: [ceph-users] how to upgrade CEPH journal?

2017-11-09 Thread Caspar Smit
Rudi, You can set the size of block.db and block.wal partitions in the ceph.conf configuration file using: bluestore_block_db_size = 16106127360 (which is 15GB, just calculate the correct number for your needs) bluestore_block_wal_size = 16106127360 Kind regards, Caspar 2017-11-09 17:19
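A sketch of the corresponding ceph.conf fragment; the values are the 15 GiB example from above (a 20 GiB DB would be 21474836480) and only take effect for OSDs created after the change:
[global]
bluestore_block_db_size = 16106127360    # DB partition size used at OSD creation
bluestore_block_wal_size = 16106127360   # WAL partition size used at OSD creation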

Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2017-12-06 Thread Caspar Smit
2017-12-05 18:39 GMT+01:00 Richard Hesketh : > On 05/12/17 17:10, Graham Allan wrote: > > On 12/05/2017 07:20 AM, Wido den Hollander wrote: > >> Hi, > >> > >> I haven't tried this before but I expect it to work, but I wanted to > >> check before proceeding. > >> > >>

Re: [ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-25 Thread Caspar Smit
Hi, I've asked the exact same question a few days ago, same answer: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021708.html I guess we'll have to bite the bullet on this one and take this into account when designing. Kind regards, Caspar 2017-10-25 10:39 GMT+02:00

Re: [ceph-users] Added two OSDs, 10% of pgs went inactive

2017-12-20 Thread Caspar Smit
Hi Daniel, I've had the same problem with creating a new 12.2.2 cluster where i couldn't get some pgs out of the "activating+remapped" status after i switched some OSD's from one chassis to another (there was no data on it yet). I tried restarting OSD's to no avail. Couldn't find anything about

[ceph-users] Prioritize recovery over backfilling

2017-12-20 Thread Caspar Smit
Hi all, I've been lowering some weights (0.95) of a few OSD's in our cluster. I have max backfills at the default of 1 so every OSD can only do backfilll of 1 pg at a time. So sometimes the status of a the rebalanced pg's change to only 10 active+remapped+backfilling But
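For reference, a hedged example of inspecting and temporarily raising that limit (run the daemon command on the node hosting the OSD; the value 2 is illustrative and competes with client I/O):
ceph daemon osd.0 config get osd_max_backfills        # current per-OSD backfill concurrency
ceph tell osd.* injectargs '--osd_max_backfills=2'    # runtime change, reverts on OSD restart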

Re: [ceph-users] How to see PGs of a pool on a OSD

2018-05-22 Thread Caspar Smit
Here you go: ps. You might have to map your poolnames to pool ids http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd Kind regards, Caspar 2018-05-22 9:13 GMT+02:00 Pardhiv Karri : > Hi, > > Our ceph cluster have 12 pools and only 3

Re: [ceph-users] Data recovery after loosing all monitors

2018-05-22 Thread Caspar Smit
2018-05-22 15:51 GMT+02:00 Wido den Hollander : > > > On 05/22/2018 03:38 PM, George Shuklin wrote: > > Good news, it's not an emergency, just a curiosity. > > > > Suppose I lost all monitors in a ceph cluster in my laboratory. I have > > all OSDs intact. Is it possible to recover

Re: [ceph-users] Crush Map Changed After Reboot

2018-05-22 Thread Caspar Smit
FWIW, you could also put this into your ceph.conf to explicitly put an osd into the correct chassis at start if you have other osd's which you still want the crush_update_on_start setting set to true for: [osd.34] osd crush location = "chassis=ceph-osd3-internal" [osd.35] osd crush
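Laid out as it would appear in ceph.conf (assuming osd.35 belongs to the same chassis, which the truncated snippet suggests):
[osd.34]
osd crush location = "chassis=ceph-osd3-internal"

[osd.35]
osd crush location = "chassis=ceph-osd3-internal"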

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-08 Thread Caspar Smit
6.1 is down+remapped, acting [6,3,2147483647,29,2147483647,2147483647] pg 6.3f is down+remapped, acting [20,24,2147483647,2147483647,3,28] Kind regards, Caspar Smit 2018-06-08 8:53 GMT+02:00 Caspar Smit : > Update: > > I've unset nodown to let it continue but now 4 osd's are down a

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-08 Thread Caspar Smit
head", "oid": "3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.000312db:head#", "attr_lens": { "_": 297, "hinfo_key": 18, "snapset": 35 }

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-07 Thread Caspar Smit
what should i do now? Thank you very much for any help Kind regards, Caspar 2018-06-07 13:33 GMT+02:00 Sage Weil : > On Wed, 6 Jun 2018, Caspar Smit wrote: > > Hi all, > > > > We have a Luminous 12.2.2 cluster with 3 nodes and i recently added a > node > > to it. &

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-06-08 Thread Caspar Smit
8] 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540] 10: (()+0x7494) [0x7f4c709ca494] 11: (clone()+0x3f) [0x7f4c6fa51aff] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Any help is highly appreciated. Kind regards, Caspar Smit 2018-06-08 7:57

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Caspar Smit
"osd_objectstore": "bluestore", "rotational": "1" Looks to me like i'm hitting the same issue, isn't it? ps. An upgrade of Ceph is planned in the near future but for now i would like to use the workaround if applicable to me. Thank you in a
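A quick way to check this per OSD (the osd id 12 is a placeholder; a bluestore OSD on an SSD should normally report "rotational": "0"):
ceph osd metadata 12 | grep -E '"rotational"|"osd_objectstore"'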

[ceph-users] Prioritize recovery over backfilling

2018-06-06 Thread Caspar Smit
ese drives notorious for this behaviour? Does anyone have experience with these drives in a CEPH environment? Kind regards, Caspar Smit

Re: [ceph-users] Prioritize recovery over backfilling

2018-06-07 Thread Caspar Smit
of OSD's are flagged down it should be better when there's no activity at all, so that when the osd's come back up there is nothing to be done (no recovery/backfilling due). Kind regards, Caspar 2018-06-07 8:47 GMT+02:00 Piotr Dałek : > On 18-06-06 09:29 PM, Caspar Smit wrote: > >> Hi all, &

Re: [ceph-users] incomplete PG for erasure coding pool after OSD failure

2018-06-26 Thread Caspar Smit
node so crush can find a place to store the data. Another option is to look into the following blog to use erasure coding on small clusters: http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters/ Kind regards, Caspar Met vriendelijke groet, Caspar Smit Systemengineer

[ceph-users] Balancer: change from crush-compat to upmap

2018-06-25 Thread Caspar Smit
Hi All, I've been using the balancer module in crush-compat mode for quite a while now and want to switch to upmap mode since all my clients are now luminous (v12.2.5) i've reweighted the compat weight-set back to as close as the original crush weights using 'ceph osd crush reweight-compat'
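A hedged sketch of the switch itself, assuming the compat weight-set has already been brought back in line with the crush weights as described, so removing it causes little data movement:
ceph osd set-require-min-compat-client luminous   # upmap requires luminous or newer clients
ceph balancer off
ceph osd crush weight-set rm-compat               # drop the old compat weight-set
ceph balancer mode upmap
ceph balancer on
ceph balancer status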

Re: [ceph-users] Adding SSD-backed DB & WAL to existing HDD OSD

2018-07-03 Thread Caspar Smit
2018-07-03 4:27 GMT+02:00 Brad Fitzpatrick : > Hello, > > I was wondering if it's possible or how best to add a DB & WAL to an OSD > retroactively? (Still using Luminous) > > I hurriedly created some HDD-backed bluestore OSDs without their WAL & DB > on SSDs, and then loaded them up with data. >

Re: [ceph-users] Is it possible to recover from block.db failure?

2017-10-19 Thread Caspar Smit
s for them. > > On Thu, Oct 19, 2017 at 7:44 AM Caspar Smit <caspars...@supernas.eu> wrote: >> >> Hi all, >> >> I'm testing some scenario's with the new Ceph luminous/bluestore >> combination. >> >> I've created a demo setup with 3 nodes (each

[ceph-users] Move an erasure coded RBD image to another pool.

2018-01-08 Thread Caspar Smit
Hi all, I've migrated all of my replicated rbd images to erasure coded images using "rbd copy" with the "--data-pool" parameter. So i now have a replicated pool with 4K pgs, that is only storing RBD headers and metadata. RBD data is stored on the erasure pool. Now i would like to move the

[ceph-users] Erasure code ruleset for small cluster

2018-02-02 Thread Caspar Smit
Hi all, I'd like to setup a small cluster (5 nodes) using erasure coding. I would like to use k=5 and m=3. Normally you would need a minimum of 8 nodes (preferably 9 or more) for this. Then i found this blog: https://ceph.com/planet/erasure-code-on-small-clusters/ This sounded ideal to me so i

Re: [ceph-users] Erasure code ruleset for small cluster

2018-02-05 Thread Caspar Smit
# crushtool --test -i compiled-crushmap-new --rule 1 --show-mappings --x 1 --num-rep 8 CRUSH rule 1 x 1 [] Kind regards, Caspar Smit 2018-02-02 19:09 GMT+01:00 Gregory Farnum <gfar...@redhat.com>: > On Fri, Feb 2, 2018 at 8:13 AM, Caspar Smit <caspars...@supernas.eu> > wrote: > >
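For comparison, the ruleset pattern from the referenced blog post looks roughly like this for 8 shards spread as 2 per host over 4 hosts; the rule id, host/osd counts and tries values are assumptions, not the thread's final rule, and a working rule should make the crushtool mappings above non-empty:
rule ec_k5m3 {
        id 1
        type erasure
        min_size 8
        max_size 8
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 4 type host
        step chooseleaf indep 2 type osd
        step emit
}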

Re: [ceph-users] Question about Erasure-coding clusters and resiliency

2018-02-13 Thread Caspar Smit
Hi Tim, With the current setup you can only handle 1 host failure without losing any data, BUT everything will probably freeze until you bring the failed node (or the OSD's in it) back up. Your setup indicates k=6, m=2 and all 8 shards are distributed to 4 hosts (2 shards/osds per host). Be

[ceph-users] balancer mgr module

2018-02-16 Thread Caspar Smit
Hi, After watching Sage's talk at LinuxConfAU about making distributed storage easy he mentioned the Balancer Manager module. After enabling this module, pg's should get balanced automagically around the cluster. The module was added in Ceph Luminous v12.2.2 Since i couldn't find much
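Since documentation was sparse at the time, a hedged sketch of getting it going on Luminous (the mode is an example; upmap requires all-luminous clients):
ceph mgr module enable balancer     # ships with ceph-mgr, may already be enabled
ceph balancer mode crush-compat     # or 'upmap' if all clients are luminous+
ceph balancer on
ceph balancer status                # shows mode, active flag and queued plans
ceph balancer eval                  # distribution score, lower is better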

Re: [ceph-users] balancer mgr module

2018-02-16 Thread Caspar Smit
ceph balancer reset # we do this because balancer rm is broken, > and myplan isn't removed automatically after execution > > v12.2.3 has quite a few balancer fixes, and also adds a pool-specific > balancing (which should hopefully fix my upmap issue). > > Hope that helps! > >

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-22 Thread Caspar Smit
Hi Sean and David, Do you have any follow ups / news on the Intel DC S4600 case? We are looking into this drives to use as DB/WAL devices for a new to be build cluster. Did Intel provide anything (like new firmware) which should fix the issues you were having or are these drives still

Re: [ceph-users] mon service failed to start

2018-02-19 Thread Caspar Smit
Hi Behnam, I would first recommend running a filesystem check on the monitor disk to see if there are any inconsistencies. Is the disk where the monitor is running on a spinning disk or SSD? If SSD you should check the Wear level stats through smartctl. Maybe trim (discard) enabled on

Re: [ceph-users] Ceph Bluestore performance question

2018-02-19 Thread Caspar Smit
"I checked and the OSD-hosts peaked at a load average of about 22 (they have 24+24HT cores) in our dd benchmark, but stayed well below that (only about 20 % per OSD daemon) in the rados bench test." Maybe because your dd test uses bs=1M and rados bench is using 4M as default block size? Caspar

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-23 Thread Caspar Smit
Hi all, Thanks for all your follow ups on this. The Samsung SM863a is indeed a very good alternative, thanks! We ordered both (SM863a & DC S4600) so we can compare. Intel's response (I mean the lack of it) is not very promising. Although we have very good experiences with Intel DC SSD's we

[ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-23 Thread Caspar Smit
Hi All, What would be the proper way to preventively replace a DB/WAL SSD (when it is nearing its DWPD/TBW limit and has not failed yet). It hosts DB partitions for 5 OSD's. Maybe something like: 1) ceph osd reweight 0 the 5 OSD's 2) let backfilling complete 3) destroy/remove the 5 OSD's 4)
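A hedged sketch of steps 1-3 in command form (osd ids are placeholders, repeat per OSD; this drains the data first, which is the slow-but-safe route):
ceph osd reweight 10 0                      # step 1: start draining the OSD
ceph -s                                     # step 2: wait until backfilling is done
ceph osd safe-to-destroy osd.10             # sanity check: no PG still depends on this OSD
ceph osd purge 10 --yes-i-really-mean-it    # step 3: remove the OSD from crush, auth and the osdmap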

Re: [ceph-users] Cache tiering on Erasure coded pools

2017-12-27 Thread Caspar Smit
Also carefully read the word of caution section on David's link (which is absent in the jewel version of the docs), a cache tier in front of an erasure coded data pool for RBD is almost always a bad idea. Caspar Met vriendelijke groet, Caspar Smit Systemengineer SuperNAS Dorsvlegelstraat 13 1445

Re: [ceph-users] JBOD question

2018-07-25 Thread Caspar Smit
Satish, Yes, that card supports 'both'. You have to flash the IR firmware (IT firmware = JBOD only) and then you are able to create RAID1 sets in the BIOS of the card and any unused disks will be seen by the OS as 'jbod' Kind regards, Caspar Smit 2018-07-23 20:43 GMT+02:00 Satish Patel : >

Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Caspar Smit
ity. Of course a failure domain of 'host' is required to do this but since you have 6 hosts that would be ok. Met vriendelijke groet, Caspar Smit Systemengineer SuperNAS Dorsvlegelstraat 13 1445 PA Purmerend t: (+31) 299 410 414 e: caspars...@supernas.eu w: www.supernas.eu 2018-07-20 14:02 GMT

Re: [ceph-users] Default erasure code profile and sustaining loss of one host containing 4 OSDs

2018-07-20 Thread Caspar Smit
Ziggy, The default min_size for your pool is 3 so losing ANY single OSD (not even host) will result in reduced data availability: https://patchwork.kernel.org/patch/8546771/ Use m=2 to be able to handle a node failure. Met vriendelijke groet, Caspar Smit Systemengineer SuperNAS
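A hedged example of such a profile (profile and pool names, k and the pg count are placeholders; min_size then becomes k+1):
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd erasure-code-profile get ec42
ceph osd pool create ecpool 128 128 erasure ec42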

Re: [ceph-users] v12.2.7 Luminous released

2018-07-18 Thread Caspar Smit
2018-07-18 3:04 GMT+02:00 Linh Vu : > Thanks for all your hard work in putting out the fixes so quickly! :) > > We have a cluster on 12.2.5 with Bluestore and EC pool but for CephFS, not > RGW. In the release notes, it says RGW is a risk especially the garbage > collection, and the recommendation

Re: [ceph-users] Balancer: change from crush-compat to upmap

2018-07-18 Thread Caspar Smit
GMT+02:00 Xavier Trilla : > Hi Caspar, > > > > Did you find any information regarding the migration from crush-compat to > upmap? I’m facing the same situation. > > > > Thanks! > > > > > > *From:* ceph-users *On behalf of* Caspar > Smit >

Re: [ceph-users] Self shutdown of 1 whole system (Derbian stretch/Ceph 12.2.7/bluestore)

2018-07-23 Thread Caspar Smit
Do you have any hardware watchdog running in the system? A watchdog could trigger a powerdown if it meets some value. Any event logs from the chassis itself? Kind regards, Caspar 2018-07-21 10:31 GMT+02:00 Nicolas Huillard : > Hi all, > > One of my server silently shutdown last night, with no

Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7

2018-09-06 Thread Caspar Smit
Hi, These reports are kind of worrying since we have a 12.2.5 cluster too waiting to upgrade. Did you have a luck with upgrading to 12.2.8 or still the same behavior? Is there a bugtracker for this issue? Kind regards, Caspar Op di 4 sep. 2018 om 09:59 schreef Wolfgang Lendl <

Re: [ceph-users] Slow requests

2018-07-04 Thread Caspar Smit
Hi Ben, At first glance i would say the CPU's are a bit weak for this setup. Recommended is to have at least 1 core per OSD. Since you have 8 cores and 10 OSD's there isn't much left for other processes. Furthermore, did you upgrade the firmware of those DC S4500's to the latest firmware?

Re: [ceph-users] BlueStore questions

2018-03-08 Thread Caspar Smit
Hi Frank, 2018-03-04 1:40 GMT+01:00 Frank Ritchie : > Hi all, > > I have a few questions on using BlueStore. > > With FileStore it is not uncommon to see 1 nvme device being used as the > journal device for up to 12 OSDs. > > Can an adequately sized nvme device also be

Re: [ceph-users] Memory leak in Ceph OSD?

2018-03-01 Thread Caspar Smit
Stefan, How many OSD's and how much RAM are in each server? bluestore_cache_size=6G will not mean each OSD is using max 6GB RAM right? Our bluestore hdd OSD's with bluestore_cache_size at 1G use ~4GB of total RAM. The cache is a part of the memory usage by bluestore OSD's. Kind regards, Caspar
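A way to see where the RAM actually goes per OSD (osd.0 is a placeholder; run on the node hosting that OSD):
ceph daemon osd.0 dump_mempools                    # per-pool memory accounting, including the bluestore cache buckets
ceph daemon osd.0 config get bluestore_cache_size  # the configured cache target, which is only part of total RSS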

Re: [ceph-users] Case where a separate Bluestore WAL/DB device crashes...

2018-03-01 Thread Caspar Smit
s/aren't/are/ :) Met vriendelijke groet, Caspar Smit Systemengineer SuperNAS Dorsvlegelstraat 13 1445 PA Purmerend t: (+31) 299 410 414 e: caspars...@supernas.eu w: www.supernas.eu 2018-03-01 16:31 GMT+01:00 David Turner <drakonst...@gmail.com>: > This aspect of osds has not cha

Re: [ceph-users] SSD as DB/WAL performance with/without drive write cache

2018-03-13 Thread Caspar Smit
back cache setting has slower write than > write through. > > But I would offer to pay more attention to IOPS than to the sequential > write speed, especially on the small blocks workload. > > 2018-03-13 21:33 GMT+05:00 Caspar Smit <caspars...@supernas.eu>: > >> Hi

[ceph-users] SSD as DB/WAL performance with/without drive write cache

2018-03-13 Thread Caspar Smit
Hi all, I've tested some new Samsung SM863 960GB and Intel DC S4600 240GB SSD's using the method described at Sebastien Han's blog: https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ The first thing stated there is to disable the drive's
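For reference, the journal-style test from that blog boils down to something like this; the device name is a placeholder and the test writes to the raw device, so use a scratch drive:
hdparm -W 0 /dev/sdX        # disable the drive's volatile write cache (hdparm -W 1 re-enables it)
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test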

Re: [ceph-users] Ceph see the data size larger than actual stored data in rbd

2018-03-14 Thread Caspar Smit
Hi, In order to reclaim space in Ceph you will need to use the discard feature of KRBD: https://www.sebastien-han.fr/blog/2015/01/26/ceph-and-krbd-discard/ Kind regards, Caspar 2018-03-14 10:34 GMT+01:00 Mostafa Hamdy Abo El-Maty El-Giar < mostafaha...@mans.edu.eg>: > Hi , > > I found some
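A minimal sketch, assuming a KRBD-mapped image with a filesystem that supports discard (device and mountpoint are placeholders):
mount -o discard /dev/rbd0 /mnt/rbd    # online discard: RADOS objects are freed as files are deleted
fstrim -v /mnt/rbd                     # or run a periodic trim if the discard mount option costs too much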

Re: [ceph-users] Backfilling on Luminous

2018-03-16 Thread Caspar Smit
Hi David, What about memory usage? 1] 23 OSD nodes: 15x 10TB Seagate Ironwolf filestore with journals on Intel DC P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz, 128GB RAM. If you upgrade to bluestore, memory usage will likely increase. 15x10TB ~~ 150GB RAM needed especially in

Re: [ceph-users] CHOOSING THE NUMBER OF PLACEMENT GROUPS

2018-03-09 Thread Caspar Smit
Hi Will, Yes, adding new pools will increase the number of PG's per OSD. But you can always decrease the number of pg's per OSD by adding new Hosts/OSD's. When you design a cluster you have to calculate how many pools you're going to use and use that information with PGcalc.
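As a worked example of the usual PGCalc starting point (numbers are illustrative): targeting ~100 PGs per OSD on 30 OSDs with replica size 3 gives a budget of roughly (30 x 100) / 3 = 1000 PGs in total, to be divided over all pools according to their expected share of the data, with each pool's pg_num rounded to a power of two.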

Re: [ceph-users] Best way to remove an OSD node

2018-04-17 Thread Caspar Smit
Hi John, Thanks for pointing out that script, do you have a link to it? I'm not able to find it. Just want to look at the script to understand its strategy. Kind regards, Caspar Smit 2018-04-16 13:11 GMT+02:00 John Petrini <jpetr...@coredial.com>: > There's a gentle reweight pyth

Re: [ceph-users] pg's are stuck in active+undersized+degraded+remapped+backfill_wait even after introducing new osd's to cluster

2018-04-18 Thread Caspar Smit
Hi Dilip, Looking at the output of ceph -s it's still recovering (there are still pgs in recovery_wait, backfill_wait, recovering state) so you will have to be patient to let ceph recover. The output of ceph osd dump doesn't mention osd.7 (it's referring to pool 7) Kind regards, Caspar Smit

Re: [ceph-users] Cluster Re-balancing

2018-04-18 Thread Caspar Smit
Hi Monis, The settings you mention do not prevent data movement to overloaded OSD's, they are thresholds at which CEPH warns when an OSD is nearfull or backfillfull. No expert on this but setting backfillfull lower than nearfull is not recommended, the nearfull state should be reached first instead
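For completeness, on Luminous these thresholds can be adjusted cluster-wide (the values shown are the defaults):
ceph osd set-nearfull-ratio 0.85       # warning threshold
ceph osd set-backfillfull-ratio 0.90   # OSDs beyond this refuse to accept backfill
ceph osd set-full-ratio 0.95           # cluster stops accepting writes
ceph osd dump | grep ratio             # verify the current settings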

[ceph-users] Best way to remove an OSD node

2018-04-16 Thread Caspar Smit
Hi All, What would be the best way to remove an entire OSD node from a cluster? I've run into problems removing OSD's from that node 1 by 1, eventually the last few OSD's are overloaded with data. Setting the crush weight of all these OSD's to 0 at once seems a bit rigorous. Is there also a

[ceph-users] Lost space or expected?

2018-03-20 Thread Caspar Smit
Hi all, Here's the output of 'rados df' for one of our clusters (Luminous 12.2.2):
ec_pool 75563G 19450232 0 116701392 0 0 0 385351922 27322G 800335856 294T
rbd     42969M 10881    0 32643     0 0 0 615060980 14767G 970301192 207T
rbdssd  252G   65446    0 196338    0 0 0 29392480  1581G  211205402 2601G

Re: [ceph-users] wal and db device on SSD partitions?

2018-03-21 Thread Caspar Smit
2018-03-21 7:20 GMT+01:00 ST Wong (ITSC) : > Hi all, > > > > We got some decommissioned servers from other projects for setting up > OSDs. They’ve 10 2TB SAS disks with 4 2TB SSD. > > We try to test with bluestores and hope to play wal and db devices on > SSD. Need advice
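A hedged example of creating one such OSD with ceph-volume on Luminous (device names and the SSD partition layout are assumptions; if the WAL should share the DB partition, --block.wal can be omitted since the WAL follows the DB by default):
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdk1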

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-26 Thread Caspar Smit
>> >> Which also makes me wonder: what is actually the format of WAL and >> BlockDB in bluestore? Is there any documentation available about it? >> Best, >> >> Nico >> >> >> Caspar Smit <caspars...@supernas.eu> writes: >> >>

Re: [ceph-users] fast_read in EC pools

2018-02-27 Thread Caspar Smit
Oliver, Be aware that for k=4,m=2 the min_size will be 5 (k+1), so after a node failure the min_size is already reached. Any OSD failure beyond the node failure will probably result in some PG's to be become incomplete (I/O Freeze) until the incomplete PG's data is recovered to another OSD in

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-27 Thread Caspar Smit
). There are a few threads that mention how to check > how much of your DB partition is in use. Once it's full, it spills over to > the HDD. > > > On Tue, Feb 27, 2018, 6:19 AM Caspar Smit <caspars...@supernas.eu> wrote: > >> 2018-02-26 23:01 GMT+01:00 Gregory Farnum &

Re: [ceph-users] fast_read in EC pools

2018-02-27 Thread Caspar Smit
Oliver, Here's the commit info: https://github.com/ceph/ceph/commit/48e40fcde7b19bab98821ab8d604eab920591284 Caspar 2018-02-27 14:28 GMT+01:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de> : > Am 27.02.2018 um 14:16 schrieb Caspar Smit: > > Oliver, > > > >

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-27 Thread Caspar Smit
2018-02-26 23:01 GMT+01:00 Gregory Farnum <gfar...@redhat.com>: > On Mon, Feb 26, 2018 at 3:23 AM Caspar Smit <caspars...@supernas.eu> > wrote: > >> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>: >> >>> Caspar, it looks like

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-27 Thread Caspar Smit
n the partprobe. Caspar > On Mon, Feb 26, 2018 at 6:23 AM Caspar Smit <caspars...@supernas.eu> > wrote: > >> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>: >> >>> Caspar, it looks like your idea should work. Worst case scenario seems >

Re: [ceph-users] Erasure coding with more chunks than servers

2018-10-05 Thread Caspar Smit
Hi Vlad, You can check this blog: http://cephnotes.ksperis.com/blog/2017/01/27/erasure-code-on-small-clusters Note! Be aware that these settings do not automatically cover a node failure. Check out this thread why: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024423.html

Re: [ceph-users] Fwd: [Ceph-community] After Mimic upgrade OSD's stuck at booting.

2018-09-25 Thread Caspar Smit
You can set: *osd_scrub_during_recovery = false* and in addition maybe set the noscrub and nodeep-scrub flags to let it settle. Kind regards, Caspar Op di 25 sep. 2018 om 12:39 schreef Sergey Malinin : > Just let it recover. > > data: > pools: 1 pools, 4096 pgs > objects: 8.95 M
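In command form (the injectargs value applies until the OSDs restart; unset the flags again once recovery has finished):
ceph tell osd.* injectargs '--osd_scrub_during_recovery=false'
ceph osd set noscrub
ceph osd set nodeep-scrub
# after recovery completes:
ceph osd unset noscrub
ceph osd unset nodeep-scrub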

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Caspar Smit
Furthermore, presuming you are running Jewel or Luminous you can change some settings in ceph.conf to mitigate the deep-scrub impact: osd scrub max interval = 4838400 osd scrub min interval = 2419200 osd scrub interval randomize ratio = 1.0 osd scrub chunk max = 1 osd scrub chunk min = 1 osd

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Caspar Smit
Hi Vladimir, While it is advisable to investigate why deep-scrub is killing your performance (it's enabled for a reason) and find ways to fix that (separate block.db SSD's for instance might help) here's a way to accommodate your needs: For all your 7200RPM Spinner based pools do: ceph osd pool

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Caspar Smit
": "524288", "osd_disk_thread_ioprio_class": "", "osd_disk_thread_ioprio_priority": "-1", You can check your differences with the defaults using: ceph daemon osd.x config diff Kind regards, Caspar Op di 11 dec. 2018 om 12:36 schreef Janne

[ceph-users] Scheduling deep-scrub operations

2018-12-14 Thread Caspar Smit
Hi all, We have operating hours from 4 pm until 7 am each weekday and 24-hour days in the weekend. I was wondering if it's possible to allow deep-scrubbing from 7 am until 3 pm (15:00) only on weekdays and prevent any deep-scrubbing in the weekend. I've seen the osd scrub begin/end hour settings but
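The hour-based part can be expressed like this in ceph.conf (hours are local time on the OSD nodes; on its own this cannot distinguish weekdays from weekends, which is exactly what the thread is asking about):
[osd]
osd scrub begin hour = 7
osd scrub end hour = 15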

Re: [ceph-users] Scheduling deep-scrub operations

2018-12-14 Thread Caspar Smit
here for available option: > > https://github.com/ceph/ceph/blob/luminous/src/common/options.cc#L2533 > > -- Dan > > > On Fri, Dec 14, 2018 at 12:25 PM Caspar Smit > wrote: > > > > Hi all, > > > > We have operating hours from 4 pm until 7 am each weekday

[ceph-users] recovering vs backfilling

2019-01-10 Thread Caspar Smit
Hi all, I wanted to test Dan's upmap-remapped script for adding new osd's to a cluster. (Then letting the balancer gradually move pgs to the new OSD afterwards) I've created a fresh (virtual) 12.2.10 4-node cluster with very small disks (16GB each). 2 OSD's per node. Put ~20GB of data on the

Re: [ceph-users] Help Ceph Cluster Down

2019-01-07 Thread Caspar Smit
>>> pg 14.8ef is activating+degraded, acting [9,36] >>> pg 14.8f8 is active+undersized+degraded, acting [30] >>> pg 14.901 is activating+degraded, acting [22,37]

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Caspar Smit
Hi Arun, How did you end up with a 'working' cluster with so many pgs per OSD? "too many PGs per OSD (2968 > max 200)" To (temporarily) allow this many pgs per osd you could try this: Change these values in the global section in your ceph.conf: mon max pg per osd = 200 osd max pg per osd

Re: [ceph-users] Help Ceph Cluster Down

2019-01-04 Thread Caspar Smit
ive+undersized+degraded > 1 remapped+peering > 1 active+clean+remapped > 1 activating+undersized+degraded+remapped > > io: > client: 0 B/s rd, 25397 B/s wr, 4 op/s rd, 4 op/s wr > > I will update number of PGs per OSD

Re: [ceph-users] pool/volume live migration

2019-02-08 Thread Caspar Smit
Hi Luis, According to slide 21 of Sage's presentation at FOSDEM it is coming in Nautilus: https://fosdem.org/2019/schedule/event/ceph_project_status_update/attachments/slides/3251/export/events/attachments/ceph_project_status_update/slides/3251/ceph_new_in_nautilus.pdf Kind regards, Caspar Op

Re: [ceph-users] best practices for EC pools

2019-02-08 Thread Caspar Smit
Op vr 8 feb. 2019 om 11:31 schreef Scheurer François < francois.scheu...@everyware.ch>: > Dear Eugen Block > Dear Alan Johnson > > > Thank you for your answers. > > So we will use EC 3+2 on 6 nodes. > Currently with only 4 osd's per node, then 8 and later 20. > > > >Just to add, that a more

Re: [ceph-users] backfill_toofull after adding new OSDs

2019-01-31 Thread Caspar Smit
Hi Jan, You might be hitting the same issue as Wido here: https://www.spinics.net/lists/ceph-users/msg50603.html Kind regards, Caspar Op do 31 jan. 2019 om 14:36 schreef Jan Kasprzak : > Hello, ceph users, > > I see the following HEALTH_ERR during cluster rebalance: > >

Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Caspar Smit
. If the above statement is true, you could *temporarily* set min_size to 2 (on your EC pools) to get back access to your data again but this is a very dangerous action. Losing another OSD during this period results in actual data loss. Kind regards, Caspar Smit Op vr 5 jul. 2019 om 01:17 schreef Kyle

Re: [ceph-users] [events] Ceph Day Netherlands July 2nd - CFP ends June 3rd

2019-07-08 Thread Caspar Smit
Mike, Do you know if the slides from the presentations at Ceph Day Netherlands will be made available? (and if yes, where to find them?) Kind regards, Caspar Smit Op wo 29 mei 2019 om 16:42 schreef Mike Perez : > Hi everyone, > > This is the last week to submit for the Ceph Day Ne

Re: [ceph-users] problem with degraded PG

2019-06-14 Thread Caspar Smit
regards, Caspar Smit Op vr 14 jun. 2019 om 11:52 schreef Luk : > Here is ceph osd tree, in first post there is also ceph osd df tree: > > https://pastebin.com/Vs75gpwZ > > > > > Ahh I was thinking of chooseleaf_vary_r, which you already have. > > So probably not rel

[ceph-users] Is a scrub error (read_error) on a primary osd safe to repair?

2019-12-04 Thread Caspar Smit
Both PG's are replicated PG's with 3 copies. I'm on Luminous 12.2.5 on this installation, is it safe to just run "ceph pg repair" on those PG's or will it then overwrite the two good copies with the bad one from the primary? If the latter is true, what is the correct way to resolve this? Kind
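Before repairing, the inconsistency can usually be inspected first (the pg id 2.1a is a placeholder); for a read_error on a replicated pool the repair is expected to rewrite the bad copy from a good replica rather than the other way around, but confirm with the output below first:
rados list-inconsistent-obj 2.1a --format=json-pretty   # shows which shard reported the read_error
ceph pg repair 2.1a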

Re: [ceph-users] Is a scrub error (read_error) on a primary osd safe to repair?

2019-12-05 Thread Caspar Smit
RAM gets low (We still have to reboot the cluster nodes once in a while to free up RAM). Furthermore we will upgrade to 12.2.12 soon Caspar Smit Systemengineer SuperNAS Dorsvlegelstraat 13 1445 PA Purmerend t: (+31) 299 410 414 e: caspars...@supernas.eu w: www.supernas.eu Op do 5 dec. 2019 om 07