Re: [ceph-users] CRUSH tunables for production system? / Data Distribution?

2013-11-13 Thread Oliver Schulz
Dear Greg, I believe 3.8 is after CRUSH_TUNABLES v1 was implemented in the kernel, so it shouldn't hurt you to turn them on if you need them. (And the crush tool is just out of date; we should update that text!) However, if you aren't having distribution issues on your cluster I wouldn't bother
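
For reference, checking the tunables currently in effect and switching to a named profile looks roughly like this (the profile name below is only an example; pick one matching the oldest kernel clients in use, and note that changing profiles triggers data movement):
$ ceph osd crush show-tunables      # dump the tunables currently in effect
$ ceph osd crush tunables bobtail   # switch to a named tunables profile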

[ceph-users] Constant slow / blocked requests with otherwise healthy cluster

2013-11-27 Thread Oliver Schulz
Dear Ceph Experts, our Ceph cluster suddenly went into a state of OSDs constantly having blocked or slow requests, rendering the cluster unusable. This happened during normal use, there were no updates, etc. All disks seem to be healthy (smartctl, iostat, etc.). A complete hardware reboot

Re: [ceph-users] Constant slow / blocked requests with otherwise healthy cluster

2013-11-28 Thread Oliver Schulz
Hi Michael, Sounds like what I was having starting a couple of days ago, played [...] yes, that sounds only too familiar. :-( Updated to 3.12 kernel and restarted all of the ceph nodes and it's now happily churning through a rados -p rbd bench 300 write -t 120 that Weird - but if that

Re: [ceph-users] Constant slow / blocked requests with otherwise healthy cluster

2013-11-28 Thread Oliver Schulz
our Ceph cluster suddenly went into a state of OSDs constantly having blocked or slow requests, rendering the cluster unusable. This happened during normal use, there were no updates, etc. our cluster seems to have recovered overnight and is back to normal behaviour. This morning, everything

[ceph-users] Blocked requests during and after CephFS delete

2013-12-08 Thread Oliver Schulz
Hello Ceph-Gurus, a short while ago I reported some trouble we had with our cluster suddenly going into a state of blocked requests. We did a few tests, and we can reproduce the problem: During/after deletion of a substantial chunk of data on CephFS (a few TB), ceph health shows blocked
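
To see which OSDs are affected and what the slow operations are waiting on, something like the following is commonly used (osd.12 is just a placeholder; the daemon command has to run on the host carrying that OSD):
$ ceph health detail                      # lists the OSDs with blocked/slow requests
$ ceph daemon osd.12 dump_ops_in_flight   # show the stuck ops on that OSD and what state they are in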

[ceph-users] XFS or btrfs for production systems with modern Kernel?

2013-06-07 Thread Oliver Schulz
Hello, the CEPH Hard disk and file system recommendations page states that XFS is the recommended OSD file system for production systems. Does that still hold true for recent kernel versions (e.g. Ubuntu 12.04 with lts-raring kernel 3.8.5)? Would btrfs provide a significant performance
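
For filestore OSDs of that era, XFS was typically selected explicitly in ceph.conf; a minimal sketch (the mount options are only an example):
[osd]
osd mkfs type = xfs
osd mount options xfs = rw,noatime,inode64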

[ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Oliver Schulz
Dear Ceph Experts, is it possible to define a Ceph user/key with privileges that allow for read-only CephFS access but do not allow write or other modifications to the Ceph cluster? I would like to export a sub-tree of our CephFS via HTTPS. Alas, web-servers are inviting targets, so in the
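
A client key restricted to read caps would be created roughly as below (client name and data-pool name are assumptions; as the replies in this thread note, at the time this did not reliably block deletes). Newer releases provide "ceph fs authorize <fs> client.cephfs_ro / r" for the same purpose.
$ ceph auth get-or-create client.cephfs_ro \
      mon 'allow r' \
      mds 'allow r' \
      osd 'allow r pool=cephfs_data'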

Re: [ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Oliver Schulz
Hi Florian, On 18.02.2015 22:58, Florian Haas wrote: is it possible to define a Ceph user/key with privileges that allow for read-only CephFS access but do not allow All you should need to do is [...] However, I've just tried the above with ceph-fuse on firefly, and [...] So I believe you've

Re: [ceph-users] Privileges for read-only CephFS access?

2015-02-18 Thread Oliver Schulz
Dear Greg, On 18.02.2015 23:41, Gregory Farnum wrote: is it possible to define a Ceph user/key with privileges that allow for read-only CephFS access but do not allow ...and deletes, unfortunately. :( I don't think this is presently a thing it's possible to do until we get a much better user

[ceph-users] How to identify MDS client failing to respond to capability release?

2015-07-30 Thread Oliver Schulz
Hello Ceph Experts, lately, ceph status on our cluster often states: mds0: Client CLIENT_ID failing to respond to capability release How can I identify which client is at fault (hostname or IP address) from the CLIENT_ID? What could be the source of the failing to respond to capability
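
The client behind a CLIENT_ID can usually be looked up in the MDS session list, which includes the client's address; depending on the release this is an admin-socket or a tell command (mds.<id> is the local MDS daemon name):
$ ceph daemon mds.<id> session ls   # run on the MDS host; the "inst" field shows client.<id> and its IP
$ ceph tell mds.0 session ls        # equivalent form on newer releases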

Re: [ceph-users] How to identify MDS client failing to respond to capability release?

2015-07-30 Thread Oliver Schulz
a more recent kernel or a fuse client. John On 30/07/15 08:32, Oliver Schulz wrote: Hello Ceph Experts, lately, ceph status on our cluster often states: mds0: Client CLIENT_ID failing to respond to capability release How can I identify which client is at fault (hostname or IP address) from

[ceph-users] How to repair MDS damage?

2017-02-14 Thread Oliver Schulz
Dear Ceph Experts, after upgrading our Ceph cluster from Hammer to Jewel, the MDS (after a few days) found some metadata damage: # ceph status [...] health HEALTH_ERR mds0: Metadata damage detected [...] The output of # ceph tell mds.0 damage ls is: [ {
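
The usual sequence, sketched here with placeholder names and details that vary by release, is to list the damage entries, run a repair scrub on the affected path, and clear the damage table entries once they are fixed:
$ ceph tell mds.0 damage ls                            # list damage entries with their ids
$ ceph daemon mds.<id> scrub_path / recursive repair   # on the MDS host: scrub and repair the tree
$ ceph tell mds.0 damage rm <damage_id>                # remove an entry after it has been repaired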

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-12 Thread Oliver Schulz
Dear David, On 11.05.2018 22:10, David Turner wrote: For if you should do WAL only on the NVMe vs use a filestore journal, that depends on your write patterns, use case, etc. we mostly use CephFS, for scientific data processing. It's mainly larger files (10 MB to 10 GB, but sometimes also a

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-12 Thread Oliver Schulz
more longevity to the life of the drive. You cannot change the size of any part of a bluestore osd after creation. On Sat, May 12, 2018, 3:09 PM Oliver Schulz <oliver.sch...@tu-dortmund.de> wrote: Dear David, On 11.05.2018 22:10,

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
something like: $ ceph-deploy osd create --bluestore --data=/dev/sdb --block-db /dev/nvme0n1p1 $HOSTNAME $ ceph-deploy osd create --bluestore --data=/dev/sdc --block-db /dev/nvme0n1p1 $HOSTNAME On Fri, May 11, 2018 at 10:35 AM Oliver Schulz <oliver.sch...@tu-dortmund.de>

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
(quoted lsblk output: nvme0n1 partitions p15 and up, alternating 1G and 576M partitions)

[ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
Dear Ceph Experts, I'm trying to set up some new OSD storage nodes, now with bluestore (our existing nodes still use filestore). I'm a bit unclear on how to specify WAL/DB devices: Can several OSDs share one WAL/DB partition? So, can I do ceph-deploy osd create --bluestore
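
A common layout is one block.db partition per OSD on the shared NVMe device, rather than one partition shared by several OSDs, e.g. (device names and hostname are placeholders, mirroring the command form quoted above):
$ ceph-deploy osd create --bluestore --data=/dev/sdb --block-db /dev/nvme0n1p1 $HOSTNAME
$ ceph-deploy osd create --bluestore --data=/dev/sdc --block-db /dev/nvme0n1p2 $HOSTNAME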

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-05-11 Thread Oliver Schulz
Dear David, thanks a lot for the detailed answer(s) and clarifications! Can I ask just a few more questions? On 11.05.2018 18:46, David Turner wrote: partitions is 10GB per 1TB of OSD.  If your OSD is a 4TB disk you should be looking closer to a 40GB block.db partition.  If your block.db
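
Following the ~10GB per 1TB rule of thumb quoted above, a 4TB OSD would get a roughly 40GB block.db partition; carving one such partition on the NVMe device could look like this (device name is a placeholder):
$ sgdisk --new=0:0:+40G /dev/nvme0n1   # next free partition number, 40 GiB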

[ceph-users] Increasing number of PGs by not a factor of two?

2018-05-16 Thread Oliver Schulz
Dear all, we have a Ceph cluster that has slowly evolved over several years and Ceph versions (started with 18 OSDs and 54 TB in 2013, now about 200 OSDs and 1.5 PB, still the same cluster, with data continuity). So there are some "early sins" in the cluster configuration, left over from the
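
Raising the PG count of an existing pool, whether or not the new value is a power of two, is done in two steps; the pool name and target count below are only examples, and the increase causes data movement:
$ ceph osd pool set cephfs_data pg_num 1024
$ ceph osd pool set cephfs_data pgp_num 1024   # actually start rebalancing onto the new PGs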

[ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
Dear all, I have a serious problem with our Ceph cluster: One of our PGs somehow ended up in this state (reported by "ceph health detail"): pg 1.XXX is stuck inactive for ..., current state unknown, last acting [] Also, "ceph pg map 1.xxx" reports: osdmap e525812 pg 1.721 (1.721) -> up
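
Useful commands for narrowing such a state down (the PG id is taken from the message above):
$ ceph pg dump_stuck inactive   # list all stuck/inactive PGs and their last known state
$ ceph pg map 1.721             # show the up/acting mapping the current OSD map computes
$ ceph osd tree                 # check that the OSDs that should host the PG are up and in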

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
reported as inactive. On Thu, Jun 14, 2018 at 8:40 AM Oliver Schulz <oliver.sch...@tu-dortmund.de> wrote: Dear all, I have a serious problem with our Ceph cluster: One of our PGs somehow ended up in this state (reported by "ceph health detail"):

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
pg reporting, but I believe if it’s reporting the state as unknown that means *no* running osd which contains any copy of that pg. That’s not something which ceph could do on its own without failures of osds. What’s the output of “ceph -s”? On Thu, Jun 14, 2018 at 2:15 PM Oliver Schulz

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
n > changes. How do I do that, best? Just set all the weights back to 1.00? Cheers, Oliver P.S.: Thanks so much for helping! On 14.06.2018 21:37, Gregory Farnum wrote: On Thu, Jun 14, 2018 at 3:26 PM Oliver Schulz <oliver.sch...@tu-dortmund.de> wrote: But the content

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
disaster recovery tools. Your cluster was offline here because it couldn't do some writes, but it should still be self-consistent. On Thu, Jun 14, 2018 at 4:52 PM Oliver Schulz <oliver.sch...@tu-dortmund.de> wrote: They are recovered now, looks like it just took a bit fo

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
-o crushmap crushtool -d crushmap -o crushmap.txt Paul 2018-06-14 22:39 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>: Thanks, Greg!! I reset all the OSD weights to 1.00, and I think I'm in a much better state now. The only trouble left in "ceph
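
The snippet above refers to the usual CRUSH map round-trip for inspecting or editing the map by hand; the full sequence is roughly:
$ ceph osd getcrushmap -o crushmap            # fetch the binary CRUSH map
$ crushtool -d crushmap -o crushmap.txt       # decompile to editable text
$ crushtool -c crushmap.txt -o crushmap.new   # recompile after editing
$ ceph osd setcrushmap -i crushmap.new        # inject the edited map (only if changes are intended)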

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
ery_wait", after. Cheers, Oliver On 14.06.2018 22:09, Gregory Farnum wrote: On Thu, Jun 14, 2018 at 4:07 PM Oliver Schulz <oliver.sch...@tu-dortmund.de> wrote: Hi Greg, I increased the hard limit and rebooted everything. The PG without acting OSDs still has none, but

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
in various versions and forcing that kind of priority without global decision making is prone to issues. But yep, looks like things will eventually become all good now. :) On Thu, Jun 14, 2018 at 4:39 PM Oliver Schulz <oliver.sch...@tu-dortmund.de> wrote: Thanks, Greg!! I

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
Ah, I see some OSDs actually are over the 200 PG limit - I'll increase the hard limit and restart everything. On 14.06.2018 21:26, Oliver Schulz wrote: But the contents of the remapped PGs should still be Ok, right? What confuses me is that they don't backfill - why don't the "move

Re: [ceph-users] How to fix a Ceph PG in unkown state with no OSDs?

2018-06-14 Thread Oliver Schulz
overdose protection that was added in luminous. Check the list archives for the exact name, but you’ll want to increase the pg hard limit and restart the osds that exceeded the previous/current setting. -Greg On Thu, Jun 14, 2018 at 2:33 PM Oliver Schulz <oliver.sch...@tu-dortmund.de> wr
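
The Luminous-era options being referred to are, as far as I recall, mon_max_pg_per_osd and its hard-limit ratio; raising them temporarily in ceph.conf and restarting the affected OSDs looks roughly like this (the values are only examples):
[global]
mon_max_pg_per_osd = 300
osd_max_pg_per_osd_hard_ratio = 3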

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-23 Thread Oliver Schulz
Hi Konstantin, thanks! "set-all-straw-buckets-to-straw2" was what I was looking for. Didn't see it in the docs. Thanks again! Cheers, Oliver On 23.06.2018 06:39, Konstantin Shalygin wrote: Yes, I know that section of the docs, but can't find how to change the crush rules after "ceph osd
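
For reference, the command mentioned above, preceded by the usual check that no pre-Hammer clients would be locked out by straw2 buckets (the conversion can cause some data movement):
$ ceph osd set-require-min-compat-client hammer    # refuses if older clients are still connected
$ ceph osd crush set-all-straw-buckets-to-straw2   # convert all straw buckets to straw2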

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-20 Thread Oliver Schulz
common). Values used for EC are set_chooseleaf_tries = 5 and set_choose_tries = 100. You can configure them by adding them as the first steps of the rule. You can also configure an upmap exception. But in general it is often not the best idea to have only 3 racks for repli
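
Setting those retry values as the first steps of an erasure-code CRUSH rule looks roughly like this (rule name, id and failure domain are only an illustration):
rule cephfs_ec_data {
        id 2
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step chooseleaf indep 0 type host
        step emit
}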

[ceph-users] Backfill stops after a while after OSD reweight

2018-06-20 Thread Oliver Schulz
Dear all, we (somewhat) recently extended our Ceph cluster, and updated it to Luminous. By now, the fill level on some OSDs is quite high again, so I'd like to re-balance via "OSD reweight". I'm running into the following problem, however: No matter what I do (reweight a little, or a lot, or
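
Re-balancing by utilization can be previewed before applying it; the threshold of 120 (percent of mean utilization) is just an example:
$ ceph osd test-reweight-by-utilization 120   # dry run: shows which OSDs would be reweighted and by how much
$ ceph osd reweight-by-utilization 120        # apply the reweights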

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
store on your CephFS. Cheers, Linh From: ceph-users on behalf of Oliver Schulz Sent: Sunday, 15 July 2018 9:46:16 PM To: ceph-users Subject: [ceph-users] CephFS with erasure coding, do I need a cache-pool

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
Hi Greg, On 17.07.2018 03:01, Gregory Farnum wrote: Since Luminous, you can use an erasure coded pool (on bluestore) directly as a CephFS data pool, no cache pool needed. More than that, we'd really prefer you didn't use cache pools for anything. Just Say No. :) Thanks for the

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
but increase latency especially for small files, so it also depends on how important performance is and what kind of file size you store on your CephFS. Cheers, Linh From: ceph-users on behalf of Oliver Schulz Sent

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
On 18.07.2018 00:43, Gregory Farnum wrote: > But you could also do workaround like letting it choose (K+M)/2 racks > and putting two shards in each rack. Oh yes, you are more susceptible to top-of-rack switch failures in this case or whatever. It's just one option — many people

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-17 Thread Oliver Schulz
h special allocation. Cheers, Linh ---- From: Oliver Schulz Sent: Tuesday, 17 July 2018 11:39:26 PM To: Linh Vu; ceph-users Subject: Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool? Dear Linh, anothe

[ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-15 Thread Oliver Schulz
Dear all, we're planning a new Ceph cluster, with CephFS as the main workload, and would like to use erasure coding to use the disks more efficiently. Access pattern will probably be more read- than write-heavy, on average. I don't have any practical experience with erasure-coded pools so
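
On Luminous with bluestore, where (as the replies confirm) an EC pool can be used directly as a CephFS data pool without a cache tier, the setup looks roughly like this; profile name, pool names, PG counts and mount path are assumptions:
$ ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
$ ceph osd pool create cephfs_ec_data 512 512 erasure ec42
$ ceph osd pool set cephfs_ec_data allow_ec_overwrites true   # required for CephFS/RBD on EC pools
$ ceph fs add_data_pool cephfs cephfs_ec_data
$ setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/data   # direct a directory tree to the EC pool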

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-19 Thread Oliver Schulz
the P3700 we had, and allow us to get more out of our flash drives. ---- From: Oliver Schulz Sent: Wednesday, 18 July 2018 12:00:14 PM To: Linh Vu; ceph-users Subject: Re: [ceph-users] CephFS with erasure coding,

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-19 Thread Oliver Schulz
about 2x faster than the P3700 we had, and allow us to get more out of our flash drives. From: Oliver Schulz Sent: Wednesday, 18 July 2018 12:00:14 PM To: Linh Vu; ceph-users Subject: Re: [ceph-users] CephFS

Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

2018-07-16 Thread Oliver Schulz
Dear John, On 16.07.2018 16:25, John Spray wrote: Since Luminous, you can use an erasure coded pool (on bluestore) directly as a CephFS data pool, no cache pool needed. Great! I'll be happy to go without a cache pool then. Thanks for your help, John, Oliver

[ceph-users] Using ceph deploy with mon.a instead of mon.hostname?

2018-04-20 Thread Oliver Schulz
Dear Ceph Experts, I'm trying to switch an old Ceph cluster from manual administration to ceph-deploy, but I'm running into the following error: # ceph-deploy gatherkeys HOSTNAME [HOSTNAME][INFO ] Running command: /usr/bin/ceph --connect-timeout=25 --cluster=ceph
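
To see which monitor names the cluster actually has in its monmap, and hence what ceph-deploy will look for, something like:
$ ceph mon dump                                            # prints the monmap, one "mon.<name>" entry per monitor
$ ceph mon getmap -o monmap && monmaptool --print monmap   # same information from the binary monmap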

Re: [ceph-users] Using ceph deploy with mon.a instead of mon.hostname?

2018-04-20 Thread Oliver Schulz
find 'MON_HOSTNAME' in monmap Any ideas? Cheers, Oliver On 04/20/2018 11:46 AM, Stefan Kooman wrote: Quoting Oliver Schulz (oliver.sch...@tu-dortmund.de): Dear Ceph Experts, I'm trying to switch an old Ceph cluster from manual administration to ceph-deploy, but I'm running into the following e

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-22 Thread Oliver Schulz
bles 2018-06-20 18:27 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>: Thanks, Paul - I could probably activate the Jewel tunables profile without losing too many clients - most are running at least kernel 4.2, I think. I'll go hunting for older clients ...

[ceph-users] Ceph bluestore performance on 4kn vs. 512e?

2019-02-25 Thread Oliver Schulz
Dear all, in real-world use, is there a significant performance benefit in using 4kn instead of 512e HDDs (using Ceph bluestore with block-db on NVMe-SSD)? Cheers and thanks for any advice, Oliver ___ ceph-users mailing list
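
The logical and physical sector sizes a drive reports can be checked like this (the device name is a placeholder):
$ lsblk -o NAME,PHY-SEC,LOG-SEC,MODEL
$ blockdev --getpbsz --getss /dev/sdb   # physical, then logical sector size in bytes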