Re: [ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-05 Thread Adam Tygart
Thank you! Setting norecover seems to have worked in terms of keeping the OSDs up. I am glad my logs were of use in tracking this down. I am looking forward to future updates. Let me know if you need anything else. -- Adam On Thu, Apr 5, 2018 at 10:13 PM, Josh Durgin wrote: >
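For reference, a minimal sketch of the recovery flags being discussed; the thread only mentions norecover, so pairing it with nobackfill is an assumption:

    # pause recovery so the flapping OSDs can stay up while the bug is investigated
    ceph osd set norecover
    ceph osd set nobackfill    # often set alongside norecover (assumption, not from the thread)
    # once fixed packages are in place, re-enable recovery
    ceph osd unset nobackfill
    ceph osd unset norecover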

Re: [ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-05 Thread Josh Durgin
On 04/05/2018 08:11 PM, Josh Durgin wrote: On 04/05/2018 06:15 PM, Adam Tygart wrote: Well, the cascading crashes are getting worse. I'm routinely seeing 8-10 of my 518 OSDs crash. I cannot start 2 of them without triggering 14 or so of them to crash repeatedly for more than an hour. I've run

Re: [ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-05 Thread Josh Durgin
On 04/05/2018 06:15 PM, Adam Tygart wrote: Well, the cascading crashes are getting worse. I'm routinely seeing 8-10 of my 518 OSDs crash. I cannot start 2 of them without triggering 14 or so of them to crash repeatedly for more than an hour. I've run another one of them with more logging, debug

Re: [ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-05 Thread Adam Tygart
Well, the cascading crashes are getting worse. I'm routinely seeing 8-10 of my 518 OSDs crash. I cannot start 2 of them without triggering 14 or so of them to crash repeatedly for more than an hour. I've run another one of them with more logging, debug osd = 20; debug ms = 1 (definitely more than
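A sketch of how the debug levels mentioned above can be applied; osd.N is a placeholder for the affected OSD, and the same settings can also live in ceph.conf:

    # raise logging on one running OSD without a restart
    ceph tell osd.N injectargs '--debug_osd 20 --debug_ms 1'
    # or persistently, under the [osd] section of ceph.conf:
    #   debug osd = 20
    #   debug ms = 1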

[ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-05 Thread Adam Tygart
Hello all, I'm having some stability issues with my ceph cluster at the moment. Using CentOS 7 and Ceph 12.2.4. I have OSDs that are segfaulting regularly, roughly every minute or so, and it seems to be getting worse, now with cascading failures. Backtraces look like this: ceph version 12.2.4

Re: [ceph-users] Cephfs hardlink snapshot

2018-04-05 Thread Patrick Donnelly
Hi Marc, On Wed, Apr 4, 2018 at 11:21 PM, Marc Roos wrote: > > 'Hard links do not interact well with snapshots' - is this still an issue? > Because I am using rsync and hardlinking. And it would be nice if I could > snapshot the directory, instead of having to copy it.

[ceph-users] Luminous and Bluestore: low load and high latency on RBD

2018-04-05 Thread Alex Gorbachev
I am seeing strange behavior with Luminous 12.2.4 and Bluestore: Ubuntu kernel 4.14, 108 x 10K RPM Seagate drives, no SSD WAL/DB, 8GB Areca controllers, 10 GbE networking on OSD nodes and 56GbE on clients. Single-stream IOs to RBD volumes return with 50-1000 ms latency, while atop shows
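Not part of the original mail, but a quick way to check whether the latency sits on the OSD side is the cluster's own latency counters (osd.N is a placeholder):

    # commit/apply latency per OSD as tracked by the cluster
    ceph osd perf
    # full performance counters for a single daemon (run on that OSD's host)
    ceph daemon osd.N perf dump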

Re: [ceph-users] ceph-deploy: recommended?

2018-04-05 Thread ceph . novice
... we use (only!) ceph-deploy in all our environments, tools and scripts. If I look at the effort that went into ceph-volume and all the related issues, the "manual LVM" overhead and/or still missing features, PLUS the recommendations mentioned in the same discussions to use something like

[ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-05 Thread Steven Vacaroaia
Hi, I have a strange issue - OSDs from a specific server are introducing a huge performance issue. This is a brand new installation on 3 identical servers - DELL R620 with PERC H710, bluestore DB and WAL on SSD, 10GB dedicated private/public networks. When I add the OSD I see gaps like below
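A possible way to narrow down whether individual OSDs on that host are slow (osd.N is a placeholder; not taken from the thread):

    # raw write benchmark against a single OSD (writes 1 GB by default)
    ceph tell osd.N bench
    # recent slow operations recorded by that OSD (run on the OSD's host)
    ceph daemon osd.N dump_historic_ops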

Re: [ceph-users] Mon scrub errors

2018-04-05 Thread kefu chai
On Thu, Apr 5, 2018 at 3:10 PM, Rickard Nilsson wrote: > Hi, > > I'm having a cluster with three monitors, two MDS and nine OSDs. Lately I've > been getting scrub errors from the monitors; > > 2018-04-05 07:26:52.147185 [ERR] mon.2 ScrubResult(keys >

[ceph-users] Cephfs hardlink snapshot

2018-04-05 Thread Marc Roos
'Hard links do not interact well with snapshots' - is this still an issue? I ask because I am using rsync and hardlinking, and it would be nice if I could snapshot the directory instead of having to copy it. http://docs.ceph.com/docs/master/dev/cephfs-snapshots/
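For context, a sketch of the rsync hard-link pattern and of how CephFS snapshots are taken; all paths are placeholders:

    # incremental backup where unchanged files are hard links into the previous run
    rsync -a --link-dest=/cephfs/backups/2018-04-04 /data/ /cephfs/backups/2018-04-05/
    # a CephFS snapshot is created by making a directory under .snap
    # (snapshots may first need to be enabled on the filesystem)
    mkdir /cephfs/backups/.snap/2018-04-05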

Re: [ceph-users] rgw make container private again

2018-04-05 Thread Valéry Tschopp
There is an existing bug related to the Swift read ACL being impossible to delete again. Check https://tracker.ceph.com/issues/22897 V. On 30/03/18, 18:11, "ceph-users on behalf of Vladimir Prokofev" wrote: As usual, I
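The tracker issue is about clearing a container read ACL again; a sketch with the standard python-swiftclient CLI, the container name being a placeholder:

    # make the container publicly readable
    swift post -r '.r:*' mycontainer
    # clearing the ACL again is what the linked bug is about
    swift post -r '' mycontainer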

Re: [ceph-users] bluestore OSD did not start at system-boot

2018-04-05 Thread Nico Schottelius
Hey Ansgar, we have a similar "problem": in our case all servers are wiped on reboot, as they boot their operating system from the network into initramfs. While the OS configuration is done with cdist [0], we consider Ceph OSDs to be more dynamic data and simply re-initialise all OSDs on boot using the

[ceph-users] bluestore OSD did not start at system-boot

2018-04-05 Thread Ansgar Jazdzewski
hi folks, I just figured out that my OSDs did not start because the filesystem is not mounted, so I wrote a script to hack my way around it:
#!/usr/bin/env bash
DATA=( $(ceph-volume lvm list | grep -e 'osd id\|osd fsid' | awk '{print $3}' | tr '\n' ' ') )
OSDS=$(( ${#DATA[@]}/2 ))
for
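The script is cut off in the archive at the for loop; one possible completion, assuming the intent is to feed each id/fsid pair back to ceph-volume (a sketch, not the author's original):

    for (( i=0; i<OSDS; i++ )); do
        # NOTE: which array slot holds the id and which the fsid depends on the
        # order in which "ceph-volume lvm list" prints the two fields
        ceph-volume lvm activate "${DATA[$((i*2))]}" "${DATA[$((i*2+1))]}"
    done
    # recent ceph-volume releases can also do this in one step:
    # ceph-volume lvm activate --all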

Re: [ceph-users] Use trimfs on already mounted RBD image

2018-04-05 Thread Damian Dabrowski
Ok, thanks for your reply. 2018-04-05 8:15 GMT+02:00 Wido den Hollander : > > > On 04/04/2018 07:30 PM, Damian Dabrowski wrote: > > Hello, > > > > I wonder if there is any way to run `trimfs` on an rbd image which is > > currently used by the KVM process? (when I don't have access to

Re: [ceph-users] Rados bucket issues, default.rgw.buckets.index growing every day

2018-04-05 Thread Mark Schouten
On Wed, 2018-04-04 at 09:38 +0200, Mark Schouten wrote: > I have some issues with my bucket index. As you can see in the > attachment, every day around 16:30 the amount of objects in the > default.rgw.buckets.index increases. This happens since upgrading > from > 12.2.2 to 12.2.4. I disabled
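A hedged way to watch that index growth; the pool and bucket names below are defaults/placeholders, and "bucket limit check" may not exist on every release:

    # count raw index objects in the index pool
    rados -p default.rgw.buckets.index ls | wc -l
    # per-bucket object counts and shard usage
    radosgw-admin bucket stats --bucket=mybucket
    radosgw-admin bucket limit check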

[ceph-users] Mon scrub errors

2018-04-05 Thread Rickard Nilsson
Hi, I'm having a cluster with three monitors, two MDS and nine OSDs. Lately I've been getting scrub errors from the monitors; 2018-04-05 07:26:52.147185 [ERR] mon.2 ScrubResult(keys {osd_pg_creating=1,osdmap=99} crc {osd_pg_creating=1404726104,osdmap=3323124730}) 2018-04-05 07:26:52.147167 [ERR]

Re: [ceph-users] ceph-deploy: recommended?

2018-04-05 Thread Wido den Hollander
On 04/04/2018 08:58 PM, Robert Stanford wrote: > >  I read a couple of versions ago that ceph-deploy was not recommended > for production clusters.  Why was that?  Is this still the case?  We > have a lot of problems automating deployment without ceph-deploy. > > In the end it is just a

[ceph-users] Ceph Dashboard IRC Channel

2018-04-05 Thread Kai Wagner
Hi all, we've created a new #ceph-dashboard channel on OFTC to talk about all dashboard-related functionality and development. This means that the old "openattic" channel on Freenode is just for openATTIC, and everything new regarding the mgr module will now be discussed in the new channel

Re: [ceph-users] ceph-deploy: recommended?

2018-04-05 Thread Dietmar Rieder
On 04/04/2018 08:58 PM, Robert Stanford wrote: > >  I read a couple of versions ago that ceph-deploy was not recommended > for production clusters.  Why was that?  Is this still the case?  We > have a lot of problems automating deployment without ceph-deploy. > We are using it in production on

Re: [ceph-users] Use trimfs on already mounted RBD image

2018-04-05 Thread Wido den Hollander
On 04/04/2018 07:30 PM, Damian Dabrowski wrote: > Hello, > > I wonder if there is any way to run `trimfs` on an rbd image which is > currently used by the KVM process? (when I don't have access to the VM) > > I know that I can do this via qemu-guest-agent but not all VMs have it > installed. > > I can't
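For the qemu-guest-agent path mentioned in the quoted question, a sketch assuming libvirt-managed guests with the agent installed; "myvm" is a placeholder:

    # ask the guest, via qemu-guest-agent, to fstrim its mounted filesystems
    virsh domfstrim myvm
    # or only a specific mount point inside the guest
    virsh domfstrim myvm --mountpoint /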

Re: [ceph-users] What do you use to benchmark your rgw?

2018-04-05 Thread Thomas Bennett
Hi Mathew, We approached the problem by first running swift-bench for performance tuning and configuration. Since it was the easiest to get up and running and test the gateway. Then we wrote a python script using python boto and python futures to model our usecase and test s3. We found the most