Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Kai Wagner
Looks very good. Is it possible to display the reason why a cluster is in an error or warning state? I'm thinking about the output from ceph -s and whether this could be shown in case there's a failure. I think this will not be provided by default, but I'm wondering if it's possible to add. Kai On 05/07/201

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Konstantin Shalygin
And a question: Is there a way to get the Cluster IOPS with prometheus metrics? I did this with collectd, but can't find a suitable metric from ceph-mgr. sum(irate(ceph_pool_rd[30s])) sum(irate(ceph_pool_wr[30s])) k
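
For reference, the two rates can be combined into a single total-IOPS figure; a minimal sketch, assuming the same pool counters from the ceph-mgr prometheus module and a scrape interval short enough for irate over a 1m window:

    # total client IOPS across all pools (reads + writes)
    sum(irate(ceph_pool_rd[1m])) + sum(irate(ceph_pool_wr[1m]))
    # total client throughput, assuming the *_bytes counterparts are also exported
    sum(irate(ceph_pool_rd_bytes[1m])) + sum(irate(ceph_pool_wr_bytes[1m]))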

[ceph-users] Shutting down: why OSDs first?

2018-05-07 Thread Bryan Henderson
There is a lot of advice around on shutting down a Ceph cluster that says to shut down the OSDs before the monitors and bring up the monitors before the OSDs, but no one explains why. I would have thought it would be better to shut down the monitors first and bring them up last, so they don't have
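
Whichever order turns out to be right, the advice usually seen alongside it is to keep the cluster from reacting to the disappearing OSDs during the shutdown window; a hedged sketch of those flags (not an official procedure):

    # before shutting anything down
    ceph osd set noout        # don't mark stopped OSDs out and trigger rebalancing
    ceph osd set norebalance  # no data movement while daemons are going away
    # ...stop clients/MDS/RGW, then OSDs, then mons; reverse the order on power-up...
    ceph osd unset norebalance
    ceph osd unset noout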

Re: [ceph-users] What is the meaning of size and min_size for erasure-coded pools?

2018-05-07 Thread Maciej Puzio
Paul, many thanks for your reply. Thinking about it, I can't decide if I'd prefer to operate the storage server without redundancy, or have it automatically force a downtime, subjecting me to the rage of my users and my boss. But I think that the typical expectation is that the system serves the data whi

Re: [ceph-users] What is the meaning of size and min_size for erasure-coded pools?

2018-05-07 Thread Paul Emmerich
The docs seem wrong here. min_size is available for erasure coded pools and works like you'd expect it to work. Still, it's not a good idea to reduce it to the number of data chunks. Paul 2018-05-07 23:26 GMT+02:00 Maciej Puzio : > I am an admin in a research lab looking for a cluster storage >
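
For an EC pool with k data and m coding chunks, the usual recommendation is min_size = k + 1 rather than k, so a write is never acknowledged with zero remaining redundancy. A hypothetical example for a k=3, m=2 pool called ecpool:

    ceph osd pool get ecpool size        # reports k + m, i.e. 5 here
    ceph osd pool get ecpool min_size
    ceph osd pool set ecpool min_size 4  # k + 1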

[ceph-users] What is the meaning of size and min_size for erasure-coded pools?

2018-05-07 Thread Maciej Puzio
I am an admin in a research lab looking for a cluster storage solution, and a newbie to ceph. I have set up a mini toy cluster on some VMs to familiarize myself with ceph and to test failure scenarios. I am using ceph 12.2.4 on Ubuntu 18.04. I created 5 OSDs (one OSD per VM), an erasure-coded pool
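
For anyone reproducing this kind of test setup, an erasure-coded pool on five OSDs might be created roughly as follows (profile name and k/m values are illustrative, not taken from the original mail):

    # 3 data chunks + 2 coding chunks, one chunk per host
    ceph osd erasure-code-profile set testprofile k=3 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 32 32 erasure testprofile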

Re: [ceph-users] cephfs-data-scan safety on active filesystem

2018-05-07 Thread Gregory Farnum
Absolutely not. Please don't do this. None of the CephFS disaster recovery tooling in any way plays nicely with a live filesystem. I haven't looked at these docs in a while, are they not crystal clear about all these operations being offline and in every way dangerous? :/ -Greg On Mon, May 7, 2018

[ceph-users] cephfs-data-scan safety on active filesystem

2018-05-07 Thread Ryan Leimenstoll
Hi All, We recently experienced a failure with our 12.2.4 cluster running a CephFS instance that resulted in some data loss due to a seemingly problematic OSD blocking IO on its PGs. We restarted the (single active) mds daemon during this, which caused damage due to the journal not having the

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-05-07 Thread Alfredo Deza
On Wed, May 2, 2018 at 12:18 PM, Nicolas Huillard wrote: > Le dimanche 08 avril 2018 à 20:40 +, Jens-U. Mozdzen a écrit : >> sorry for bringing up that old topic again, but we just faced a >> corresponding situation and have successfully tested two migration >> scenarios. > > Thank you very mu

Re: [ceph-users] slow requests are blocked

2018-05-07 Thread Jean-Charles Lopez
Hi, ceph health detail This will tell you which OSDs are experiencing the problem so you can then go and inspect the logs and use the admin socket to find out which requests are at the source. Regards JC > On May 7, 2018, at 03:52, Grigory Murashov wrote: > > Hello! > > I'm not much experi
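
A minimal sketch of that workflow, assuming osd.3 turns out to be one of the OSDs named by health detail:

    ceph health detail                     # lists the OSDs with blocked requests
    # then on the node hosting that OSD, via the admin socket:
    ceph daemon osd.3 dump_ops_in_flight   # ops currently blocked
    ceph daemon osd.3 dump_historic_ops    # recently completed slow ops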

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Dietmar Rieder
+1 for supporting both! Disclosure: Prometheus user Dietmar On 05/07/2018 04:53 PM, Reed Dier wrote: > I’ll +1 on InfluxDB rather than Prometheus, though I think having a version > for each infrastructure path would be best. > I’m sure plenty here have existing InfluxDB infrastructure as their

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Wido den Hollander
On 05/07/2018 04:53 PM, Reed Dier wrote: > I’ll +1 on InfluxDB rather than Prometheus, though I think having a version > for each infrastructure path would be best. > I’m sure plenty here have existing InfluxDB infrastructure as their TSDB of > choice, and moving to Prometheus would be less adv

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Reed Dier
I’ll +1 on InfluxDB rather than Prometheus, though I think having a version for each infrastructure path would be best. I’m sure plenty here have existing InfluxDB infrastructure as their TSDB of choice, and moving to Prometheus would be less advantageous. Conversely, I’m sure all of the Prometh

Re: [ceph-users] Deleting an rbd image hangs

2018-05-07 Thread Jan Marquardt
Am 30.04.18 um 09:26 schrieb Jan Marquardt: > Am 27.04.18 um 20:48 schrieb David Turner: >> This old [1] blog post about removing super large RBDs is not relevant >> if you're using object map on the RBDs, however its method to manually >> delete an RBD is still valid.  You can see if this works
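
For context, the manual approach from that kind of blog post is roughly the following; the pool and image names here are made up, and this is only a sketch of the idea, not a procedure to copy verbatim:

    # find the data-object prefix of the stuck image
    rbd info rbd/bigimage | grep block_name_prefix    # e.g. rbd_data.123456789abc
    # delete the data objects directly, then let rbd clean up the metadata
    rados -p rbd ls | grep '^rbd_data.123456789abc' | \
        while read obj; do rados -p rbd rm "$obj"; done
    rbd rm rbd/bigimage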

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Marc Roos
Looks nice - I'd rather have some dashboards with collectd/influxdb. - Take into account bigger TVs/screens, e.g. 65" UHD; I am putting more stats on them than when viewing them locally in a web browser. - What is to be considered most important to have on your ceph dashboard? As a newbie I find it diffic

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Jan Fajerski
On Mon, May 07, 2018 at 02:45:14PM +0200, Kurt Bauer wrote: Jan Fajerski 7. May 2018 at 14:21 On Mon, May 07, 2018 at 02:05:59PM +0200, Kurt Bauer wrote: Hi Jan, first of all thanks for this dashboard. A few comments: -) 'vonage-status-panel' is needed, which i

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Kurt Bauer
Jan Fajerski 7. May 2018 at 14:21 On Mon, May 07, 2018 at 02:05:59PM +0200, Kurt Bauer wrote: Hi Jan, first of all thanks for this dashboard. A few comments: -) 'vonage-status-panel' is needed, which isn't mentioned in the ReadMe Yes, my bad. Will update t

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Jan Fajerski
On Mon, May 07, 2018 at 02:05:59PM +0200, Kurt Bauer wrote: Hi Jan, first of all thanks for this dashboard. A few comments: -) 'vonage-status-panel' is needed, which isn't mentioned in the ReadMe Yes, my bad. Will update the README -) Using ceph 12.2.4 the mon metric for me is apparen

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Kurt Bauer
Hi Jan, first of all thanks for this dashboard. A few comments: -) 'vonage-status-panel' is needed, which isn't mentioned in the ReadMe -) Using ceph 12.2.4 the mon metric for me is apparently called 'ceph_mon_quorum_count' not 'ceph_mon_quorum_status' And a question: Is there a way to get the
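
One way to make a single dashboard cope with that rename is to query both metric names and use whichever one the running ceph-mgr exports; a sketch relying on standard PromQL 'or' semantics:

    # falls back to the older/newer metric name, whichever exists
    sum(ceph_mon_quorum_status) or sum(ceph_mon_quorum_count)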

Re: [ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Eugen Block
Small correction: the correct command to delete the auth caps would be "ceph auth del osd.", of course. The OSDs will start immediately after completing the "ceph-volume prepare", but they won't start on a clean reboot. It seems that the "prepare" is mounting the /var/lib/ceph/osd/ceph-os
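
If the missing piece on reboot really is the systemd wiring rather than the auth key, the activate step is what normally creates it; a hedged sketch (the id and fsid below are placeholders taken from "ceph-volume lvm list"):

    # enable and start units for every OSD prepared on this node
    ceph-volume lvm activate --all
    # or a single OSD
    ceph-volume lvm activate 12 f3b6a4c2-1111-2222-3333-444455556666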

Re: [ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Alfredo Deza
On Mon, May 7, 2018 at 7:24 AM, Eugen Block wrote: > Hi, > > I'm not sure if this is deprecated or something, but I usually have to > execute an additional "ceph auth del " before recreating an OSD. > Otherwise the OSD fails to start. Maybe this is a missing step. > > Regards, > Eugen > > > Zitat

Re: [ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Gary Molenkamp
Thanks Eugen, The OSDs will start immediately after completing the "ceph-volume prepare", but they won't start on a clean reboot.   It seems that the "prepare" is mounting the /var/lib/ceph/osd/ceph-osdX path/structure but this is missing now in my boot process. Gary. On 2018-05-07 07:24 A

Re: [ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Eugen Block
Hi, I'm not sure if this is deprecated or something, but I usually have to execute an additional "ceph auth del " before recreating an OSD. Otherwise the OSD fails to start. Maybe this is a missing step. Regards, Eugen Zitat von Gary Molenkamp : Good morning all, Last week I started co

[ceph-users] something missing in filestore to bluestore conversion

2018-05-07 Thread Gary Molenkamp
Good morning all, Last week I started converting my filestore-based OSDs to bluestore using the following steps assembled from documentation and the mailing list: admin:  ceph osd out ${OSD_ID} on stor-node: systemctl kill ceph-osd@${OSD_ID} umount /var/lib/ceph/osd/ceph-${OSD_ID} ceph-disk zap /
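
The preview above is cut off; for reference, a commonly used luminous-era sequence for this kind of conversion looks roughly like the sketch below (reconstructed, not necessarily the exact steps from the original mail; /dev/sdX is a placeholder):

    ceph osd out ${OSD_ID}
    # once data has migrated off, on the storage node:
    systemctl stop ceph-osd@${OSD_ID}
    umount /var/lib/ceph/osd/ceph-${OSD_ID}
    ceph-volume lvm zap /dev/sdX
    # reuse the same id so the CRUSH position is kept
    ceph osd destroy ${OSD_ID} --yes-i-really-mean-it
    ceph-volume lvm create --bluestore --data /dev/sdX --osd-id ${OSD_ID}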

[ceph-users] slow requests are blocked

2018-05-07 Thread Grigory Murashov
Hello! I'm not very experienced in ceph troubleshooting, which is why I'm asking for help. I have multiple warnings coming from Zabbix as a result of ceph -s: REQUEST_SLOW: HEALTH_WARN: 21 slow requests are blocked > 32 sec. I don't see any hardware problems at that time. I'm able to find the same strings

[ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Jan Fajerski
Hi all, I'd like to request comments and feedback about a Grafana dashboard for Ceph cluster monitoring. https://youtu.be/HJquM127wMY https://github.com/ceph/ceph/pull/21850 The goal is to eventually have a set of default dashboards in the Ceph repository that offer decent monitoring for clu

Re: [ceph-users] Luminous update 12.2.4 -> 12.2.5 mds 'stuck' in rejoin

2018-05-07 Thread Marc Roos
I have an mds.a and an mds.c. If I stop mds.a it looks like OSDs are going down again. If I keep the mds in rejoin, OSDs stay up. -Original Message- From: Marc Roos Sent: maandag 7 mei 2018 6:51 To: ceph-users Subject: [ceph-users] Luminous update 12.2.4 -> 12.2.5 mds 'stuck' in rejoin mds:

[ceph-users] Luminous update 12.2.4 -> 12.2.5 mds 'stuck' in rejoin

2018-05-07 Thread Marc Roos
mds: cephfs-1/1/1 up {0=a=up:rejoin}, 1 up:standby 2018-05-07 11:37:29.006507 7ff32bc69700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 2018-05-07 11:37:29.006515 7ff32bc69700 1 mds.beacon.a _send skipping beacon, heartbeat map not healthy 2018-05-07 11:37:32.943408 7ff32fc71

[ceph-users] ceph-mgr does not start after upgrade to 12.2.5

2018-05-07 Thread Iban Cabrillo
Hi, After upgrading to 12.2.5 on Centos7v3 (12.2.2 was the previous version) and restarting the mons and mgrs, the dashboard is inaccessible: ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable) [root@cephmon03 ~]# ceph -s cluster: id: 6f5a65a7-316c-4825-afcb-4286
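
When that happens, the first things usually worth checking are whether the dashboard module is still enabled on the active mgr and where it is serving; a hedged sketch:

    ceph mgr module ls     # is "dashboard" listed under enabled_modules?
    ceph mgr module enable dashboard
    ceph mgr services      # shows the URL the active mgr is serving the dashboard on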

[ceph-users] Upgrade from 12.2.4 to 12.2.5 osd/down up, logs flooded heartbeat_check: no reply from

2018-05-07 Thread Marc Roos
And the logs are flooded with messages like these. May 7 10:47:48 c01 ceph-osd: 2018-05-07 10:47:48.201963 7f7d94afc700 -1 osd.7 19394 heartbeat_check: no reply from 192.168.10.112:6804 osd.10 ever on either front or back, first ping sent 2018-05-07 10:47:20.970982 (cutoff 2018-05-07 10:47:28.201961) May

Re: [ceph-users] Luminous radosgw S3/Keystone integration issues

2018-05-07 Thread Dan van der Ster
Hi Matt, That's great! I sent the PR here: https://github.com/ceph/ceph/pull/21846 I don't have the necessary karma, but it would be really nice if this could be added to the luminous backport queue. Thanks! Dan On Fri, May 4, 2018 at 5:18 PM, Matt Benjamin wrote: > Hi Dan, > > We agreed in

Re: [ceph-users] mgr dashboard differs from ceph status

2018-05-07 Thread Tracy Reed
On Mon, May 07, 2018 at 12:13:00AM PDT, Janne Johansson spake thusly: > > mgr: ceph01(active), standbys: ceph-ceph07, ceph03 > > Don't know if it matters, but the naming seems different even though I guess > you are running mgr's on the same nodes as the mons, but ceph07 is called > "ceph-ceph

Re: [ceph-users] mgr dashboard differs from ceph status

2018-05-07 Thread Janne Johansson
2018-05-04 8:21 GMT+02:00 Tracy Reed : > > services: > mon: 3 daemons, quorum ceph01,ceph03,ceph07 > mgr: ceph01(active), standbys: ceph-ceph07, ceph03 > osd: 78 osds: 78 up, 78 in > Don't know if it matters, but the naming seems different even though I guess you are running mgr's o