Re: [ceph-users] separate monitoring node

2018-06-22 Thread Reed Dier
> On Jun 22, 2018, at 2:14 AM, Stefan Kooman wrote:
>
> Just checking here: Are you using the telegraf ceph plugin on the nodes?
> In that case you _are_ duplicating data. But the good news is that you
> don't need to. There is a Ceph mgr telegraf plugin now (mimic) which
> also works on

Re: [ceph-users] Recovery after datacenter outage

2018-06-22 Thread Gregory Farnum
On Fri, Jun 22, 2018 at 2:26 AM Christian Zunker wrote:
> Hi List,
>
> we are running a ceph cluster (12.2.5) as backend to our OpenStack cloud.
>
> Yesterday our datacenter had a power outage. As if this weren't enough,
> we also had a separated ceph cluster because of networking problems.
>

Re: [ceph-users] separate monitoring node

2018-06-22 Thread Stefan Kooman
Quoting Reed Dier (reed.d...@focusvq.com):
>
> > On Jun 22, 2018, at 2:14 AM, Stefan Kooman wrote:
> >
> > Just checking here: Are you using the telegraf ceph plugin on the nodes?
> > In that case you _are_ duplicating data. But the good news is that you
> > don't need to. There is a Ceph mgr

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-06-22 Thread Gregory Farnum
On Fri, Jun 22, 2018 at 6:22 AM Sergey Malinin wrote:
> From
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/ :
>
> "Now 1 knows that these object exist, but there is no live ceph-osd who
> has a copy. In this case, IO to those objects will block, and the cluster

Re: [ceph-users] Recovery after datacenter outage

2018-06-22 Thread Jason Dillaman
It sounds like your OpenStack users do not have the correct caps to blacklist dead clients. See step 6 in the upgrade section of Luminous’ release notes or (preferably) use the new “profile rbd”-style caps if you don’t use older clients. The reason why repairing the object map seemed to fix
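
A rough sketch of the two caps variants mentioned above, written as Python helpers that only build the `ceph auth caps` command line for review (nothing is executed). The client name `cinder` and the pool names are hypothetical; the cap strings follow the Luminous release notes as I remember them, so check them against the notes for your release.

```python
import shlex


def rbd_profile_caps_cmd(client_id, pools):
    """Build a 'ceph auth caps' command using the "profile rbd" style caps.

    Note: 'ceph auth caps' replaces ALL existing caps of the entity,
    so every pool the client needs must be listed here.
    """
    osd_caps = ", ".join(f"profile rbd pool={pool}" for pool in pools)
    return ["ceph", "auth", "caps", f"client.{client_id}",
            "mon", "profile rbd",
            "osd", osd_caps]


def legacy_blacklist_caps_cmd(client_id, existing_osd_caps):
    """Variant for older clients: keep the existing OSD caps and only
    allow the 'osd blacklist' mon command in addition to read access."""
    return ["ceph", "auth", "caps", f"client.{client_id}",
            "mon", 'allow r, allow command "osd blacklist"',
            "osd", existing_osd_caps]


if __name__ == "__main__":
    # Hypothetical client and pools, for illustration only.
    print(shlex.join(rbd_profile_caps_cmd("cinder", ["volumes", "vms"])))
```

Printing the command instead of running it makes it easy to compare with the output of `ceph auth get client.<id>` before changing anything.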

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-22 Thread Konstantin Shalygin
> Yes, I know that section of the docs, but can't find how to change the crush rules after "ceph osd crush tunables ...". Could you give me a hint?
What do you mean? All you need after the upgrade to Luminous is:
ceph osd crush tunables optimal
ceph osd crush set-all-straw-buckets-to-straw2
k

Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-22 Thread Dave.Chen
Hi Burkhard, Thanks for your explanation. I created a new 2TB OSD on another node and it really did solve the issue; the status of the Ceph cluster is now "health HEALTH_OK". Another question: if three homogeneous OSDs are spread across 2 nodes, I still get the warning message, and the status

Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-22 Thread Dave.Chen
I saw this statement at this link ( http://docs.ceph.com/docs/master/rados/operations/crush-map/ ). Is that the reason for the warning? "This, combined with the default CRUSH failure domain, ensures that replicas or erasure code shards are separated across hosts and a single
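
To make the quoted statement concrete, here is a deliberately simplified model (not CRUSH itself): with a host failure domain, each host can hold at most one replica of a PG, so the number of placeable replicas is capped by the number of distinct hosts. The OSD and host names are made up, and weights, rules and down OSDs are ignored.

```python
def placeable_replicas(osd_to_host, pool_size):
    """Replicas that fit when each host may hold at most one copy."""
    distinct_hosts = set(osd_to_host.values())
    return min(pool_size, len(distinct_hosts))


def is_undersized(osd_to_host, pool_size):
    """True if fewer replicas can be placed than the pool size asks for."""
    return placeable_replicas(osd_to_host, pool_size) < pool_size


# Three OSDs on only two hosts (hypothetical names), pool size 3.
two_hosts = {"osd.0": "node1", "osd.1": "node1", "osd.2": "node2"}
# The same three OSDs spread over three hosts.
three_hosts = {"osd.0": "node1", "osd.1": "node2", "osd.2": "node3"}

if __name__ == "__main__":
    print(is_undersized(two_hosts, 3))    # only two hosts for three copies
    print(is_undersized(three_hosts, 3))  # one host per copy
```

Under this model, three OSDs on two nodes leave one replica unplaced, which would match PGs staying "active+undersized+degraded" until a third host is available or the failure domain is changed.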

Re: [ceph-users] How to throttle operations like "rbd rm"

2018-06-22 Thread ceph
Hi Paul,
On 14 June 2018 at 00:33:09 CEST, Paul Emmerich wrote:
>2018-06-13 23:53 GMT+02:00 :
>
>> Hi yao,
>>
>> IIRC there is a *sleep* option which is useful when a delete operation is
>> being done from ceph, sleep_trim or something like that.
>>
>
>you are thinking of "osd_snap_trim_sleep"

Re: [ceph-users] separate monitoring node

2018-06-22 Thread Stefan Kooman
Quoting Denny Fuchs (linuxm...@4lin.net):
> hi,
>
> > On 19.06.2018 at 17:17, Kevin Hrpcek wrote:
> >
> > # ceph auth get client.icinga
> > exported keyring for client.icinga
> > [client.icinga]
> > key =
> > caps mgr = "allow r"
> > caps mon = "allow r"
>
> that's the point: It's
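
As an illustration of what a read-only user like client.icinga could be used for, here is a minimal sketch of a check that maps cluster health to Nagios/Icinga exit codes. It assumes the JSON comes from something like `ceph --id icinga health --format json` and that the top-level `status` field holds the health string; both are assumptions to verify on your release. The function only parses text and never calls the CLI itself.

```python
import json

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

_HEALTH_TO_EXIT = {
    "HEALTH_OK": OK,
    "HEALTH_WARN": WARNING,
    "HEALTH_ERR": CRITICAL,
}


def health_to_exit_code(health_json):
    """Map 'ceph health --format json' output to a plugin exit code.

    Assumes a top-level "status" key; anything unparsable or
    unexpected is reported as UNKNOWN rather than OK.
    """
    try:
        status = json.loads(health_json).get("status")
    except (ValueError, AttributeError):
        return UNKNOWN
    return _HEALTH_TO_EXIT.get(status, UNKNOWN)
```

A wrapper script would run the ceph command with the icinga keyring, pass its stdout to `health_to_exit_code`, and exit with the returned value.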

Re: [ceph-users] radosgw failover help

2018-06-22 Thread Konstantin Shalygin
> Has anyone done, or is anyone working on, a way to do S3 (radosgw) failover? I am trying to work out a way to have 2 radosgw servers with a VIP, so that when one server goes down it will fail over to the other.
Maybe failover + load balancing would be better? For example, nginx does this, plus TLS.
k
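
The "failover + load balancing" idea can be sketched independently of nginx: rotate through the radosgw backends and skip any that a health probe reports as down. This is only a toy model of what a balancer does, not nginx's actual algorithm; the backend names and the probe are hypothetical.

```python
def pick_backend(backends, is_up, start=0):
    """Return the first healthy backend, beginning at index `start`
    and wrapping around; None if every backend is down."""
    count = len(backends)
    for offset in range(count):
        candidate = backends[(start + offset) % count]
        if is_up(candidate):
            return candidate
    return None


# Hypothetical radosgw instances behind the balancer.
BACKENDS = ["rgw1.example.com:7480", "rgw2.example.com:7480"]

if __name__ == "__main__":
    down = {"rgw1.example.com:7480"}  # pretend the first gateway failed
    print(pick_backend(BACKENDS, lambda backend: backend not in down))
```

Advancing `start` by one on every request gives round-robin balancing while both gateways are up, and the skip logic gives failover when one goes down.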

[ceph-users] Recovery after datacenter outage

2018-06-22 Thread Christian Zunker
Hi List, we are running a ceph cluster (12.2.5) as backend to our OpenStack cloud. Yesterday our datacenter had a power outage. As if this weren't enough, we also had a separated ceph cluster because of networking problems. First of all thanks a lot to the ceph developers. After the network

[ceph-users] Howto add another client user id to a cluster

2018-06-22 Thread Steffen Winther Sørensen
Anyone, We have ceph clients that we want to let mount two cephfs file systems, each from its own ceph cluster. Both clusters were created the standard way w/ceph-deploy and possibly only know their own client.admin. How could we allow a new client id to access the 2nd cluster, e.g. as admin2? On ceph

Re: [ceph-users] CentOS Dojo at CERN

2018-06-22 Thread Willem Jan Withagen
On 21-6-2018 14:44, Dan van der Ster wrote: On Thu, Jun 21, 2018 at 2:41 PM Kai Wagner wrote: On 20.06.2018 17:39, Dan van der Ster wrote: And BTW, if you can't make it to this event we're in the early days of planning a dedicated Ceph + OpenStack Days at CERN around May/June 2019. More news

Re: [ceph-users] Backfill stops after a while after OSD reweight

2018-06-22 Thread Oliver Schulz
Yes, I know that section of the docs, but can't find how to change the crush rules after "ceph osd crush tunables ...". Could you give me a hint? Another question, if I may: Would you recommend going from my ancient tunables to hammer directly (or even to jewel, if I can get the clients updated)

[ceph-users] unfound blocks IO or gives IO error?

2018-06-22 Thread Dan van der Ster
Hi all, Quick question: does an IO with an unfound object result in an IO error or should the IO block? During a jewel to luminous upgrade some PGs passed through a state with unfound objects for a few seconds. And this seems to match the times when we had a few IO errors on RBD attached

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-06-22 Thread Sergey Malinin
From http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/ : "Now 1 knows that these object exist, but there is no live ceph-osd who has a copy. In this case, IO to those objects will block, and the

Re: [ceph-users] unfound blocks IO or gives IO error?

2018-06-22 Thread Dan van der Ster
Thanks. So I'm going to continue looking for the cause of these IO errors.
-- dan
On Fri, Jun 22, 2018 at 3:22 PM Sergey Malinin wrote:
>
> From
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/ :
>
> "Now 1 knows that these object exist, but there is no live ceph-osd