Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-01 Thread Glen Baars
Hello Erik, We are going to use RBD-mirror to replicate the clusters. This seems to need separate cluster names. Kind regards, Glen Baars
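A rough sketch of how rbd-mirror can be pointed at two clusters that both keep the default internal name "ceph": only the local file names differ. The pool name "rbd", the local alias "remote" and the CephX user client.mirror below are placeholders, not details from this thread.

  # on the rbd-mirror host, store the peer cluster's conf/keyring under a different local name
  scp peer-mon:/etc/ceph/ceph.conf /etc/ceph/remote.conf
  scp peer-mon:/etc/ceph/ceph.client.mirror.keyring /etc/ceph/remote.client.mirror.keyring
  rbd mirror pool enable rbd image                   # image-mode mirroring on pool "rbd"
  rbd mirror pool peer add rbd client.mirror@remote  # "remote" is the local file name, not an internal cluster name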

Re: [ceph-users] Ceph MDS and hard links

2018-08-01 Thread Yan, Zheng
On Thu, Aug 2, 2018 at 3:36 AM Benjeman Meekhof wrote: > > I've been encountering lately a much higher than expected memory usage > on our MDS which doesn't align with the cache_memory limit even > accounting for potential over-runs. Our memory limit is 4GB but the > MDS process is steadily at
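For anyone hitting the same thing, a few standard commands that show where MDS memory is actually going (the daemon name is a placeholder; run them on the MDS host via the admin socket):

  ceph daemon mds.<name> cache status     # cache usage vs mds_cache_memory_limit
  ceph daemon mds.<name> dump_mempools    # per-component allocations
  ceph tell mds.<name> heap stats         # tcmalloc view, if built with tcmalloc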

Re: [ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread Brad Hubbard
If you don't already know why, you should investigate why your cluster could not recover after the loss of a single osd. Your solution seems valid given your description. On Thu, Aug 2, 2018 at 12:15 PM, J David wrote: > On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard wrote: >> What is the
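A quick checklist for that investigation (a sketch only, using standard commands; <pgid> is a placeholder):

  ceph osd pool ls detail            # size/min_size per pool
  ceph pg dump_stuck unclean         # which PGs are stuck, and in what state
  ceph pg <pgid> query               # look at "blocked_by" and the peering section
  ceph osd tree                      # can the CRUSH rule still find enough hosts/OSDs?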

Re: [ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread J David
On Wed, Aug 1, 2018 at 9:53 PM, Brad Hubbard wrote: > What is the status of the cluster with this osd down and out? Briefly, miserable. All client IO was blocked. 36 pgs were stuck “down.” pg query reported that they were blocked by that OSD, despite that OSD not holding any replicas for

Re: [ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread J David
It seems I got around this issue with the following process. 1. I noted from the error that the pg causing the problem was 2.621. 2. I did “ceph pg 2.621 query” and I saw that that pg had nothing whatsoever to do with the affected OSD. 3. I looked in the /var/lib/ceph/osd/ceph-14/current
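The usual tool-based way to export and then drop a stale PG copy from a stopped filestore OSD looks roughly like the sketch below; the paths follow the ceph-14 example above, but exact ceph-objectstore-tool flags vary a little between releases, so treat it as an outline rather than a recipe.

  systemctl stop ceph-osd@14
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 \
      --journal-path /var/lib/ceph/osd/ceph-14/journal \
      --op export --pgid 2.621 --file /root/pg2.621.export   # keep a copy first
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 \
      --journal-path /var/lib/ceph/osd/ceph-14/journal \
      --op remove --pgid 2.621                               # newer builds also want --force
  systemctl start ceph-osd@14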

Re: [ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread Brad Hubbard
What is the status of the cluster with this osd down and out? On Thu, Aug 2, 2018 at 5:42 AM, J David wrote: > Hello all, > > On Luminous 12.2.7, during the course of recovering from a failed OSD, > one of the other OSDs started repeatedly crashing every few seconds > with an assertion failure:

Re: [ceph-users] fyi: Luminous 12.2.7 pulled wrong osd disk, resulted in node down

2018-08-01 Thread Brad Hubbard
On Wed, Aug 1, 2018 at 10:38 PM, Marc Roos wrote: > > > Today we pulled the wrong disk from a ceph node. And that made the whole > node go down/be unresponsive. Even to a simple ping. I cannot find too > much about this in the log files. But I expect that the > /usr/bin/ceph-osd process caused a

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-01 Thread Erik McCormick
Don't set a cluster name. It's no longer supported. It really only matters if you're running two or more independent clusters on the same boxes. That's generally inadvisable anyway. Cheers, Erik On Wed, Aug 1, 2018, 9:17 PM Glen Baars wrote: > Hello Ceph Users, > > Does anyone know how to set

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-01 Thread Glen Baars
Hello Ceph Users, Does anyone know how to set the Cluster Name when deploying with Ceph-deploy? I have 3 clusters to configure and need to correctly set the name. Kind regards, Glen Baars

[ceph-users] Ceph Balancer per Pool/Crush Unit

2018-08-01 Thread Reed Dier
Hi Cephers, I’m starting to play with the Ceph Balancer plugin after moving to straw2 and running into something I’m surprised I haven’t seen posted here. My cluster has two crush roots, one for HDD, one for SSD. Right now, HDDs are a single pool to themselves, SSDs are a single pool to
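A sketch of driving the balancer one pool at a time; the plan name "myplan" and the <pool> argument are placeholders, and passing pool names to optimize is only available on newer builds:

  ceph balancer mode upmap
  ceph balancer eval                       # cluster-wide score
  ceph balancer eval <pool>                # per-pool score
  ceph balancer optimize myplan <pool>     # newer releases accept a pool list here
  ceph balancer eval myplan                # check the plan's score before executing
  ceph balancer execute myplan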

Re: [ceph-users] OMAP warning ( again )

2018-08-01 Thread Brad Hubbard
rgw is not really my area but I'd suggest before you do *anything* you establish which object it is talking about. On Thu, Aug 2, 2018 at 8:08 AM, Brent Kennedy wrote: > Ceph health detail gives this: > HEALTH_WARN 1 large omap objects > LARGE_OMAP_OBJECTS 1 large omap objects > 1 large
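The object name is printed by the OSD that deep-scrubbed it, so a grep over the cluster log on the mons and the OSD logs on the OSD hosts (default log paths assumed) should turn it up:

  grep -i 'large omap object found' /var/log/ceph/ceph.log /var/log/ceph/ceph-osd.*.log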

[ceph-users] Error: journal specified but not allowed by osd backend

2018-08-01 Thread David Majchrzak
Hi! Trying to replace an OSD on a Jewel cluster (filestore data on HDD + journal device on SSD). I've set noout and removed the flapping drive (read errors) and replaced it with a new one. I've taken down the osd UUID to be able to prepare the new disk with the same osd.ID. The journal device
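That error string comes from a provisioning tool that has defaulted to bluestore, which takes no separate journal, so the filestore backend has to be requested explicitly. A minimal sketch, assuming the installed ceph-disk is new enough to know the --filestore flag; device names are placeholders, and setting "osd objectstore = filestore" in ceph.conf before preparing should achieve the same:

  ceph-disk prepare --filestore --fs-type xfs /dev/sdX /dev/sdY   # data device first, journal device second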

Re: [ceph-users] OMAP warning ( again )

2018-08-01 Thread Brent Kennedy
Ceph health detail gives this: HEALTH_WARN 1 large omap objects LARGE_OMAP_OBJECTS 1 large omap objects 1 large objects found in pool '.rgw.buckets.index' Search the cluster log for 'Large omap object found' for more details. The ceph.log file on the monitor server only shows the 1 large
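To see how many keys the offending index object actually carries (and to spot other big ones), a rough one-liner over the index pool; it can be slow on a large pool:

  rados -p .rgw.buckets.index ls | while read obj; do
      echo "$(rados -p .rgw.buckets.index listomapkeys "$obj" | wc -l) $obj"
  done | sort -rn | head

From there the usual follow-up is resharding the bucket the object belongs to (radosgw-admin bucket reshard) or raising the warning threshold, depending on the situation.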

[ceph-users] Luminous OSD crashes every few seconds: FAILED assert(0 == "past_interval end mismatch")

2018-08-01 Thread J David
Hello all, On Luminous 12.2.7, during the course of recovering from a failed OSD, one of the other OSDs started repeatedly crashing every few seconds with an assertion failure: 2018-08-01 12:17:20.584350 7fb50eded700 -1 log_channel(cluster) log [ERR] : 2.621 past_interal bound [19300,21449) end

[ceph-users] Ceph MDS and hard links

2018-08-01 Thread Benjeman Meekhof
I've been encountering lately a much higher than expected memory usage on our MDS which doesn't align with the cache_memory limit even accounting for potential over-runs. Our memory limit is 4GB but the MDS process is steadily at around 11GB used. Coincidentally we also have a new user heavily

Re: [ceph-users] PGs activating+remapped, PG overdose protection?

2018-08-01 Thread Paul Emmerich
You should probably have used 2048 following the usual target of 100 PGs per OSD. Just increase the mon_max_pg_per_osd option, ~200 is still okay-ish and your cluster will grow out of it :) Paul 2018-08-01 19:55 GMT+02:00 Alexandros Afentoulis : > Hello people :) > > we are facing a situation
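Roughly, either persist it in ceph.conf ([global] mon_max_pg_per_osd = 300, then restart the daemons) or inject it at runtime; 300 here is only an example value in line with the "~200 is still okay-ish" remark above:

  ceph tell mon.\* injectargs '--mon_max_pg_per_osd 300'
  ceph tell osd.\* injectargs '--mon_max_pg_per_osd 300'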

[ceph-users] PGs activating+remapped, PG overdose protection?

2018-08-01 Thread Alexandros Afentoulis
Hello people :) we are facing a situation quite similar to the one described here: http://tracker.ceph.com/issues/23117 Namely: we have a Luminous cluster consisting of 16 hosts, where each host holds 12 OSDs on spinning disks and 4 OSDs on SSDs. Let's forget the SSDs for now since they're not

Re: [ceph-users] Force cephfs delayed deletion

2018-08-01 Thread Kamble, Nitin A
From: John Spray Date: Wednesday, August 1, 2018 at 4:02 AM To: "Kamble, Nitin A" Cc: "arya...@intermedia.net" , "ceph-users@lists.ceph.com" Subject: Re: [ceph-users] Force cephfs delayed deletion [External Email] On Tue, Jul 31, 2018 at 11:43 PM Kamble,

Re: [ceph-users] Force cephfs delayed deletion

2018-08-01 Thread Kamble, Nitin A
From: "Yan, Zheng" Date: Tuesday, July 31, 2018 at 8:14 PM To: "Kamble, Nitin A" Cc: "arya...@intermedia.net" , John Spray , ceph-users Subject: Re: [ceph-users] Force cephfs delayed deletion [External Email] On Wed, Aug 1, 2018 at 6:43 AM Kamble, Nitin A

Re: [ceph-users] rbdmap service issue

2018-08-01 Thread Ilya Dryomov
On Wed, Aug 1, 2018 at 11:13 AM wrote: > > Hi! > > I find a rbd map service issue: > [root@dx-test ~]# systemctl status rbdmap > ● rbdmap.service - Map RBD devices >Loaded: loaded (/usr/lib/systemd/system/rbdmap.service; enabled; vendor > preset: disabled) >Active: active (exited)

Re: [ceph-users] Intermittent client reconnect delay following node fail

2018-08-01 Thread William Lawton
I didn't lose any clients this time around, all clients reconnected within at most 21 seconds. We think the very long client disconnections occurred when both the mgr and mds were active on the failed node, which was not the case for any of my recent 10 tests. We have noticed in the client logs

Re: [ceph-users] Remove host weight 0 from crushmap

2018-08-01 Thread Simon Ironside
On 01/08/18 13:39, Marc Roos wrote: Is there already a command to remove a host from the crush map (like ceph osd crush rm osd.23), without having to 'manually' edit the crush map? Yes, it's the same: ceph osd crush remove <name> Simon
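For completeness, the host bucket has to be emptied of its OSDs first or the removal will be refused; with a hypothetical host bucket c01 it is just:

  ceph osd crush remove c01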

[ceph-users] Remove host weight 0 from crushmap

2018-08-01 Thread Marc Roos
Is there already a command to remove a host from the crush map (like ceph osd crush rm osd.23), without having to 'manually' edit the crush map?

[ceph-users] fyi: Luminous 12.2.7 pulled wrong osd disk, resulted in node down

2018-08-01 Thread Marc Roos
Today we pulled the wrong disk from a ceph node. And that made the whole node go down/be unresponsive. Even to a simple ping. I cannot find too much about this in the log files. But I expect that the /usr/bin/ceph-osd process caused a kernel panic. Linux c01 3.10.0-693.11.1.el7.x86_64 CentOS

Re: [ceph-users] CephFS configuration for millions of small files

2018-08-01 Thread Paul Emmerich
Please keep the discussion on the mailing list. With 11 nodes and these requirements I'd probably go for 8+2 or 7+3, depending on the exact requirements. The problem with +1 is that you either accept writes when you cannot guarantee redundancy or you have a downtime when one osd is down. Yes, you can
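For reference, a minimal sketch of creating such a profile and an EC data pool; the profile name, pool name and PG count are placeholders, and overwrites only work on bluestore:

  ceph osd erasure-code-profile set ec82 k=8 m=2 crush-failure-domain=host
  ceph osd pool create cephfs_data 1024 1024 erasure ec82
  ceph osd pool set cephfs_data allow_ec_overwrites true   # required before CephFS/RBD can use the pool directly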

Re: [ceph-users] Intermittent client reconnect delay following node fail

2018-08-01 Thread John Spray
On Wed, Aug 1, 2018 at 12:09 PM William Lawton wrote: > > Thanks for the advice John. > > Our CentOS 7 clients use linux kernel v3.10 so I upgraded one of them to use > v4.17 and have run 10 more node fail tests. Unfortunately, the kernel upgrade > on the client hasn't resolved the issue. > >

Re: [ceph-users] Intermittent client reconnect delay following node fail

2018-08-01 Thread William Lawton
Thanks for the advice John. Our CentOS 7 clients use linux kernel v3.10 so I upgraded one of them to use v4.17 and have run 10 more node fail tests. Unfortunately, the kernel upgrade on the client hasn't resolved the issue. With each test I took down the active MDS node and monitored how long

Re: [ceph-users] Force cephfs delayed deletion

2018-08-01 Thread John Spray
On Tue, Jul 31, 2018 at 11:43 PM Kamble, Nitin A wrote: > Hi John, > > > > I am running ceph Luminous 12.2.1 release on the storage nodes with > v4.4.114 kernel on the cephfs clients. > > > > 3 client nodes are running 3 instances of a test program. > > The test program is doing this repeatedly

[ceph-users] PG went to Down state on OSD failure

2018-08-01 Thread shrey chauhan
Hi, I am trying to understand what happens when an OSD fails. A few days back I wanted to check what happens when an OSD goes down, so I went to the node and stopped one of the OSD services. When the OSD went into the down and out state, PGs started recovering and after some time

[ceph-users] safe to remove leftover bucket index objects

2018-08-01 Thread Dan van der Ster
Dear rgw friends, Somehow we have more than 20 million objects in our default.rgw.buckets.index pool. They are probably leftover from this issue we had last year: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018565.html and we want to clean the leftover / unused index objects To
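A cautious way to build the candidate list is to dump both sides and compare markers before deleting anything; the output file names are placeholders:

  rados -p default.rgw.buckets.index ls > index_objects.txt      # objects are named .dir.<marker>[.<shard>]
  radosgw-admin metadata list bucket.instance > bucket_instances.json

Any marker that no longer shows up in any bucket instance's metadata is a candidate, but cross-check (and keep a listomapkeys dump of the object) before removing it.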

[ceph-users] Run ceph-rest-api in Mimic

2018-08-01 Thread Ha, Son Hai
Hello everybody! Because some of my applications depend on the obsolete ceph-rest-api module, I would like to know if there is a way to run it in Mimic? If I understood correctly, the new restful plugin (http://docs.ceph.com/docs/mimic/mgr/restful/) in mgr does not provide cluster
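The restful mgr module can at least be switched on as follows (a sketch; whether it covers everything ceph-rest-api exposed is a separate question):

  ceph mgr module enable restful
  ceph restful create-self-signed-cert
  ceph restful create-key admin      # prints the API key for user "admin"
  ceph mgr services                  # shows the URL the restful endpoint listens on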

Re: [ceph-users] Run ceph-rest-api in Mimic

2018-08-01 Thread Wido den Hollander
On 08/01/2018 12:00 PM, Ha, Son Hai wrote: > Hello everybody! > > Because some of my applications depend on the obsolete > ceph-rest-api module, I would like to know if there is a way to run it > in Mimic? If I understood correctly, the new restful plugin >

Re: [ceph-users] mgr abort during upgrade 12.2.5 -> 12.2.7 due to multiple active RGW clones

2018-08-01 Thread Burkhard Linke
Hi, On 08/01/2018 11:14 AM, Dan van der Ster wrote: Sounds like https://tracker.ceph.com/issues/24982 Thx, I've added the information to the bug report. Regards, Burkhard

Re: [ceph-users] mgr abort during upgrade 12.2.5 -> 12.2.7 due to multiple active RGW clones

2018-08-01 Thread Dan van der Ster
Sounds like https://tracker.ceph.com/issues/24982 On Wed, Aug 1, 2018 at 10:18 AM Burkhard Linke wrote: > > Hi, > > > I'm currently upgrading our ceph cluster to 12.2.7. Most steps are fine, > but all mgr instances abort after restarting: > > > > > -10> 2018-08-01 09:57:46.357696

[ceph-users] rbdmap service issue

2018-08-01 Thread xiang . dai
Hi! I find a rbd map service issue: [root@dx-test ~]# systemctl status rbdmap ● rbdmap.service - Map RBD devices Loaded: loaded (/usr/lib/systemd/system/rbdmap.service; enabled; vendor preset: disabled) Active: active (exited) (Result: exit-code) since 六 2018-07-28 13:55:01 CST; 11min ago

[ceph-users] Optane 900P device class automatically set to SSD not NVME

2018-08-01 Thread Jake Grimmett
Dear All, Not sure if this is a bug, but when I add Intel Optane 900P drives, their device class is automatically set to SSD rather than NVME. This happens under Mimic 13.2.0 and 13.2.1 [root@ceph2 ~]# ceph-volume lvm prepare --bluestore --data /dev/nvme0n1 (SNIP see http://p.ip.fi/eopR for
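As far as I know the automatic class detection mostly distinguishes rotational vs non-rotational devices, so the usual workaround is to reassign the class by hand (osd.12 is a placeholder):

  ceph osd crush rm-device-class osd.12
  ceph osd crush set-device-class nvme osd.12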

[ceph-users] mgr abort during upgrade 12.2.5 -> 12.2.7 due to multiple active RGW clones

2018-08-01 Thread Burkhard Linke
Hi, I'm currently upgrading our ceph cluster to 12.2.7. Most steps are fine, but all mgr instances abort after restarting:    -10> 2018-08-01 09:57:46.357696 7fc481221700  5 -- 192.168.6.134:6856/5968 >> 192.168.6.131:6814/2743 conn(0x564cf2bf9000 :6856

Re: [ceph-users] ceph-mgr dashboard behind reverse proxy

2018-08-01 Thread Burkhard Linke
Hi, On 07/30/2018 04:09 PM, Tobias Florek wrote: Hi! I want to set up the dashboard behind a reverse proxy. How do people determine which ceph-mgr is active? Is there any simple and elegant solution? You can use haproxy. It supports periodic check for the availability of the configured
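If you just need to find the active mgr from a script (e.g. to feed the proxy's health check), something like this works, assuming an admin keyring on the host running it:

  ceph mgr dump | grep -E '"active_name"|"active_addr"'

If I remember right, standby mgrs answer the dashboard port with a redirect rather than a 200, which is what makes a plain haproxy HTTP check mark them down automatically.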

Re: [ceph-users] is there any filesystem like wrapper that dont need to map and mount rbd ?

2018-08-01 Thread ceph
Sounds like cephfs to me. On 08/01/2018 09:33 AM, Will Zhao wrote: > Hi: > I want to use ceph rbd, because it shows better performance. But I don't > like the kernel module and iSCSI target process. So here are my requirements: > I don't want to map it and mount it, but I still want to use some >

[ceph-users] is there any filesystem like wrapper that dont need to map and mount rbd ?

2018-08-01 Thread Will Zhao
Hi: I want to use ceph rbd, because it shows better performance. But I don't like the kernel module and iSCSI target process. So here are my requirements: I don't want to map it and mount it, but I still want to use some filesystem-like API, or at least be able to write multiple files to the rbd