[ceph-users] ceph auto repair. What is wrong?

2018-08-23 Thread Fyodor Ustinov
Hi! I have a fresh ceph cluster: 12 hosts and 3 OSDs on each host (one HDD and two SSDs). Each host is located in its own rack. I made the following crush configuration on the fresh ceph installation: sudo ceph osd crush add-bucket R-26-3-1 rack sudo ceph osd crush add-bucket R-26-3-2 rack sudo ceph osd
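A minimal sketch of how such a rack-level layout is typically finished once the rack buckets exist (host, rack, and pool names here are placeholders, not taken from the thread):

```
# move each rack under the default root and each host under its rack
sudo ceph osd crush move R-26-3-1 root=default
sudo ceph osd crush move host-01 rack=R-26-3-1

# replicate across racks instead of hosts
sudo ceph osd crush rule create-replicated replicated_rack default rack
sudo ceph osd pool set mypool crush_rule replicated_rack
```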

Re: [ceph-users] Why does Ceph probe for end of MDS log?

2018-08-23 Thread Gregory Farnum
On Thu, Aug 23, 2018 at 7:47 PM Bryan Henderson wrote: > I've been reading MDS log code, and I have a question: why does it "probe > for > the end of the log" after reading the log header when starting up? > > As I understand it, the log header says the log had been written up to > Location X

[ceph-users] Why does Ceph probe for end of MDS log?

2018-08-23 Thread Bryan Henderson
I've been reading MDS log code, and I have a question: why does it "probe for the end of the log" after reading the log header when starting up? As I understand it, the log header says the log had been written up to Location X ("write_pos") the last time the log was committed, but the end-probe

Re: [ceph-users] A self test on the usage of 'step choose|chooseleaf'

2018-08-23 Thread Gregory Farnum
On Thu, Aug 23, 2018 at 4:11 PM Cody wrote: > Hi everyone, > > As a newbie, I am grateful for receiving many helps from people on > this mailing list. I would like to quickly test my understanding on > 'step choose|chooseleaf' and wish you could point out any of my > mistakes. > > Suppose I have

Re: [ceph-users] Ceph RGW Index Sharding In Jewel

2018-08-23 Thread Russell Holloway
Thanks. Unfortunately even my version of hammer is too old at 0.94.5. I think my only route to address this issue is to figure out the upgrade, at the very least to 0.94.10. The biggest issue, again, is that the deployment tool originally used is pinned to 0.94.5, is pretty convoluted, and is no longer

Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-23 Thread Tyler Bishop
Thanks for the info. I was investigating bluestore as well. My hosts don't go unresponsive, but I do see parallel I/O slow down. On Thu, Aug 23, 2018, 8:02 PM Andras Pataki wrote: > We are also running some fairly dense nodes with CentOS 7.4 and ran into > similar problems. The nodes ran

Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-23 Thread Andras Pataki
We are also running some fairly dense nodes with CentOS 7.4 and ran into similar problems.  The nodes ran filestore OSDs (Jewel, then Luminous).  Sometimes a node would be so unresponsive that one couldn't even ssh to it (even though the root disk was a physically separate drive on a separate

[ceph-users] A self test on the usage of 'step choose|chooseleaf'

2018-08-23 Thread Cody
Hi everyone, As a newbie, I am grateful for all the help I have received from people on this mailing list. I would like to quickly test my understanding of 'step choose|chooseleaf' and would appreciate it if you could point out any of my mistakes. Suppose I have the following topology: root=default
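For readers testing the same thing, a hedged illustration of the two styles in CRUSH rule syntax (bucket names assumed, not from Cody's topology); both place one OSD per rack for a 3-replica pool:

```
rule one_step {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack   # pick N racks, descend to one OSD in each
    step emit
}

rule two_steps {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step choose firstn 0 type rack       # pick N rack buckets...
    step chooseleaf firstn 1 type host   # ...then one host (and its OSD leaf) per rack
    step emit
}
```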

Re: [ceph-users] Ceph Testing Weekly Tomorrow — With Kubernetes/Install discussion

2018-08-23 Thread Brett Niver
Great discussion! On Thu, Aug 23, 2018 at 12:38 PM, Gregory Farnum wrote: > And the recording from this session is now available at > https://www.youtube.com/watch?v=0WHHTjdgarQ > > We didn't come to any conclusions but I think we've got a good > understanding of the motivations and concerns,

Re: [ceph-users] Dashboard can't activate in Luminous?

2018-08-23 Thread Dan Mick
On 08/23/2018 02:52 PM, Robert Stanford wrote: > >  I just installed a new luminous cluster.  When I run this command: > ceph mgr module enable dashboard > > I get this response: > all mgr daemons do not support module 'dashboard' > > All daemons are Luminous (I confirmed this by running ceph

[ceph-users] Dashboard can't activate in Luminous?

2018-08-23 Thread Robert Stanford
I just installed a new luminous cluster. When I run this command: ceph mgr module enable dashboard I get this response: all mgr daemons do not support module 'dashboard' All daemons are Luminous (I confirmed this by running ceph version). Why would this error appear? Thank you R
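Not the confirmed resolution of this thread, but the usual first checks when the mgr refuses a module after an install or upgrade (the hostname below is a placeholder):

```
ceph versions          # every mon/mgr/osd should report 12.2.x
ceph mgr module ls     # 'dashboard' should appear among the available modules
# if daemons were upgraded in place, restarting the mgr refreshes the modules it advertises
systemctl restart ceph-mgr@mgr-host-01
ceph mgr module enable dashboard
```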

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
Sending back, forgot the plain text for ceph-devel. Sorry about that. On Thu, Aug 23, 2018 at 9:57 PM Adrien Gillard wrote: > > We are running CentOS 7.5 with upstream Ceph packages, no remote syslog, just > default local logging. > > After looking a bit deeper into pprof, --alloc_space seems

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
We are running CentOS 7.5 with upstream Ceph packages, no remote syslog, just default local logging. After looking a bit deeper into pprof, --alloc_space seems to represent allocations that happened since the program started which goes along with the quick deallocation of the memory.

Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-23 Thread Tyler Bishop
Yes, I've reviewed all the logs from the monitor and host. I am not getting useful errors (or any) in dmesg or general messages. I have 2 ceph clusters; the other cluster is 300 SSDs and I never have issues like this. That's why I'm looking for help. On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev

Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-23 Thread Alex Gorbachev
On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop wrote: > > During high load testing I'm only seeing user and sys cpu load around 60%... > my load doesn't seem to be anything crazy on the host and iowait stays > between 6 and 10%. I have very good `ceph osd perf` numbers too. > > I am using
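A generic, hedged triage sketch for this kind of intermittent slowdown (the OSD id is a placeholder):

```
ceph osd perf                          # per-OSD commit/apply latency
ceph daemon osd.12 dump_historic_ops   # slowest recent ops on a suspect OSD
iostat -x 5                            # device utilisation/await on the OSD host
```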

Re: [ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-23 Thread Steven Vacaroaia
Did all that .. even tried to change port Also selinux and firewalld are disabled Thanks for taking the trouble to suggest something Steven On Thu, 23 Aug 2018 at 13:46, John Spray wrote: > On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia wrote: > > > > Hi All, > > > > I am trying to enable

Re: [ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-23 Thread John Spray
On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia wrote: > > Hi All, > > I am trying to enable prometheus plugin with no success due to "no socket > could be created" > > The instructions for enabling the plugin are very straightforward and simple > > Note > My ultimate goal is to use Prometheus

Re: [ceph-users] Question about 'firstn|indep'

2018-08-23 Thread Gregory Farnum
On Thu, Aug 23, 2018 at 10:21 AM Cody wrote: > So, is it okay to say that compared to the 'firstn' mode, the 'indep' > mode may have the least impact on a cluster in an event of OSD > failure? Could I use 'indep' for replica pool as well? > You could, but shouldn't. Imagine if the primary OSD
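To illustrate the distinction Greg describes, the step sections usually differ like this (a sketch, not copied from any poster's map):

```
# replicated pool: any surviving copy can be promoted, so firstn is fine
step take default
step chooseleaf firstn 0 type host
step emit

# erasure-coded pool: each shard must keep its position, so use indep
step take default
step chooseleaf indep 0 type host
step emit
```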

[ceph-users] RGW pools don't show up in luminous

2018-08-23 Thread Robert Stanford
I installed a new Ceph cluster with Luminous, after a long time working with Jewel. I created my RGW pools the same as always (pool create default.rgw.buckets.data etc.), but they don't show up in ceph df with Luminous. Has the command changed? Thanks R
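Not necessarily the answer to Robert's question, but a quick sanity check on Luminous, where pools are also expected to carry an application tag:

```
ceph osd lspools                                        # did the pools actually get created?
ceph osd pool application enable default.rgw.buckets.data rgw
ceph df detail
```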

Re: [ceph-users] Question about 'firstn|indep'

2018-08-23 Thread Cody
So, is it okay to say that compared to the 'firstn' mode, the 'indep' mode may have the least impact on a cluster in the event of an OSD failure? Could I use 'indep' for a replica pool as well? Thank you! Regards, Cody On Wed, Aug 22, 2018 at 7:12 PM Gregory Farnum wrote: > > On Wed, Aug 22, 2018 at

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Gregory Farnum
On Thu, Aug 23, 2018 at 8:42 AM Adrien Gillard wrote: > With a bit of profiling, it seems all the memory is allocated to > ceph::logging::Log::create_entry (see below) > > Should this be normal ? Is it because some OSDs are down and it logs the > results of its osd_ping ? > Hmm, is that where

Re: [ceph-users] HEALTH_ERR vs HEALTH_WARN

2018-08-23 Thread Gregory Farnum
On Thu, Aug 23, 2018 at 12:26 AM mj wrote: > Hi, > > Thanks John and Gregory for your answers. > > Gregory's answer worries us. We thought that with a 3/2 pool, and one PG > corrupted, the assumption would be: the two similar ones are correct, > and the third one needs to be adjusted. > > Can we

Re: [ceph-users] Intermittent client reconnect delay following node fail

2018-08-23 Thread John Spray
On Thu, Aug 23, 2018 at 3:01 PM William Lawton wrote: > > Hi John. > > Just picking up this thread again after coming back from leave. Our ceph > storage project has progressed and we are now making sure that the active MON > and MDS are kept on separate nodes which has helped reduce the

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Alfredo Deza
On Thu, Aug 23, 2018 at 11:32 AM, Hervé Ballans wrote: > Le 23/08/2018 à 16:13, Alfredo Deza a écrit : > > What you mean is that, at this stage, I must directly declare the UUID paths > in value of --block.db (i.e. replace /dev/nvme0n1p1 with its PARTUUID), that > is ? > > No, this all looks

Re: [ceph-users] Ceph Testing Weekly Tomorrow — With Kubernetes/Install discussion

2018-08-23 Thread Gregory Farnum
And the recording from this session is now available at https://www.youtube.com/watch?v=0WHHTjdgarQ We didn't come to any conclusions but I think we've got a good understanding of the motivations and concerns, along with several options and some of the constraints of each. -Greg On Tue, Aug 21,

[ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-23 Thread Steven Vacaroaia
Hi All, I am trying to enable prometheus plugin with no success due to "no socket could be created" The instructions for enabling the plugin are very straightforward and simple Note My ultimate goal is to use Prometheus with Cephmetrics Some of you suggested to deploy ceph-exporter but why do
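For reference, the usual Mimic-era enablement steps look roughly like this; the bind address/port values are examples only:

```
ceph mgr module enable prometheus
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/server_port 9283
# only the active mgr serves the metrics endpoint
curl http://active-mgr-host:9283/metrics | head
```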

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
With a bit of profiling, it seems all the memory is allocated to ceph::logging::Log::create_entry (see below) Should this be normal? Is it because some OSDs are down and it logs the results of its osd_ping? The debug level of the OSD is below also. Thanks, Adrien $ pprof
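For anyone reproducing this, the tcmalloc heap profiler built into the OSD can be driven roughly like this (the OSD id is a placeholder; the dump path and binary location may differ per distro):

```
ceph tell osd.12 heap start_profiler
ceph tell osd.12 heap dump
# analyse the dump written next to the OSD logs
pprof --text --alloc_space /usr/bin/ceph-osd /var/log/ceph/osd.12.profile.0001.heap
```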

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Hervé Ballans
On 23/08/2018 at 16:13, Alfredo Deza wrote: Do you mean that, at this stage, I must directly declare the UUID paths as the value of --block.db (i.e. replace /dev/nvme0n1p1 with its PARTUUID)? No, this all looks correct. How do the ceph-volume.log and ceph-volume-systemd.log look
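A hedged sketch of the approach being discussed here — pointing --block.db at the persistent by-partuuid path rather than /dev/nvme0n1p1 (device names are examples only):

```
# look up the partition's PARTUUID
blkid -s PARTUUID -o value /dev/nvme0n1p1

# create the OSD against the stable path
ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.db /dev/disk/by-partuuid/<partuuid-from-blkid>
```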

[ceph-users] Broken bucket problems

2018-08-23 Thread DHD.KOHA
I am really having a hard time trying to delete a bucket, and it is driving me crazy! All s3 clients claim that the bucket is empty except for 2 multipart uploads that I am not able to get rid of. radosgw-admin bucket check --bucket=whatever [
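One commonly suggested sequence for this situation, sketched with hedged flags (verify them against radosgw-admin help on your version); <key> and <upload-id> are placeholders:

```
radosgw-admin bucket check --bucket=whatever --fix --check-objects

# abort the stale multipart uploads from the S3 side
s3cmd multipart s3://whatever
s3cmd abortmp s3://whatever/<key> <upload-id>

# then remove the bucket together with anything left in it
radosgw-admin bucket rm --bucket=whatever --purge-objects
```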

Re: [ceph-users] [question] one-way RBD mirroring doesn't work

2018-08-23 Thread Jason Dillaman
On Thu, Aug 23, 2018 at 10:56 AM sat wrote: > > Hi, > > > I'm trying to make a one-way RBD mirrored cluster between two Ceph clusters. > But it > hasn't worked yet. It seems to succeed, but after making an RBD image from > local cluster, > it's considered as "unknown". > > ``` > $ sudo rbd

[ceph-users] [question] one-way RBD mirroring doesn't work

2018-08-23 Thread sat
Hi, I'm trying to set up one-way RBD mirroring between two Ceph clusters, but it hasn't worked yet. It seems to succeed, but after creating an RBD image on the local cluster, it is reported as "unknown". ``` $ sudo rbd --cluster local create rbd/local.img --size=1G
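A hedged sketch of a one-way, image-mode setup, assuming cluster names 'local' and 'remote' and a peer client name that are placeholders; an "unknown" status on the primary side is commonly just the absence of an rbd-mirror daemon reporting for that cluster:

```
# the image needs exclusive-lock and journaling
rbd --cluster local feature enable rbd/local.img exclusive-lock
rbd --cluster local feature enable rbd/local.img journaling

# enable mirroring on the pool on both sides, then on the image
rbd --cluster local  mirror pool enable rbd image
rbd --cluster remote mirror pool enable rbd image
rbd --cluster local  mirror image enable rbd/local.img

# one-way: the peer and the rbd-mirror daemon live on the backup (remote) cluster only
rbd --cluster remote mirror pool peer add rbd client.local@local
rbd --cluster remote mirror image status rbd/local.img
```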

Re: [ceph-users] how can time machine know difference between cephfs fuse and kernel client?

2018-08-23 Thread Chad William Seys
Hi All, I think my problem was that I had quotas set at multiple levels of a subtree, and maybe some were conflicting. (E.g. Parent said quota=1GB, child said quota=200GB.) I could not reproduce the problem, but setting quotas only on the user's subdirectory and not elsewhere along the way
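For reference, a sketch of the quota layout being described, assuming a hypothetical /mnt/cephfs mount; note that in this era CephFS quotas were enforced by ceph-fuse, with kernel-client support only arriving around kernel 4.17/Mimic:

```
# quota only on the user's own subdirectory
setfattr -n ceph.quota.max_bytes -v 200000000000 /mnt/cephfs/users/alice

# clear any quota set higher up the tree (0 removes it)
setfattr -n ceph.quota.max_bytes -v 0 /mnt/cephfs/users
getfattr -n ceph.quota.max_bytes /mnt/cephfs/users/alice
```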

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
After upgrading to luminous, we see the exact same behaviour, with OSDs eating as much as 80/90 GB of memory. We'll try some memory profiling but at this point we're a bit lost. Are there any specific logs that could help us? On Thu, Aug 23, 2018 at 2:34 PM Adrien Gillard wrote: > Well after a

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Alfredo Deza
On Thu, Aug 23, 2018 at 9:56 AM, Hervé Ballans wrote: > On 23/08/2018 at 15:20, Alfredo Deza wrote: > > Thanks Alfredo for your reply. I'm using the latest version of Luminous > (12.2.7) and ceph-deploy (2.0.1). > I have no problem creating my OSDs; that works perfectly. > My issue only

Re: [ceph-users] Intermittent client reconnect delay following node fail

2018-08-23 Thread William Lawton
Hi John. Just picking up this thread again after coming back from leave. Our ceph storage project has progressed and we are now making sure that the active MON and MDS are kept on separate nodes which has helped reduce the incidence of delayed client reconnects on ceph node failure. We've also

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Hervé Ballans
On 23/08/2018 at 15:20, Alfredo Deza wrote: Thanks Alfredo for your reply. I'm using the latest version of Luminous (12.2.7) and ceph-deploy (2.0.1). I have no problem creating my OSDs; that works perfectly. My issue only concerns the mount names of the NVMe partitions

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Alfredo Deza
On Thu, Aug 23, 2018 at 9:12 AM, Hervé Ballans wrote: > On 23/08/2018 at 12:51, Alfredo Deza wrote: >> >> On Thu, Aug 23, 2018 at 5:42 AM, Hervé Ballans >> wrote: >>> >>> Hello all, >>> >>> I would like to continue a thread that dates back to last May (sorry if >>> this >>> is not a good

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Hervé Ballans
On 23/08/2018 at 12:51, Alfredo Deza wrote: On Thu, Aug 23, 2018 at 5:42 AM, Hervé Ballans wrote: Hello all, I would like to continue a thread that dates back to last May (sorry if this is not good practice?..) Thanks David for your useful tips on this thread. On my side, I created my

[ceph-users] Connect client to cluster on other subnet

2018-08-23 Thread Daniel Carrasco
Hello, I've a Ceph cluster working on a subnet where clients on the same subnet can connect without problem, but now I need to connect some clients that are on another subnet and I'm getting a connection timeout error. Both subnets are connected and I've disabled the firewall to test whether it might be the blocker,
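A hedged connectivity checklist (addresses below are hypothetical): the client needs its ceph.conf mon_host entries pointing at the monitors, plus direct reachability to every OSD, not just to the mons:

```
# from the client subnet, these must be reachable:
#   monitors:      TCP 6789 (msgr v1, pre-Nautilus)
#   osd/mgr/mds:   TCP 6800-7300
nc -vz 10.0.1.10 6789    # a monitor
nc -vz 10.0.2.21 6800    # an OSD host
```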

[ceph-users] Migrating from pre-luminous multi-root crush hierarchy

2018-08-23 Thread Buchberger, Carsten
Hello, when we started with ceph we wanted to mix different disk types per host. Since that was before device classes were available, we followed the advice to create a multi-root hierarchy and disk-type-specific hosts. So currently the osd tree looks kind of like this -8 218.21320
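The usual Luminous-era target for this migration is a single root plus device classes; a hedged sketch with placeholder OSD ids and pool names:

```
# classes are normally auto-detected, but can be set explicitly
ceph osd crush set-device-class hdd osd.0
ceph osd crush set-device-class ssd osd.1

# one rule per class against the single default root
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd

# repoint the pools, then the per-disk-type hosts/roots can be drained and removed
ceph osd pool set volumes_hdd crush_rule replicated_hdd
```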

Re: [ceph-users] Unexpected behaviour after monitors upgrade from Jewel to Luminous

2018-08-23 Thread Adrien Gillard
Well after a few hours, still nothing new in the behaviour. With half of the OSDs (so 6 per host) up and peering and the nodown flag set to limit the creation of new maps, all the memory is consumed and OSDs get killed by OOM killer. We observe a lot of threads being created for each OSDs

Re: [ceph-users] HEALTH_ERR vs HEALTH_WARN

2018-08-23 Thread mj
Hi Mark, others, I took my info from following page: https://ceph.com/geen-categorie/ceph-manually-repair-object/ where is written: "Of course the above works well when you have 3 replicas when it is easier for Ceph to compare two versions against another one." Based on that info, I assumed
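For context, the inspect-before-repair flow usually referenced alongside that article looks like this (the PG id is a placeholder):

```
ceph health detail                                      # find the inconsistent PG
rados list-inconsistent-obj 2.18f --format=json-pretty  # see which copy differs and why
ceph pg repair 2.18f
```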

Re: [ceph-users] mgr/dashboard: backporting Ceph Dashboard v2 to Luminous

2018-08-23 Thread Willem Jan Withagen
On 23/08/2018 12:47, Ernesto Puerta wrote: @Willem, given your comments come from a technical ground, let's address those technically. As you say, dashboard_v2 is already in Mimic and will be soon in Nautilus when released, so for FreeBSD the issue will anyhow be there. Let's look for a

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Alfredo Deza
On Thu, Aug 23, 2018 at 5:42 AM, Hervé Ballans wrote: > Hello all, > > I would like to continue a thread that dates back to last May (sorry if this > is not good practice?..) > > Thanks David for your useful tips on this thread. > On my side, I created my OSDs with ceph-deploy (in place of

Re: [ceph-users] mgr/dashboard: backporting Ceph Dashboard v2 to Luminous

2018-08-23 Thread Ernesto Puerta
Thanks all for sharing your views, and thanks to Lenz & Kai for the clarifications. For those, like David, not familiar with dashboard_v2 (or even with dashboard_v1), you may check this short clip (https://youtu.be/m5i3x4eR6k4), which goes through the dashboard_v2 as per this first backport

Re: [ceph-users] mgr/dashboard: backporting Ceph Dashboard v2 to Luminous

2018-08-23 Thread Willem Jan Withagen
On 23/08/2018 11:22, Lenz Grimmer wrote: On 08/22/2018 08:57 PM, David Turner wrote: My initial reaction to this PR/backport was questioning why such a major update would happen on a dot release of Luminous. Your reaction to keeping both dashboards viable goes to support that. Should we

Re: [ceph-users] HEALTH_ERR vs HEALTH_WARN

2018-08-23 Thread Mark Schouten
On Thu, 2018-08-23 at 09:26 +0200, mj wrote: > Gregory's answer worries us. We thought that with a 3/2 pool, and one > PG > corrupted, the assumption would be: the two similar ones are > correct, > and the third one needs to be adjusted. > > Can we determine from this output, if I created

Re: [ceph-users] mgr/dashboard: backporting Ceph Dashboard v2 to Luminous

2018-08-23 Thread Willem Jan Withagen
On 22/08/2018 19:42, Ernesto Puerta wrote: Thanks for your feedback, Willem! The old dashboard does not need any package fetch while building/installing. Something that is not very handy when building FreeBSD packages. And I haven't gotten around to determining how to get around that. I

Re: [ceph-users] failing to respond to cache pressure

2018-08-23 Thread Eugen Block
Hi, I think it does have a positive effect on the messages, because I get fewer messages than before. that's nice. I also receive definitely fewer cache pressure messages than before. I also started to play around with the client-side cache configuration. I halved the client object cache size
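A sketch of the knobs being discussed, with values shown only as examples (the defaults noted are the Luminous ones as I recall them):

```
[mds]
mds_cache_memory_limit = 4294967296   # 4 GB; default is 1 GB

[client]
client_oc_size = 104857600            # 100 MB, half the 200 MB default (ceph-fuse only)
```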

Re: [ceph-users] Shared WAL/DB device partition for multiple OSDs?

2018-08-23 Thread Hervé Ballans
Hello all, I would like to continue a thread that dates back to last May (sorry if this is not good practice?..) Thanks David for your useful tips on this thread. On my side, I created my OSDs with ceph-deploy (in place of ceph-volume) [1], but this is exactly the same context as this

Re: [ceph-users] mgr/dashboard: backporting Ceph Dashboard v2 to Luminous

2018-08-23 Thread Lenz Grimmer
On 08/22/2018 08:57 PM, David Turner wrote: > My initial reaction to this PR/backport was questioning why such a > major update would happen on a dot release of Luminous. Your > reaction to keeping both dashboards viable goes to support that. > Should we really be backporting features into a dot

Re: [ceph-users] Clients report OSDs down/up (dmesg) nothing in Ceph logs (flapping OSDs)

2018-08-23 Thread Eugen Block
An hour ago host5 started to report the OSDs on host4 as down (still no clue why), resulting in slow requests. This time no flapping occured, the cluster recovered a couple of minutes later. No other OSDs reported that, only those two on host5. There's nothing in the logs of the reporting
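When chasing this kind of single-reporter flapping, these are the settings typically inspected (the OSD id is a placeholder; the values shown are the stock defaults as far as I recall):

```
ceph daemon osd.21 config show | egrep \
  'osd_heartbeat_grace|osd_heartbeat_interval|mon_osd_min_down_reporters|mon_osd_reporter_subtree_level'
# defaults: grace 20s, interval 6s, min reporters 2, subtree level "host"
```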

[ceph-users] radosgw: need couple of blind (indexless) buckets, how-to?

2018-08-23 Thread Konstantin Shalygin
I need a bucket without an index for 5000 objects. How do I properly create an indexless bucket next to indexed buckets? This is a "default radosgw" Luminous instance. I took a look at the CLI; as far as I understand, I will need to create a placement rule via "zone placement add" and add this key
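A hedged sketch of the placement-target route mentioned here — flag spellings should be double-checked against radosgw-admin on Luminous, and the pool names are the stock defaults:

```
radosgw-admin zonegroup placement add \
  --rgw-zonegroup default --placement-id indexless-placement
radosgw-admin zone placement add \
  --rgw-zone default --placement-id indexless-placement \
  --data-pool default.rgw.buckets.data \
  --index-pool default.rgw.buckets.index \
  --data-extra-pool default.rgw.buckets.non-ec \
  --placement-index-type indexless
# restart the rgw, then create the bucket against this placement, e.g.
# s3cmd mb s3://blindbucket --bucket-location=:indexless-placement
```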

Re: [ceph-users] HEALTH_ERR vs HEALTH_WARN

2018-08-23 Thread mj
Hi, Thanks John and Gregory for your answers. Gregory's answer worries us. We thought that with a 3/2 pool, and one PG corrupted, the assumption would be: the two similar ones are correct, and the third one needs to be adjusted. Can we determine from this output, if I created corruption in

Re: [ceph-users] Clients report OSDs down/up (dmesg) nothing in Ceph logs (flapping OSDs)

2018-08-23 Thread Eugen Block
Greg, thanks for your reply. So, this is actually just noisy logging from the client processing an OSDMap. That should probably be turned down, as it's not really an indicator of...anything...as far as I can tell. I usually stick with the defaults: host4:~ # ceph daemon osd.21 config show |