[ceph-users] Docs on Containerized Mon Maintenance

2021-06-15 Thread Phil Merricks
Hey folks, I'm working through some basic ops drills, and noticed what I think is an inconsistency in the Cephadm Docs. Some Googling appears to show this is a known thing, but I didn't find a clear direction on cooking up a solution yet. On a cluster with 5 mons, 2 were abruptly removed when

[ceph-users] Re: Mon crash when client mounts CephFS

2021-06-15 Thread Phil Merricks
Thanks for the replies folks. This one was resolved. I wish I could tell you I know what I changed to fix it, but there were several undocumented changes to the deployment script I'm using whilst I was distracted by something else. Tearing down and redeploying today seems to not be suffering

[ceph-users] JSON output schema

2021-06-15 Thread Vladimir Prokofev
Good day. I'm writing some code for parsing output data for monitoring purposes. The data is that of "ceph status -f json", "ceph df -f json", "ceph osd perf -f json" and "ceph osd pool stats -f json". I also need to support all major Ceph releases, from Jewel through Pacific. What I've
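
A minimal sketch of the kind of release-dependent handling this tends to involve, assuming the health field rename (Jewel's "overall_status" key vs the "status" key used in Luminous and later) is representative of the schema differences in question:

    # Try the newer key first and fall back to the older one:
    ceph status -f json | jq -r '.health.status // .health.overall_status'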

[ceph-users] How to orch apply single site rgw with custom front-end

2021-06-15 Thread Vladimir Brik
Hello How can I use ceph orch apply to deploy single site rgw daemons with custom frontend configuration? Basically, I have three servers in a DNS round-robin, each running a 15.2.12 rgw daemon with this configuration: rgw_frontends = civetweb num_threads=5000 port=443s
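
A hedged sketch of one way to do this with a cephadm service spec; the field names (rgw_frontend_port, ssl) and the service id are assumptions to be checked against the orchestrator RGW spec docs for the release in use:

    # rgw.yaml (hypothetical spec file):
    service_type: rgw
    service_id: myrealm.myzone
    placement:
      hosts:
        - host1
        - host2
        - host3
    spec:
      rgw_frontend_port: 443
      ssl: true

    ceph orch apply -i rgw.yaml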

[ceph-users] Ceph monitor won't start after Ubuntu update

2021-06-15 Thread Petr
Hello Ceph-users, I've upgraded my Ubuntu server from 18.04.5 LTS to Ubuntu 20.04.2 LTS via 'do-release-upgrade'. During that process the ceph packages were upgraded from Luminous to Octopus, and now the ceph-mon daemon (I have only one) won't start, log error is: "2021-06-15T20:23:41.843+

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread huxia...@horebdata.cn
I run 2x 10G on my hosts, and I would expect the bond to tolerate one link being down. From what you suggest, I will check link monitoring, to make sure a failing link is removed from the bond automatically, without having to pull the cable manually. thanks and best regards, samuel

[ceph-users] Re: ceph PGs issues

2021-06-15 Thread Aly, Adel
Hi Reed, Thank you for getting back to us. We had indeed several disk failures at the same time. Regarding the OSD map, we have an OSD that failed and needed to be removed, but we didn't update the crush map. The question here is: is it safe to update the OSD crush map without affecting the data

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-06-15 Thread Ackermann, Christoph
Hi Dan, Thanks for the hint, I'll try this tomorrow with a test bed first. This evening I had to fix some Bareos client systems to get a quiet sleep. ;-) Will give you feedback asap. Best regards, Christoph On Tue, 15 Jun 2021 at 21:03, Dan van der Ster < d...@vanderster.com> wrote:

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-06-15 Thread Dan van der Ster
Hi Christoph, What about the max osd? If "ceph osd getmaxosd" is not 76 on this cluster, then set it: `ceph osd setmaxosd 76`. -- dan On Tue, Jun 15, 2021 at 8:54 PM Ackermann, Christoph wrote: > > Dan, > > sorry, we have no gaps in osd numbering: > isceph@ceph-deploy:~$ sudo ceph osd ls |wc

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-06-15 Thread Ackermann, Christoph
Dan, sorry, we have no gaps in osd numbering: isceph@ceph-deploy:~$ sudo ceph osd ls |wc -l; sudo ceph osd tree | sort -n -k1 |tail 76 [..] 73 ssd 0.28600 osd.73 up 1.0 1.0 74 ssd 0.27689 osd.74 up 1.0

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-06-15 Thread Dan van der Ster
Replying to own mail... On Tue, Jun 15, 2021 at 7:54 PM Dan van der Ster wrote: > > Hi Ilya, > > We're now hitting this on CentOS 8.4. > > The "setmaxosd" workaround fixed access to one of our clusters, but > isn't working for another, where we have gaps in the osd ids, e.g. > > # ceph osd

[ceph-users] problem using gwcli; package dependancy lockout

2021-06-15 Thread Philip Brown
I'm trying to update a ceph octopus install, to add an iscsi gateway, using ceph-ansible, and gwcli won't run for me. The ansible run went well, but when I try to actually use gwcli, I get (blahblah) ImportError: No module named rados which isn't too surprising, since "python-rados" is not

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-06-15 Thread Dan van der Ster
Hi Ilya, We're now hitting this on CentOS 8.4. The "setmaxosd" workaround fixed access to one of our clusters, but isn't working for another, where we have gaps in the osd ids, e.g. # ceph osd getmaxosd max_osd = 553 in epoch 691642 # ceph osd tree | sort -n -k1 | tail 541 ssd 0.87299
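
For reference, the idea behind the workaround is that max_osd has to be at least the highest existing OSD id plus one, not the number of OSDs, which is where id gaps bite. A sketch only, and per the message above it did not resolve the second cluster:

    # Highest OSD id currently in use, plus one:
    highest=$(ceph osd ls | sort -n | tail -1)
    ceph osd setmaxosd $((highest + 1))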

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread Anthony D'Atri
> On Jun 15, 2021, at 10:26 AM, Andrew Walker-Brown > wrote: > > With an unstable link/port you could see the issues you describe. Ping > doesn’t have the packet rate for you to necessarily have a packet in transit > at exactly the same time as the port fails temporarily. Iperf on the

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread Andrew Walker-Brown
With an unstable link/port you could see the issues you describe. Ping doesn’t have the packet rate for you to necessarily have a packet in transit at exactly the same time as the port fails temporarily. Iperf on the other hand could certainly show the issue, higher packet rate and more

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread huxia...@horebdata.cn
When I pull out the cable, the bond works properly. Does that mean the port is somehow flapping? Ping can still work, but the iperf test yields very low results. huxia...@horebdata.cn From: Serkan Çoban Date: 2021-06-15 18:47 To: huxia...@horebdata.cn CC: ceph-users Subject:

[ceph-users] Re: Failover with 2 nodes

2021-06-15 Thread Jamie Fargen
This also sounds like a possible GlusterFS use case. Regards, -Jamie On Tue, Jun 15, 2021 at 12:30 PM Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > Hi, > > On 15.06.21 16:15, Christoph Brüning wrote: > > Hi, > > > > That's right! > > > > We're currently evaluating a

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread Serkan Çoban
Do you observe the same behaviour when you pull a cable? A flapping port might cause this kind of behaviour; other than that you shouldn't see any network disconnects. Are you sure about the LACP configuration? What is the output of 'cat /proc/net/bonding/bond0'? On Tue, Jun 15, 2021 at 7:19 PM
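
A short sketch of the checks being suggested here (interface names are placeholders); with link monitoring enabled, e.g. miimon=100, a slave whose MII Status goes down should be dropped from the bond automatically:

    cat /proc/net/bonding/bond0 | grep -E 'Bonding Mode|MII Status|Slave Interface|Link Failure Count'
    ip -s link show dev ens1f0    # per-NIC error/drop counters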

[ceph-users] Re: Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread huxia...@horebdata.cn
My big worry is that when a single link under a bond breaks, it breaks in such a way that the whole bond stops working. How to make it "failover" in such cases? best regards, samuel huxia...@horebdata.cn From: Anthony D'Atri Date: 2021-06-15 18:22 To: huxia...@horebdata.cn Subject: Re:

[ceph-users] Re: Failover with 2 nodes

2021-06-15 Thread Burkhard Linke
Hi, On 15.06.21 16:15, Christoph Brüning wrote: Hi, That's right! We're currently evaluating a similar setup with two identical HW nodes (on two different sites), with OSD, MON and MDS each, and both nodes have CephFS mounted. The goal is to build a minimal self-contained shared

[ceph-users] Re: ceph PGs issues

2021-06-15 Thread Reed Dier
Note: I am not entirely sure here, and would love other input from the ML about this, so take this with a grain of salt. You don't show any unfound objects, which I think is excellent news as far as data loss. >>96 active+clean+scrubbing+deep+repair The deep scrub + repair seems

[ceph-users] Issues with Ceph network redundancy using L2 MC-LAG

2021-06-15 Thread huxia...@horebdata.cn
Dear Cephers, I have encountered the following networking issue several times, and I wonder whether there is a solution for network HA. We build ceph using L2 multi-chassis link aggregation group (MC-LAG) to provide switch redundancy. On each host, we use 802.3ad, LACP mode for NIC

[ceph-users] Re: CephFS mount fails after Centos 8.4 Upgrade

2021-06-15 Thread Dan van der Ster
Looks like this: https://tracker.ceph.com/issues/51112 On Tue, Jun 15, 2021 at 5:48 PM Ackermann, Christoph wrote: > > Hello all, > > after upgrading Centos clients to version 8.4 CephFS ( Kernel > 4.18.0-305.3.1.el8 ) mount did fail. Message: *mount error 110 = > Connection timed out* >

[ceph-users] CephFS mount fails after Centos 8.4 Upgrade

2021-06-15 Thread Ackermann, Christoph
Hello all, after upgrading CentOS clients to version 8.4 (kernel 4.18.0-305.3.1.el8), the CephFS mount failed. Message: *mount error 110 = Connection timed out* ..unfortunately the kernel log was flooded with zeros... :-( The monitor connection seems to be ok, but libceph said: kernel: libceph:

[ceph-users] Re: Strategy for add new osds

2021-06-15 Thread DHilsbos
Personally, when adding drives like this, I set noin (ceph osd set noin), and norebalance (ceph osd set norebalance). Like your situation, we run smaller clusters; our largest cluster only has 18 OSDs. That keeps the cluster from starting data moves until all new drives are in place. Don't
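
A minimal sketch of the flag-based sequence described above; the OSD ids are examples only, and the exact ordering is a judgment call:

    ceph osd set noin
    ceph osd set norebalance
    # ... create/activate the new OSDs on every node ...
    ceph osd unset noin
    ceph osd in 5 6 7 8 9          # mark the newly created OSDs in
    ceph osd unset norebalance     # let backfill start once everything is in place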

[ceph-users] Re: Failover with 2 nodes

2021-06-15 Thread nORKy
Hi, Thank you guys. I deployed a third monitor and failover works. Thank you. On Tue, 15 Jun 2021 at 16:15, Christoph Brüning < christoph.bruen...@uni-wuerzburg.de> wrote: > Hi, > > That's right! > > We're currently evaluating a similar setup with two identical HW nodes > (on two different
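
For readers following along, with cephadm a third monitor can be requested by widening the mon placement, e.g. (host names are placeholders):

    ceph orch apply mon --placement="node1 node2 node3"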

[ceph-users] Strategy for add new osds

2021-06-15 Thread Jorge JP
Hello, I have a ceph cluster with 5 nodes (1 hdd each node). I want to add 5 more drives (hdd) to expand my cluster. What is the best strategy for this? I will add one drive to each node, but is it a good strategy to add one drive and wait for the data to rebalance to the new osd before adding the next one? or maybe..

[ceph-users] Re: ceph PGs issues

2021-06-15 Thread Reed Dier
You have incomplete PGs, which means you have inactive data, because the data isn't there. This will typically only happen when you have multiple concurrent disk failures, or something like that, so I think there is some missing info. >1 osds exist in the crush map but not in the
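
A hedged sketch of the kind of diagnostics that usually goes with this discussion (the PG id is a placeholder):

    ceph health detail | grep -i incomplete
    ceph pg dump_stuck inactive
    ceph pg 3.1f query | jq '.recovery_state'   # why this PG cannot go active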

[ceph-users] Re: Failover with 2 nodes

2021-06-15 Thread Christoph Brüning
Hi, That's right! We're currently evaluating a similar setup with two identical HW nodes (on two different sites), with OSD, MON and MDS each, and both nodes have CephFS mounted. The goal is to build a minimal self-contained shared filesystem that remains online during planned updates and

[ceph-users] NFS Ganesha ingress parameter not valid?

2021-06-15 Thread Oliver Weinmann
Dear All, I have deployed the latest CEPH Pacific release in my lab and started to check out the new "stable" NFS Ganesha features. First of all I'm a bit confused which method to actually use to deploy the NFS cluster: cephadm or ceph nfs cluster create? I used "nfs cluster create"

[ceph-users] Re: Failover with 2 nodes

2021-06-15 Thread mhnx
It's easy. The problem is that the OSDs are still marked up because there are not enough down reporters (mon_osd_min_down_reporters), and because of this the MDS is getting stuck. The solution is "mon_osd_min_down_reporters = 1". With a "two node" cluster and "replicated 2" with "chooseleaf host", the reporter count should be set
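
For illustration, one way to apply that setting on releases with the centralized config store (an assumption; it can equally go into the [mon] or [global] section of ceph.conf):

    ceph config set mon mon_osd_min_down_reporters 1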

[ceph-users] Re: Failover with 2 nodes

2021-06-15 Thread Robert Sander
On 15.06.21 15:16, nORKy wrote: > Why is there no failover ?? Because only one MON out of two is not in the majority to build a quorum. Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19

[ceph-users] Failover with 2 nodes

2021-06-15 Thread nORKy
Hi, I'm building a lab with virtual machines. I built a setup with only 2 nodes, 2 OSDs per node, and I have a host that uses mount.cephfs. Each of the 2 ceph nodes runs mon + mgr + mds services and has the cephadm command. If I stop a node, all commands hang. Can't use the dashboard, can't use ceph -s or any

[ceph-users] Re: Ceph Month June Schedule Now Available

2021-06-15 Thread Mike Perez
Hi everyone, Here's today's schedule for Ceph Month: 9:00 ET / 15:00 CEST Dashboard Update [Ernesto] 9:30 ET / 15:30 CEST [lightning] RBD latency with QD=1 bs=4k [Wido den Hollander] 9:40 ET / 15:40 CEST [lightning] From Open Source to Open Ended in Ceph with Lua [Yuval Lifshitz] Full

[ceph-users] Re: ceph df: pool stored vs bytes_used -- raw or not?

2021-06-15 Thread Konstantin Shalygin
Fired https://tracker.ceph.com/issues/51223 k > On 9 Jun 2021, at 13:20, Igor Fedotov wrote: > > Should we fire another ticket for that?

[ceph-users] ceph PGs issues

2021-06-15 Thread Aly, Adel
Dears, We have a ceph cluster with 4096 PGs, out of which +100 PGs are not active+clean. On top of the ceph cluster, we have a CephFS with 3 active MDS servers. It seems that we can’t get all the files out of it because of the affected PGs. The object store has more than 400 million objects.

[ceph-users] Re: Module 'devicehealth' has failed:

2021-06-15 Thread Torkil Svensgaard
Hi Thanks, I guess this might have something to do with it: " Jun 15 09:44:22 dcn-ceph-01 bash[3278]: debug 2021-06-15T09:44:22.507+ 7f704e4b3700 -1 mgr notify devicehealth.notify: Jun 15 09:44:22 dcn-ceph-01 bash[3278]: debug 2021-06-15T09:44:22.507+ 7f704e4b3700 -1 mgr notify

[ceph-users] Re: Module 'devicehealth' has failed:

2021-06-15 Thread Sebastian Wagner
Hi Torkil, you should see more information in the MGR log file. Might be an idea to restart the MGR to get some recent logs. On 15.06.21 09:41, Torkil Svensgaard wrote: Hi Looking at this error in v15.2.13: " [ERR] MGR_MODULE_ERROR: Module 'devicehealth' has failed:     Module
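
Since this is a containerized deployment, a sketch of how that restart and log check might look with cephadm (the daemon name is a placeholder):

    ceph orch ps --daemon-type mgr                   # find the mgr daemon names
    ceph orch daemon restart mgr.dcn-ceph-01.xxxxxx  # restart the active mgr
    cephadm logs --name mgr.dcn-ceph-01.xxxxxx       # on the host, inspect its journal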

[ceph-users] Re: Upgrading ceph to latest version, skipping minor versions?

2021-06-15 Thread Janne Johansson
On Mon, 14 Jun 2021 at 22:48, Matt Larson wrote: > > Looking at the documentation for ( > https://docs.ceph.com/en/latest/cephadm/upgrade/) - I have a question on > whether you need to sequentially upgrade for each minor version, 15.2.1 -> > 15.2.3 -> ... -> 15.2.XX? > > Can you safely upgrade
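
For context, within a release series cephadm is normally pointed straight at the target version rather than stepping through every minor release; a sketch (the version number is only an example):

    ceph orch upgrade start --ceph-version 15.2.13
    ceph orch upgrade status   # watch progress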

[ceph-users] Module 'devicehealth' has failed:

2021-06-15 Thread Torkil Svensgaard
Hi Looking at this error in v15.2.13: " [ERR] MGR_MODULE_ERROR: Module 'devicehealth' has failed: Module 'devicehealth' has failed: " It used to work. Since the module is always on I can't seem to restart it and I've found no clue as to why it failed. I've tried rebooting all hosts to no