[ceph-users] Re: pg stuck in unknown state

2020-08-26 Thread steven prothero
Hello, I started a fresh ceph cluster and have the exact same problem, and also the slow op warnings. I found this bug report that seems to be about this problem: https://tracker.ceph.com/issues/46743 "... mgr/devicehealth: device_health_metrics pool gets created even without any OSDs in the

[ceph-users] Re: Infiniband support

2020-08-26 Thread Andrei Mikhailovsky
Rafael, We've been using ceph with ipoib for over 7 years and it's been supported. However, I am not too sure about the native rdma support. There have been discussions on and off for a while now, but I've not seen much. Perhaps others know. Cheers > From: "Rafael Quaglio" > To: "ceph-users"

[ceph-users] Re: Infiniband support

2020-08-26 Thread Paul Mezzanini
We've used RDMA via RoCEv2 on 100GbE. It ran in production that way for at least 6 months before I had to turn it off when doing some migrations using hardware that didn't support it. We noticed no performance change in our environment so once we were done I just never turned it back on. I'm

[ceph-users] Re: iSCSI gateways in nautilus dashboard in state down

2020-08-26 Thread Ricardo Marques
Hi Willi, Check the 'iscsi-gateway.cfg' file on your iSCSI gateways to make sure that the mgr IP (where the dashboard is running) is included in the 'trusted_ip_list' config. After adding the IP to the config file, you need to restart the 'rbd-target-api' service. Ricardo Marques
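For reference, a minimal sketch of that change, assuming the default ceph-iscsi layout (IPs below are placeholders):

    # /etc/ceph/iscsi-gateway.cfg -- add the dashboard mgr IP to the list
    [config]
    trusted_ip_list = 192.168.1.11,192.168.1.12,192.168.1.20

    # then on each gateway:
    systemctl restart rbd-target-api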

[ceph-users] Re: anyone using ceph csi

2020-08-26 Thread Jason Dillaman
On Wed, Aug 26, 2020 at 10:33 AM Marc Roos wrote: > > >> > >> > >> I was wondering if anyone is using ceph csi plugins[1]? I would like > to > >> know how to configure credentials, that is not really described for > >> testing on the console. > >> > >> I am running > >> ./csiceph

[ceph-users] Re: slow "rados ls"

2020-08-26 Thread Stefan Kooman
On 2020-08-26 15:20, Marcel Kuiper wrote: > Hi Vladimir, > > no it is the same on all monitors. Actually I got triggered because I got > slow responses on my rados gateway with the radosgw-admin command and > narrowed it down to slow responses for rados commands anywhere in the > cluster. Do you

[ceph-users] Fwd: Upgrade Path Advice Nautilus (CentOS 7) -> Octopus (new OS)

2020-08-26 Thread Cloud Guy
Hello, Looking for a bit of guidance / approach to upgrading from Nautilus to Octopus considering CentOS and Ceph-Ansible. We're presently running a Nautilus cluster (all nodes / daemons 14.2.11 as of this post). - There are 4 monitor-hosts with mon, mgr, and dashboard functions consolidated; -
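Whichever path is taken, a quick sanity check before and after each step is to confirm what every daemon actually reports, e.g.:

    # counts daemons per running version; all should show 14.2.11 pre-upgrade
    ceph versions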

[ceph-users] Re: anyone using ceph csi

2020-08-26 Thread Marc Roos
>> >> >> I was wondering if anyone is using ceph csi plugins[1]? I would like to >> know how to configure credentials, that is not really described for >> testing on the console. >> >> I am running >> ./csiceph --endpoint unix:///tmp/mesos-csi-XSJWlY/endpoint.sock --type >> rbd

[ceph-users] Re: slow "rados ls"

2020-08-26 Thread Wido den Hollander
On 26/08/2020 15:59, Stefan Kooman wrote: On 2020-08-26 15:20, Marcel Kuiper wrote: Hi Vladimir, no it is the same on all monitors. Actually I got triggered because I got slow responses on my rados gateway with the radosgw-admin command and narrowed it down to slow responses for rados

[ceph-users] Re: iSCSI gateways in nautilus dashboard in state down

2020-08-26 Thread Willi Schiegel
On 8/26/20 3:56 PM, Jason Dillaman wrote: On Wed, Aug 26, 2020 at 9:15 AM Willi Schiegel wrote: Hello All, I have a Nautilus (14.2.11) cluster which is running fine on CentOS 7 servers. 4 OSD nodes, 3 MON/MGR hosts. Now I wanted to enable iSCSI gateway functionality to be used by some

[ceph-users] Re: anyone using ceph csi

2020-08-26 Thread Jason Dillaman
On Wed, Aug 26, 2020 at 10:11 AM Marc Roos wrote: > > > > I was wondering if anyone is using ceph csi plugins[1]? I would like to > know how to configure credentials, that is not really described for > testing on the console. > > I am running > ./csiceph --endpoint

[ceph-users] anyone using ceph csi

2020-08-26 Thread Marc Roos
I was wondering if anyone is using ceph csi plugins[1]? I would like to know how to configure credentials, that is not really described for testing on the console. I am running ./csiceph --endpoint unix:///tmp/mesos-csi-XSJWlY/endpoint.sock --type rbd --drivername rbd.csi.ceph.com
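For what it's worth, the plugin ultimately authenticates with an ordinary cephx user; creating one with the RBD caps the ceph-csi docs describe looks roughly like this (user name and pool are examples):

    ceph auth get-or-create client.csi-rbd \
        mon 'profile rbd' \
        osd 'profile rbd pool=rbd' \
        mgr 'profile rbd pool=rbd'

The resulting name and key are what the plugin expects as the "userID" and "userKey" entries of the CSI secrets map.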

[ceph-users] Re: iSCSI gateways in nautilus dashboard in state down

2020-08-26 Thread Jason Dillaman
On Wed, Aug 26, 2020 at 9:15 AM Willi Schiegel wrote: > > Hello All, > > I have a Nautilus (14.2.11) cluster which is running fine on CentOS 7 > servers. 4 OSD nodes, 3 MON/MGR hosts. Now I wanted to enable iSCSI > gateway functionality to be used by some Solaris and FreeBSD clients. I > followed

[ceph-users] Re: RandomCrashes on OSDs Attached to Mon Hosts with Octopus

2020-08-26 Thread Igor Fedotov
Just to add a hypothesis on why only mon hosts are affected: higher memory utilization on these nodes is what causes the disk read failures to appear. RAM leakage (or excessive utilization) in the MON processes, or something similar? On 8/26/2020 4:29 PM, Igor Fedotov wrote: Hi Denis, this reminds me

[ceph-users] Re: RandomCrashes on OSDs Attached to Mon Hosts with Octopus

2020-08-26 Thread Igor Fedotov
Hi Denis, this reminds me of the following ticket: https://tracker.ceph.com/issues/37282 Please note they mentioned co-location with a mon in comment #29. The working hypothesis for this ticket is interim disk read failures which cause RocksDB checksum failures. Earlier we observed such a

[ceph-users] Re: slow "rados ls"

2020-08-26 Thread Marcel Kuiper
Hi Vladimir, no, it is the same on all monitors. Actually I got triggered because I got slow responses on my rados gateway with the radosgw-admin command and narrowed it down to slow responses for rados commands anywhere in the cluster. The cluster is not that busy and all osds and monitors use

[ceph-users] iSCSI gateways in nautilus dashboard in state down

2020-08-26 Thread Willi Schiegel
Hello All, I have a Nautilus (14.2.11) cluster which is running fine on CentOS 7 servers. 4 OSD nodes, 3 MON/MGR hosts. Now I wanted to enable iSCSI gateway functionality to be used by some Solaris and FreeBSD clients. I followed the instructions under

[ceph-users] Re: slow "rados ls"

2020-08-26 Thread Vladimir Sigunov
Hi Marcel, Is this issue related to only one monitor? If yes, check the overall node status: average load, disk I/O, RAM consumption, swap size, etc. It might not be a ceph-related issue. Regards, Vladimir. On Wed, Aug 26, 2020 at 9:07 AM Marcel Kuiper wrote: > Hi > > One of my clusters running
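A quick snapshot along those lines, using generic Linux tools on the affected mon host, might be:

    uptime          # load averages
    free -h         # RAM and swap usage
    swapon --show   # active swap devices
    iostat -x 1 3   # per-device I/O utilisation (sysstat package)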

[ceph-users] Re: Infiniband support

2020-08-26 Thread Fabrizio Cuseo
I have used ceph with proxmox servers and IP over Infiniband without any problems. - On 26 Aug 2020, at 15:08, Rafael Quaglio wrote: > Hi, > I could not see in the doc if Ceph has infiniband support. Is there someone > using it? > Also, is there any rdma support working natively? > Can

[ceph-users] Re: cephfs needs access from two networks

2020-08-26 Thread Janne Johansson
On Wed, Aug 26, 2020 at 14:16, Simon Sutter wrote: > Hello, > So I know, the mon services can only bind to just one ip. > But I have to make it accessible to two networks because internal and > external servers have to mount the cephfs. > The internal ip is 10.99.10.1 and the external is some

[ceph-users] Infiniband support

2020-08-26 Thread Rafael Quaglio
Hi, I could not see in the doc if Ceph has infiniband support. Is there someone using it? Also, is there any rdma support working natively? Can anyone point me to where I can find more information about it? Thanks, Rafael.

[ceph-users] slow "rados ls"

2020-08-26 Thread Marcel Kuiper
Hi One of my clusters running nautilus 14.2.8 is very slow (13 seconds or so, where my other clusters return almost instantaneously) when doing a 'rados --pool rc3-se.rgw.buckets.index ls' from one of the monitors. I checked - ceph status => OK - routing to/from osds ok (I see a lot of
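One way to narrow this down is to time the listing against each monitor explicitly (hostnames below are placeholders):

    for m in mon1 mon2 mon3; do
        echo "== $m =="
        time rados -m "$m:6789" --pool rc3-se.rgw.buckets.index ls >/dev/null
    done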

[ceph-users] Storage class usage stats

2020-08-26 Thread Tobias Urdin
Hello, I've been trying to understand if there is any way to get usage information based on storage classes for buckets. Since there is no information available from the "radosgw-admin bucket stats" command nor any other endpoint, I tried to browse the source code but couldn't find any

[ceph-users] RandomCrashes on OSDs Attached to Mon Hosts with Octopus

2020-08-26 Thread Denis Krienbühl
Hi! We've recently upgraded all our clusters from Mimic to Octopus (15.2.4). Since then, our largest cluster is experiencing random crashes on OSDs attached to the mon hosts. This is the crash we are seeing (cut for brevity, see links in post scriptum): { "ceph_version": "15.2.4",
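Octopus records these in the crash module, so the full reports can be pulled up with (the crash id is a placeholder):

    ceph crash ls
    ceph crash info 2020-08-26T12:00:00.000000Z_<uuid>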

[ceph-users] cephfs needs access from two networks

2020-08-26 Thread Simon Sutter
Hello, As far as I know, the mon services can only bind to just one ip. But I have to make it accessible to two networks because internal and external servers have to mount the cephfs. The internal ip is 10.99.10.1 and the external is some public-ip. I tried nat'ing it with this: "firewall-cmd
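For reference, the port-forward variant of that firewalld rule would look something like this (zone assumed; 6789/3300 are the default mon ports). Note that clients learn the monitors' addresses from the monmap, so plain DNAT alone is usually not enough:

    firewall-cmd --zone=public \
        --add-forward-port=port=6789:proto=tcp:toaddr=10.99.10.1
    firewall-cmd --zone=public \
        --add-forward-port=port=3300:proto=tcp:toaddr=10.99.10.1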

[ceph-users] can not remove orch service

2020-08-26 Thread Ml Ml
Hello, root@ceph02:~# ceph orch ps NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID mgr.ceph01 ceph01 running (18m) 6s ago 4w 15.2.4 docker.io/ceph/ceph:v15.2.4 54fa7e66fb03 7deebe09f6fd
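In case it helps, the usual removal paths in Octopus cephadm are per-daemon or per-service (the names below are examples taken from the 'ceph orch ps' output above):

    # remove a single daemon
    ceph orch daemon rm mgr.ceph01 --force

    # or remove the whole service spec
    ceph orch rm mgr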

[ceph-users] Re: Undo ceph osd destroy?

2020-08-26 Thread Eugen Block
Hi, I don't know if the ceph version is relevant here but I could undo that quite quickly in my small test cluster (Octopus native, no docker). After the OSD was marked as "destroyed" I recreated the auth caps for that OSD_ID (marking as destroyed removes cephx keys etc.), changed the
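Re-adding the cephx caps that 'osd destroy' removes looks roughly like this (the OSD id and keyring path are examples):

    ceph auth add osd.3 \
        mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *' \
        -i /var/lib/ceph/osd/ceph-3/keyring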

[ceph-users] Re: Persistent problem with slow metadata

2020-08-26 Thread Eugen Block
Hi, root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue wpq root@cephosd01:~# ceph config get mds.cephosd01 osd_op_queue_cut_off high just to make sure, I referred to OSD not MDS settings, maybe check again? I wouldn't focus too much on the MDS service, 64 GB RAM should be
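The OSD-side values can be checked the same way, e.g. for one OSD:

    ceph config get osd.0 osd_op_queue
    ceph config get osd.0 osd_op_queue_cut_off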

[ceph-users] transit upgrade without mgr

2020-08-26 Thread Dzianis Kahanovich
I have a production cluster under jewel with rbd & mds under Gentoo. Building luminous with mgr is now problematic (mostly due to python 3.5 being dropped at the eclass level). But for nautilus/etc I must go through luminous as a transit step. Can I temporarily use luminous without mgr (at least to wait for scrub)? What