[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Fyodor Ustinov
Hi! > docs.ceph.io ? If there’s something that you’d like to see added there, > you’re > welcome to submit a tracker ticket, or write to me privately. It is not > uncommon for documentation enhancements to be made based on mailing list > feedback. Documentation... Try to install a completely

[ceph-users] Re: Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-26 Thread Josh Baergen
Hi Jerry, I think this is one of those "there must be something else going on here" situations; marking any OSD out should affect only that one "slot" in the acting set, at least until backfill completes (and in my experience that has always been the case). It might be worth inspecting the cluster log
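
A minimal way to check this behaviour on a live cluster is to watch one PG's up/acting sets around an osd out; a sketch, with pg 1.0 and osd.7 as placeholders:

    # pg 1.0 and osd.7 are placeholders; substitute real IDs
    ceph pg map 1.0                          # note the current up/acting sets
    ceph osd out 7
    ceph pg map 1.0                          # only the slot that held osd.7 should change
    ceph pg dump pgs_brief | grep '^1\.0 '   # alternative view across all PGs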

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Joshua West
I want to chime in here as well. I am a relatively new Ceph user who learned about Ceph through my use of Proxmox. I have two small 5 node Ceph/Proxmox clusters in two different locations (to play with mirroring), and a mere 300TB of combined storage. This is a hobby for me. I find Ceph really

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Anthony D'Atri
>> Are they too focused on having Ceph consultants to fix your problems or >> do they actually want to build a community to share knowledge? >> > > I am also a little worried about this strategy. You can also see that Red Hat > is putting its information pages behind a login. Now they

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Виталий Филиппов
Google Groups is only like 2007. Use Telegram @ceph_users and/or @ceph_ru :-) On 24 July 2021 at 00:27:21 GMT+03:00, y...@deepunix.net wrote: >I feel like ceph is living in 2005. It's quite hard to find help on >issues related to ceph and it's almost impossible to get involved into >helping others.

[ceph-users] Re: Did standby dashboards stop redirecting to the active one?

2021-07-26 Thread Harry G. Coin
On 7/26/21 12:02 PM, Ernesto Puerta wrote: > Hi Harry, > > No, that feature is still there. There's been a recent thread in this > mailing list (please see "Pacific 16.2.5 Dashboard minor regression >

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Brad Hubbard
On Tue, Jul 27, 2021 at 3:49 AM Marc wrote: > > > > I feel like ceph is living in 2005. > > No it is just you. Why don't you start reading > https://docs.ceph.com/en/latest/ > > >It's quite hard to find help on > > issues related to ceph and it's almost impossible to get involved into > >

[ceph-users] Re: #ceph in Matrix [was: Re: we're living in 2005.]

2021-07-26 Thread Brad Hubbard
On Tue, Jul 27, 2021 at 5:53 AM Nico Schottelius wrote: > > > Good evening dear mailing list, > > while I do think we have a great mailing list (this is one of the most > helpful open source mailing lists I'm subscribed to), I do agree with > the ceph IRC channel not being so helpful. The

[ceph-users] Re: #ceph in Matrix [was: Re: we're living in 2005.]

2021-07-26 Thread Marc
> In case you don't have a matrix account yet, you can find more > information about it on https://ungleich.ch/u/projects/open-chat/. Yes! Matrix, if they finally fix the reverse proxy functionality, I will be the first to join :) ___ ceph-users

[ceph-users] #ceph in Matrix [was: Re: we're living in 2005.]

2021-07-26 Thread Nico Schottelius
Good evening dear mailing list, while I do think we have a great mailing list (this is one of the most helpful open source mailing lists I'm subscribed to), I do agree with the ceph IRC channel not being so helpful. The join/leave messages on most days significantly exceed the number of real

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Marc
> > Hi Marc, seems like you had a bad night's sleep right? No, no, no I am really like that all the time ;) > it is really hard to find support or > solutions online. What do you even mean by this? > Also, the documentation page is lacking explanations of > basic Ceph concepts

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread cek+ceph
Have found the problem. All this was caused by a missing mon_host directive in ceph.conf. I had expected userspace to catch this, but it looks like it didn't care. We use DNS SRV in this cluster. With the mon_host directive reinstated, it was able to connect: Jul 26 09:51:40 xx kernel: libceph:
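
For reference, a minimal ceph.conf sketch showing both ways of pointing clients at the monitors (addresses, domain and the SRV name below are placeholders):

    [global]
    # Option 1: explicit monitor list (what fixed the mapping here)
    mon_host = 192.0.2.11,192.0.2.12,192.0.2.13

    # Option 2: DNS SRV discovery instead of mon_host; needs SRV records of the form
    #   _ceph-mon._tcp.<your-domain>  pointing at the monitor addresses
    #mon_dns_srv_name = ceph-mon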

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Yosh de Vos
Hi Marc, seems like you had a bad night's sleep right? There is just so much wrong with that reply. I think Yuri has a valid point: it is really hard to find support or solutions online. Also, the documentation page is lacking explanations of basic Ceph concepts, which would be helpful

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Dave Piper
Hi Igor, Thanks for your time looking into this. I've attached a 5 minute window of OSD logs, which includes several restart attempts (each one takes ~25 seconds). When I said it looked like we were starting up in a different state, I was referring to how the "Recovered from manifest file" log

[ceph-users] we're living in 2005.

2021-07-26 Thread yuri
I feel like ceph is living in 2005. It's quite hard to find help on issues related to ceph and it's almost impossible to get involved into helping others. There's a BBS, aka a Mailman mailing list, which is from the 1980s, and there's an IRC channel that's dead. Why not set up a Q&A board or a Google

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Dave Piper
Hi Igor, > So to get more verbose but less log one can set both debug-bluestore and > debug-bluefs to 1/20. ... More verbose logging attached. I've trimmed the file to a single restart attempt to keep the filesize down; let me know if there's not enough here. > It would also be great to
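
For anyone reproducing this, the suggested verbosity can be applied roughly as below (a sketch; osd.3 is a placeholder, and 1/20 means log level 1 on disk with level 20 kept in memory and dumped on crash):

    # persist via the mon config database; takes effect at daemon (re)start
    ceph config set osd.3 debug_bluestore 1/20
    ceph config set osd.3 debug_bluefs 1/20

    # or change a running daemon on the fly
    ceph tell osd.3 config set debug_bluestore 1/20
    ceph tell osd.3 config set debug_bluefs 1/20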

[ceph-users] Re: we're living in 2005.

2021-07-26 Thread Marc
> I feel like ceph is living in 2005. No it is just you. Why don't you start reading https://docs.ceph.com/en/latest/ >It's quite hard to find help on > issues related to ceph and it's almost impossible to get involved into > helping others. ???, Just click the reply button, you must be able

[ceph-users] [Kolla][wallaby] add new cinder backend

2021-07-26 Thread Ignazio Cassano
Hello All, I am playing with kolla wallaby on ubuntu 20.04. When I add a new backend type, the volume container stops working and keeps restarting, and all instances are stopped. I can only solve it by restarting one controller at a time. This morning I had cinder configured for NFS NetApp with 24

[ceph-users] Re: Did standby dashboards stop redirecting to the active one?

2021-07-26 Thread Ernesto Puerta
Hi Harry, No, that feature is still there. There's been a recent thread in this mailing list (please see "Pacific 16.2.5 Dashboard minor regression ") about an unrelated change in cephadm that might

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Ilya Dryomov
On Mon, Jul 26, 2021 at 5:25 PM wrote: > > Have found the problem. All this was caused by missing mon_host directive in > ceph.conf. I have expected userspace to catch this, but it looks like it > didn't care. We should probably add an explicit check for that so that the error message is

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread Mark Nelson
Yeah, I suspect that regular manual compaction might be the necessary workaround here if tombstones are slowing down iterator performance. If it is related to tombstones, it would be similar to what we saw when we tried to use deleterange and saw similar performance issues. I'm a little at
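
If manual compaction does turn out to help, it can be triggered per OSD roughly like this (a sketch; osd.5 and the data path are placeholders):

    # online, against a running OSD
    ceph tell osd.5 compact

    # offline, with the OSD stopped (default non-containerized path shown)
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-5 compact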

[ceph-users] Deployment Method of Octopus and Pacific

2021-07-26 Thread Xiaolong Jiang
Hi Ceph Users, We are currently deploying Nautilus using ceph-deploy in Spinnaker. However, newer versions of Ceph no longer support ceph-deploy. Does anyone have experience deploying Octopus/Pacific using Spinnaker? -- Best regards, Xiaolong Jiang Senior Software Engineer at

[ceph-users] [ceph][cephadm] Cluster recovery after reboot 1 node

2021-07-26 Thread Gargano Andrea
Hi all, we deployed a three-node Ceph Pacific cluster on Ubuntu 20.04. We then tried to restart one node, but when it comes up we see: root@tst2-ceph01:~# ceph status cluster: id: be115adc-edf0-11eb-8509-c5c80111fd98 health: HEALTH_WARN 6 failed

[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Ansgar Jazdzewski
Yes, the empty DB told me that at this point I had no other choice than to recreate the entire mon service. * remove broken mon: ceph mon remove $(hostname -s) * mon preparation: rm -rf /var/lib/ceph/mon/ceph-$(hostname -s); mkdir /var/lib/ceph/mon/ceph-$(hostname -s); ceph auth get mon.
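
The truncated sequence above is roughly the standard manual mon re-add procedure; a sketch, assuming a non-containerized mon using the default data path:

    MON=$(hostname -s)
    ceph mon remove "$MON"                          # drop the broken mon from the monmap
    rm -rf /var/lib/ceph/mon/ceph-"$MON"            # wipe the old, empty store
    mkdir /var/lib/ceph/mon/ceph-"$MON"
    ceph auth get mon. -o /tmp/mon.keyring          # fetch the mon keyring
    ceph mon getmap -o /tmp/monmap                  # fetch the current monmap
    ceph-mon -i "$MON" --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-"$MON"
    systemctl start ceph-mon@"$MON"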

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread cek+ceph
Although I appreciate the responses, they have provided zero help solving this issue thus far. It seems like the kernel module doesn't even get to the stage where it reads the attributes/features of the device. It doesn't know where to connect and, presumably, is confused by the options passed

[ceph-users] Did standby dashboards stop redirecting to the active one?

2021-07-26 Thread Harry G. Coin
Somewhere between Nautilus and Pacific the hosts running standby managers, which previously would redirect browsers to the currently active mgr/dashboard, seem to have stopped doing that. Is there a switch for that somewhere? Or was I just happily using an undocumented feature? Thanks Harry Coin

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Dimitri Savineau
Hi, > As Marc mentioned, you would need to disable unsupported features but > you are right that the kernel doesn't make it to that point. I remember disabling unsupported features on el7 nodes (kernel 3.10) with Nautilus. But the error on the map command is usually more obvious. $ rbd feature
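
The usual fix for an old el7 (3.10) kernel is to strip the features krbd can't handle; a sketch with placeholder pool/image names:

    # see which features the image currently has
    rbd info rbd/myimage

    # drop the ones the old kernel client doesn't support (layering can stay;
    # very old kernels may need exclusive-lock removed as well)
    rbd feature disable rbd/myimage object-map fast-diff deep-flatten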

[ceph-users] Re: [ceph][cephadm] Cluster recovery after reboot 1 node

2021-07-26 Thread Ignazio Cassano
We solved our issue: we had a dirty LVM configuration and cleaned it up. Now it is working fine. Ignazio On Mon, 26 Jul 2021 at 13:25, Ignazio Cassano <ignaziocass...@gmail.com> wrote: > Hello, I want to add further information I found for the issue described > by Andrea: >

[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Dan van der Ster
Your log ends with > 2021-07-25 06:46:52.078 7fe065f24700 1 mon.osd01@0(leader).osd e749666 > do_prune osdmap full prune enabled So mon.osd01 was still the leader at that time. When did it leave the cluster? > I also found that the rocksdb on osd01 is only 1MB in size and 345MB on the >

[ceph-users] R: [ceph] [pacific] cephadm cannot create OSD

2021-07-26 Thread Gargano Andrea
Hi Dimitri, that works for me! Thank you, Andrea From: Gargano Andrea Sent: Friday, 23 July 2021 17:48 To: Dimitri Savineau Cc: ceph-users@ceph.io Subject: Re: [ceph-users] [ceph] [pacific] cephadm cannot create OSD Hi Dimitri, Thank you, I'll retry and I'll let you know on Monday.

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Ilya Dryomov
On Mon, Jul 26, 2021 at 12:39 PM wrote: > > Although I appreciate the responses, they have provided zero help solving > this issue thus far. > It seems like the kernel module doesn't even get to the stage where it reads > the attributes/features of the device. It doesn't know where to connect

[ceph-users] Re: ceph-users Digest, Vol 102, Issue 52

2021-07-26 Thread Eugen Block
Hi, I replied a couple of days ago (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/IPLTK777USH2TNXM4DU4U2F2YCHWX4Z4/). Quoting renjianxinlover: does anyone have ideas? | | renjianxinlover | | renjianxinlo...@163.com | On 7/22/2021 10:12, renjianxinlover wrote:

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Igor Fedotov
Dave, please see inline On 7/26/2021 1:57 PM, Dave Piper wrote: Hi Igor, So to get more verbose but less log one can set both debug-bluestore and debug-bluefs to 1/20. ... More verbose logging attached. I've trimmed the file to a single restart attempt to keep the filesize down; let me

[ceph-users] Re: ceph-users Digest, Vol 102, Issue 52

2021-07-26 Thread renjianxinlover
Does anyone have ideas? | | renjianxinlover | | renjianxinlo...@163.com | On 7/22/2021 10:12, renjianxinlover wrote: sorry, a point was left out yesterday. Currently, the .index pool with those three OSDs (.18, .19, .29) is not in use and has almost no data. | | renjianxinlover | |

[ceph-users] Re: RGW: LC not deleting expired files

2021-07-26 Thread Paul JURCO
Hi Vidushi, aws s3api list-object-versions shows the same files as s3cmd, so I would say versioning is not enabled. aws s3api get-bucket-versioning result is empty. Is there any other method to check if versioning is enabled? Thank you! Paul On Mon, Jul 26, 2021 at 2:42 PM Vidushi Mishra wrote:

[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Ansgar Jazdzewski
Hi Dan, Hi Folks, this is how things started, I also found that the rocksdb on osd01 is only 1MB in size and 345MB on the other mons! 2021-07-25 06:46:30.029 7fe061f1c700 0 log_channel(cluster) log [DBG] : monmap e1: 3 mons at
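
A quick way to compare mon store sizes across the monitors (assuming a non-containerized deployment with the default data path):

    # run on each mon host; the healthy mons here were ~345MB, so a ~1MB
    # store.db points at a monitor that never synced
    du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db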

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread Igor Fedotov
Hi Mahnoosh! Unfortunately I'm not an expert in RGW hence nothing to recommend from this side. Apparently your issues are caused by bulk data removal - it appears that RocksDB can hardly sustain such things and its performance degrades. We've seen that plenty of times before. So far

[ceph-users] Re: RGW: LC not deleting expired files

2021-07-26 Thread Vidushi Mishra
Hi Paul, Are these non-current versioned objects displayed in the bucket stats? Also, the LC rule applied to the bucket can only delete/expire objects for a normal bucket. In the case of a versioned bucket, the LC rule applied will expire the current version [create a delete-marker for every
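
For a versioned bucket, data is only really freed once a noncurrent-version rule is added alongside the current-version expiration; a sketch using the aws CLI (bucket name, day counts and the RGW endpoint are placeholders):

    aws s3api put-bucket-lifecycle-configuration --bucket mybucket \
      --endpoint-url http://rgw.example.com:8080 \
      --lifecycle-configuration '{
        "Rules": [{
          "ID": "expire-current-and-noncurrent",
          "Status": "Enabled",
          "Filter": {"Prefix": ""},
          "Expiration": {"Days": 30},
          "NoncurrentVersionExpiration": {"NoncurrentDays": 7}
        }]
      }'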

[ceph-users] Re: [ceph][cephadm] Cluster recovery after reboot 1 node

2021-07-26 Thread Ignazio Cassano
Hello, I want to add further information I found for the issue described by Andrea: cephadm.log:2021-07-26 13:07:11,281 DEBUG /usr/bin/docker: stderr Error: No such object: ceph-be115adc-edf0-11eb-8509-c5c80111fd98-osd.11 cephadm.log:2021-07-26 13:07:11,654 DEBUG /usr/bin/docker: stderr Error: No

[ceph-users] RGW: LC not deleting expired files

2021-07-26 Thread Paul JURCO
Hi! I need some help understanding LC processing. On the latest versions of Octopus (tested with 15.2.13 and 15.2.8) we have at least one bucket whose files are not being removed when they expire. The size of the bucket reported with radosgw-admin compared with the one obtained with s3cmd

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread mahnoosh shahidi
Thanks for your help. Our HDD OSDs have separate NVMe disks for DB use. On Mon, Jul 26, 2021 at 3:49 PM Igor Fedotov wrote: > Unfortunately I'm not an expert in RGW hence nothing to recommend from > that side. > > Apparently your issues are caused by bulk data removal - it appears that > RocksDB

[ceph-users] Re: octopus garbage collector makes slow ops

2021-07-26 Thread Igor Fedotov
Unfortunately I'm not an expert in RGW hence nothing to recommend from that side. Apparently your issues are caused by bulk data removal - it appears that RocksDB can hardly sustain such things and its performance degrades. We've seen that plenty of times before. So far there are two known

[ceph-users] Re: How to set retention on a bucket?

2021-07-26 Thread Konstantin Shalygin
> On 26 Jul 2021, at 08:05, Szabo, Istvan (Agoda) > wrote: > > Haven't really found how to set the retention on a s3 bucket for a specific > day. Is there any ceph document about it? It is not possible to set retention for a specific date, only as +days from the PutObject date. LC policy is highly
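
In other words, the supported form is an expiration N days after the object was written; a sketch of such a rule (bucket name, day count and RGW endpoint are placeholders):

    aws s3api put-bucket-lifecycle-configuration --bucket mybucket \
      --endpoint-url http://rgw.example.com:8080 \
      --lifecycle-configuration '{
        "Rules": [{
          "ID": "expire-after-30-days",
          "Status": "Enabled",
          "Filter": {"Prefix": ""},
          "Expiration": {"Days": 30}
        }]
      }'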

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-07-26 Thread Igor Fedotov
Hi Dave, Some notes first: 1) The following behavior is fine, BlueStore mounts in two stages - the first one is read-only and among other things it loads the allocation map from the DB. And that's exactly the case here. Jul 26 08:55:31 condor_sc0 docker[15282]: 2021-07-26T08:55:31.703+

[ceph-users] How to set retention on a bucket?

2021-07-26 Thread Szabo, Istvan (Agoda)
Hi, Haven't really found how to set the retention on a s3 bucket for a specific day. Is there any ceph document about it? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e:

[ceph-users] Re: Is there any way to obtain the maximum number of node failure in ceph without data loss?

2021-07-26 Thread Jerry Lee
After doing more experiments, the outcomes answer some of my questions: The environment is somewhat different compared to the one mentioned in the previous mail. 1) the `ceph osd tree` -2 2.06516 root perf_osd -5 0.67868 host jceph-n2-perf_osd 2 ssd 0.17331 osd.2

[ceph-users] Re: unable to map device with krbd on el7 with ceph nautilus

2021-07-26 Thread Ilya Dryomov
On Fri, Jul 23, 2021 at 11:58 PM wrote: > > Hi. > > I've followed the installation guide and got nautilus 14.2.22 running on el7 > via https://download.ceph.com/rpm-nautilus/el7/x86_64/ yum repo. > I'm now trying to map a device on an el7 and getting extremely weird errors: > > # rbd info

[ceph-users] Re: Installing and Configuring RGW to an existing cluster

2021-07-26 Thread Szabo, Istvan (Agoda)
You have different ways: ceph-deploy and full manual. Full manual, RGW: on all RGW nodes: yum install ceph-radosgw -y. On the first RGW node: ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring; chown ceph:ceph /etc/ceph/ceph.client.radosgw.keyring; ceph-authtool
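
The keyring steps above, written out, look roughly like this (a sketch; the client name client.rgw.$(hostname -s) and the caps are commonly used defaults, adjust to your naming scheme):

    # on every RGW node
    yum install ceph-radosgw -y

    # on the first RGW node
    ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
    chown ceph:ceph /etc/ceph/ceph.client.radosgw.keyring
    ceph-authtool /etc/ceph/ceph.client.radosgw.keyring \
        -n client.rgw.$(hostname -s) --gen-key
    ceph-authtool -n client.rgw.$(hostname -s) \
        --cap osd 'allow rwx' --cap mon 'allow rw' \
        /etc/ceph/ceph.client.radosgw.keyring
    ceph auth add client.rgw.$(hostname -s) \
        -i /etc/ceph/ceph.client.radosgw.keyring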

[ceph-users] Re: 1/3 mons down! mon do not rejoin

2021-07-26 Thread Dan van der Ster
Hi, Do you have ceph-mon logs from when mon.osd01 first failed before the on-call team rebooted it? They might give a clue what happened to start this problem, which maybe is still happening now. This looks similar but it was eventually found to be a network issue: