[ceph-users] Re: Nautilus - not unmapping

2021-05-06 Thread Matthias Grandl
Hi Joe, are all PGs active+clean? If not, you will only get osdmap pruning, which will try to keep only every 10th osdmap. https://docs.ceph.com/en/latest/dev/mon-osdmap-prune/ If you have remapped PGs and need to urgently get rid of osdmaps, you can try the upmap-remapped script to get to a
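For reference, both conditions mentioned here can be checked quickly with standard commands; a minimal sketch:

  ceph pg stat                              # all PGs should be active+clean before full trimming resumes
  ceph report | grep "osdmap_.*_committed"  # the gap between first and last committed should shrink once clean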

[ceph-users] Nautilus - not unmapping

2021-05-06 Thread Joe Comeau
Nautilus cluster is not unmapping ceph 14.2.16 ceph report | grep "osdmap_.*_committed" report 1175349142 "osdmap_first_committed": 285562, "osdmap_last_committed": 304247, we've set osd_map_cache_size = 2 but it is slowly growing to that difference as well OSD map first

[ceph-users] Stuck OSD service specification - can't remove

2021-05-06 Thread David Orman
Has anybody run into a 'stuck' OSD service specification? I've tried to delete it, but it's stuck in 'deleting' state, and has been for quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3: NAME PORTS RUNNING REFRESHED AGE PLACEMENT osd.osd_spec
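For reference, inspecting and removing a service specification with the orchestrator usually looks roughly like this (the service name osd.osd_spec is taken from the listing above; adjust to your own spec):

  ceph orch ls osd --export      # show the current spec(s)
  ceph orch rm osd.osd_spec      # ask the orchestrator to delete the spec
  ceph orch ls osd               # an OSD spec can linger in "deleting" while daemons it created still exist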

[ceph-users] Re: Slow performance and many slow ops

2021-05-06 Thread codignotto
Hello Mario, yes I am using KRBD in Proxmox; Ceph is a separate cluster and Proxmox another cluster. I connect Ceph to Proxmox using RBD, and in the storage configuration I select KRBD, IO and SSD. On Thu, 6 May 2021 at 18:15, Mario Giammarco wrote: > We need
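For anyone following along, an external Ceph RBD storage with KRBD enabled is configured in Proxmox roughly like this in /etc/pve/storage.cfg (storage ID, pool and monitor addresses below are placeholders, not the poster's values):

  rbd: ceph-external
          content images
          krbd 1
          monhost 10.0.0.1 10.0.0.2 10.0.0.3
          pool rbd
          username admin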

[ceph-users] Re: Slow performance and many slow ops

2021-05-06 Thread Mario Giammarco
We need more details, but are you using krbd? iothread? and so on? On Thu, 6 May 2021 at 22:38, codignotto wrote: > Hello, I have 6 hosts with 12 SSD disks on each host for a total of 72 OSD, > I am using Ceph Octopus in its latest version, the deployment was done > using ceph

[ceph-users] Slow performance and many slow ops

2021-05-06 Thread codignotto
Hello, I have 6 hosts with 12 SSD disks on each host for a total of 72 OSDs. I am using Ceph Octopus in its latest version; the deployment was done using cephadm and containers according to the docs. We are having some performance problems with the cluster, which I mount on a Proxmox cluster
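As a starting point for narrowing this down, a raw-cluster baseline independent of Proxmox can be taken with rados bench; a rough sketch (the pool name is a placeholder, and --no-cleanup keeps the objects for the read test):

  rados bench -p testpool 30 write -b 4M -t 16 --no-cleanup
  rados bench -p testpool 30 rand -t 16
  rados -p testpool cleanup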

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Clyso GmbH - Ceph Foundation Member
Hi Andres, does the command work with the original rule/crushmap? ___ Clyso GmbH - Ceph Foundation Member supp...@clyso.com https://www.clyso.com On 06.05.2021 at 15:21, Andres Rojas Guerrero wrote: Yes, my ceph version is Nautilus: # ceph -v ceph version

[ceph-users] Re: How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread mabi
Thank you very much for the hint regarding the log files, I wasn't aware that it still saves the logs on the host although everything is running in containers nowadays. There was nothing in the log files, but I eventually found out that the host (a RasPi4) could not cope with 2 SSD
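For anyone searching later: with cephadm-managed containers the daemon logs normally go to journald on the host, and to /var/log/ceph/<fsid>/ only if file logging is enabled; a rough sketch of where to look (the OSD id and fsid are placeholders):

  cephadm logs --name osd.1                  # wrapper around journalctl for that daemon
  journalctl -u ceph-<fsid>@osd.1.service    # the underlying systemd unit
  ls /var/log/ceph/<fsid>/                   # file logs, if log_to_file is enabled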

[ceph-users] v16.2.3 Pacific released

2021-05-06 Thread David Galloway
This is the third backport release in the Pacific series. We recommend all users update to this release. Notable Changes --- * This release fixes a cephadm upgrade bug that caused some systems to get stuck in a loop restarting the first mgr daemon. Getting Ceph * Git
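For cephadm-managed clusters, moving to this release is typically started and monitored with:

  ceph orch upgrade start --ceph-version 16.2.3
  ceph orch upgrade status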

[ceph-users] How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread mabi
Hello, I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started copying data to my CephFS mounted with the kernel client, but then both OSDs on that specific node crashed. To this topic I have the
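A first place to look in cases like this is the built-in crash module, which records daemon crashes cluster-wide:

  ceph crash ls                 # list recorded crashes
  ceph crash info <crash-id>    # backtrace and metadata for one crash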

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Frank Schilder
Hi Andrew, thanks, that is reassuring. To be sure, I plan to do a few power-out tests with this server. Never had any issues with that so far; it's the first time I see a corrupted OSD. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Robert Sander
On 06.05.21 at 17:18, Sage Weil wrote: > I hit the same issue. This was a bug in 16.2.0 that wasn't completely > fixed, but I think we have it this time. Kicking off a 16.2.3 build > now to resolve the problem. Great. I also hit that today. Thanks for fixing it quickly. Regards -- Robert

[ceph-users] Upgrade problem with cephadm

2021-05-06 Thread fcid
Hello ceph community, I'm trying to upgrade a Pacific (v16.2.0) cluster to the latest version, but the upgrade process seems to be stuck. The mgr log (debug level) does not show any significant message regarding the upgrade, other than when it is started/paused/resumed/stopped.
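To get more detail out of the orchestrator during an upgrade, something along these lines usually works (set the log level back to info the same way afterwards):

  ceph orch upgrade status
  ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  ceph -W cephadm --watch-debug       # follow the cephadm cluster log channel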

[ceph-users] Re: orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Sage Weil
Hi! I hit the same issue. This was a bug in 16.2.0 that wasn't completely fixed, but I think we have it this time. Kicking off a 16.2.3 build now to resolve the problem. (Basically, sometimes docker calls the image docker.io/ceph/ceph:foo and sometimes it's ceph/ceph:foo, and our attempt to
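Until the fixed build is available, one possible workaround for the image-name ambiguity described here is to point the upgrade at a fully qualified image (the tag below is only an example):

  ceph orch upgrade start --image docker.io/ceph/ceph:v16.2.3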

[ceph-users] orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Kai Börnert
Hi all, upon updating to 16.2.2 via cephadm, the upgrade gets stuck on the first mgr. Looking into this via docker logs, I see that it is still loading modules when it is apparently terminated and restarted in a loop. When pausing the upgrade, the mgr succeeds in starting with the new
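For reference, the pausing and resuming mentioned here is done with:

  ceph orch upgrade pause
  ceph orch upgrade resume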

[ceph-users] Write Ops on CephFS Increasing exponentially

2021-05-06 Thread Kyle Dean
Hi, hoping someone could help me get to the bottom of this particular issue I'm having. I have ceph octopus installed using ceph-ansible. Currently, I have 3 MDS servers running, and one client connected to the active MDS. I'm currently storing a very large encrypted container on the CephFS
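To see where the write ops originate, the MDS perf counters and client sessions are usually the first things to check; a rough sketch (run the daemon commands on the host of the active MDS; the MDS name is a placeholder):

  ceph fs status
  ceph daemon mds.<name> perf dump        # look at the mds and objecter sections
  ceph daemon mds.<name> session ls       # per-client session and request information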

[ceph-users] Re: Out of Memory after Upgrading to Nautilus

2021-05-06 Thread Didier GAZEN
Hi Christoph, I am currently using Nautilus on a ceph cluster with osd_memory_target defined in ceph.conf on each node. By running: ceph config get osd.40 osd_memory_target you get the default value for the parameter osd_memory_target (4294967296 for Nautilus). If you change the
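A sketch of the behaviour described here: values set only in a node's ceph.conf are not visible to "ceph config get", which reads the monitors' configuration database; storing the value there makes it show up consistently (the 2 GiB value below is only an example):

  ceph config get osd.40 osd_memory_target          # returns the default unless set centrally
  ceph config set osd osd_memory_target 2147483648
  ceph config get osd.40 osd_memory_target          # now returns the value from the config DB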

[ceph-users] Re: How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread David Caro
On 05/06 14:03, mabi wrote: > Hello, > > I have a small 6 nodes Octopus 15.2.11 cluster installed on bare metal with > cephadm and I added a second OSD to one of my 3 OSD nodes. I started then > copying data to my ceph fs mounted with kernel mount but then both OSDs on > that specific node

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Andrew Walker-Brown
Hi Frank, I’m running the same SSDs (approx. 20) in Dell servers on HBA330’s. Haven’t had any issues and have suffered at least one power outage. Just checking the wcache setting and it shows as enabled. Running Octopus 15.1.9 and docker containers. Originally part of a Proxmox cluster but
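For anyone wanting to do the same check, the volatile write cache state of a drive can be read (and changed) with standard tools; roughly (the device name is a placeholder):

  sdparm --get=WCE /dev/sdX     # SAS/SCSI: WCE=1 means the volatile write cache is enabled
  hdparm -W /dev/sdX            # SATA: shows and toggles the drive write cache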

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
Yes, my ceph version is Nautilus:
# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
First dump the crush map:
# ceph osd getcrushmap -o crush_map
Then, decompile the crush map:
# crushtool -d crush_map -o crush_map_d
Now, edit the crush rule and
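For completeness, the usual remaining steps after editing the decompiled map are to recompile it, validate it with crushtool (as discussed elsewhere in this thread), and inject it; a sketch using the file names above:

  crushtool -c crush_map_d -o crush_map_new
  crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-bad-mappings
  ceph osd setcrushmap -i crush_map_new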

[ceph-users] RGW Beast SSL version

2021-05-06 Thread Glen Baars
Hello Ceph, Can you set the SSL min version? Such as TLS1.2? Glen
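Recent releases document an ssl_options setting for the Beast frontend that can disable older protocol versions; assuming it is available in the release in use (check the radosgw documentation for your version), the configuration looks roughly like this, with the certificate path as a placeholder:

  rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem ssl_options=no_sslv3:no_tlsv1:no_tlsv1_1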

[ceph-users] OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Frank Schilder
Hi all, I lost 2 OSDs deployed on a single Kingston SSD in a rather strange way and am wondering if anyone has made similar observations or is aware of a firmware bug with these disks. Disk model: KINGSTON SEDC500M3840G (it ought to be a DC grade model with super capacitors) Smartctl does not

[ceph-users] Ceph stretch mode enabling

2021-05-06 Thread Felix O
Hello, I'm trying to deploy my test ceph cluster and enable stretch mode ( https://docs.ceph.com/en/latest/rados/operations/stretch-mode/). My problem is enabling the stretch mode. $ ceph mon enable_stretch_mode ceph-node-05 stretch_rule
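For context, the documented prerequisites before enable_stretch_mode succeeds are the connectivity election strategy, a CRUSH location for every monitor, and a stretch CRUSH rule; roughly (monitor and site names below are placeholders, with ceph-node-05 as the tiebreaker per the command above):

  ceph mon set election_strategy connectivity
  ceph mon set_location ceph-node-01 datacenter=site1
  ceph mon set_location ceph-node-02 datacenter=site2
  ceph mon set_location ceph-node-05 datacenter=site3
  ceph mon enable_stretch_mode ceph-node-05 stretch_rule datacenter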

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Eugen Block
Interesting, I haven't had that yet with crushtool. Your ceph version is Nautilus, right? And you did decompile the binary crushmap with crushtool, correct? I don't know how to reproduce that. Quoting Andres Rojas Guerrero: I have this error when I try to show mappings with crushtool: #

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
I have this error when I try to show mappings with crushtool:
# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) ** in thread 7f7f7a0ccb40 thread_name:crushtool
On 6/5/21 at 13:47,

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
Ok, thank you very much for the answer. On 6/5/21 at 13:47, Eugen Block wrote: > Yes it is possible, but you should validate it with crushtool before > injecting it to make sure the PGs land where they belong. > > crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings >

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Eugen Block
Yes it is possible, but you should validate it with crushtool before injecting it to make sure the PGs land where they belong. crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings If you don't get bad

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
Hi, I try to make a new crush rule (Nautilus) in order to take the new correct_failure_domain to hosts:
"rule_id": 2,
"rule_name": "nxtcloudAFhost",
"ruleset": 2,
"type": 3,
"min_size": 3,
"max_size": 7,
"steps": [ {
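In the decompiled (text) form of the CRUSH map, an erasure-coded rule with the failure domain moved to host typically looks something like the following; the root, device class and tunable steps are placeholders rather than the poster's actual rule:

  rule nxtcloudAFhost {
          id 2
          type erasure
          min_size 3
          max_size 7
          step set_chooseleaf_tries 5
          step set_choose_tries 100
          step take default class hdd
          step chooseleaf indep 0 type host
          step emit
  }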

[ceph-users] Re: Out of Memory after Upgrading to Nautilus

2021-05-06 Thread Christoph Adomeit
It looks like I have solved the issue. I tried, in ceph.conf:
[osd]
osd_memory_target = 1073741824
followed by: systemctl restart ceph-osd.target
When I run ceph config get osd.40 osd_memory_target it returns: 4294967296, so this did not work. Next I tried: ceph tell osd.* injectargs
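One caveat worth noting with the injectargs approach: it only changes the running daemons, so the setting does not survive a restart. Storing it in the monitors' configuration database keeps it persistent, for example (reusing the 1 GiB value from above):

  ceph config set osd osd_memory_target 1073741824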