[ceph-users] Re: upgrade 17.2.6 to 17.2.7 , any issues?

2023-11-02 Thread Dmitry Melekhov
03.11.2023 04:33, Reto Gysi wrote: Hi, I had 2 issues: 1. I got hit by https://tracker.ceph.com/issues/63118, which also happened with a multi-arch deployment upgrade from 17.2.5 to 17.2.7. The workaround worked for me. 2. I got some BLUEFS_SPILLOVER warnings after the upgrade. So I
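One commonly suggested way to inspect and clear such spillover (a sketch, not from this thread; the OSD id is a placeholder):

  ceph health detail          # lists the OSDs reporting BLUEFS_SPILLOVER
  ceph tell osd.<id> compact  # trigger an online RocksDB compaction on an affected OSD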

[ceph-users] Re: 17.2.7 quincy dashboard issues

2023-11-02 Thread Matthew Darwin
In my case I'm adding a label that is unique to each Ceph cluster and then filtering on that. In my Ceph dashboard in Grafana I've added a pull-down list to check each different Ceph cluster. You need a way for me to configure which labels to filter on so I can match it up with how I
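For reference, one way to attach such a label, assuming each cluster has its own Prometheus instance (the label name and value below are illustrative, not from the thread):

  global:
    external_labels:
      cluster: ceph-prod   # unique per Ceph cluster; Grafana variables can filter on it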

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread V A Prabha
Is it possible to move the OSDs safely (mark the OSDs out, migrate their content to other OSDs, remove them, and map them fresh to other, less-loaded nodes)? I have very critical production workloads (government applications). Please guide me on the safest way to stabilize the
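A conservative sequence for that, sketched with a placeholder OSD id and assuming the remaining OSDs have enough free capacity:

  ceph osd out <id>                            # start migrating PGs off the OSD
  ceph -s                                      # wait for recovery to finish (HEALTH_OK)
  ceph osd safe-to-destroy <id>                # confirm removal would lose no data
  ceph osd purge <id> --yes-i-really-mean-it   # remove the OSD from the cluster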

[ceph-users] Re: RGW access logs with bucket name

2023-11-02 Thread Dan van der Ster
Using the ops log is a good option -- I had missed that it can now log to a file. In Quincy:
# ceph config set global rgw_ops_log_rados false
# ceph config set global rgw_ops_log_file_path '/var/log/ceph/ops-log-$cluster-$name.log'
# ceph config set global rgw_enable_ops_log true
Then restart
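The restart step depends on the deployment; with cephadm it would presumably be something like (the service name is a placeholder):

  ceph orch restart rgw.<service_name>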

[ceph-users] resharding RocksDB after upgrade to Pacific breaks OSDs

2023-11-02 Thread Denis Polom
Hi, we upgraded our Ceph cluster from the latest Octopus to Pacific 16.2.14 and then followed the docs (https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#rocksdb-sharding ) to
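For reference, the reshard command in those docs takes this general shape (run while the OSD is stopped; the sharding spec is the default one from the documentation):

  ceph-bluestore-tool \
    --path /var/lib/ceph/osd/ceph-<id> \
    --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
    reshard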

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread David C.
Hi, I've just checked with the team and the situation is much more serious than it seems: the lost disks contained the MON AND OSD databases (5 servers down out of 8, replica 3). It seems that the team fell victim to a bad batch of Samsung 980 Pros (I'm not a big fan of this "Pro" range, but

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Anthony D'Atri
This admittedly is the case throughout the docs. > On Nov 2, 2023, at 07:27, Joachim Kraftmayer - ceph ambassador > wrote: > > Hi, > > another short note regarding the documentation, the paths are designed for a > package installation. > > the paths for container installation look a bit

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread David C.
Hi Mohamed, I understand there's one operational monitor, isn't there? If so, you need to reprovision the other monitors with an empty store so that they synchronize with the only remaining monitor. Regards, *David CASIER*

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Robert Sander
Hi, On 11/2/23 13:05, Mohamed LAMDAOUAR wrote: when I ran this command, I got this error (because the database of the osd was on the boot disk) The RocksDB part of the OSD was on the failed SSD? Then the OSD is lost and cannot be recovered. The RocksDB contains the information where each
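If it is unclear where an OSD's DB lived, one way to check on the host (assuming the OSD was deployed with ceph-volume) is:

  ceph-volume lvm list   # per-OSD listing; a separate [db] device appears when RocksDB is offloaded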

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Malte Stroem
Hey Mohamed, just send us the output of ceph -s and ceph mon dump please. Best, Malte On 02.11.23 13:05, Mohamed LAMDAOUAR wrote: Hi Robert, when I ran this command, I got this error (because the database of the osd was on the boot disk)
ceph-objectstore-tool \
  --type bluestore \

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Boris Behrens
Hi, follow these instructions: https://docs.ceph.com/en/quincy/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster As you are using containers, you might need to specify the --mon-data directory (/var/lib/CLUSTER_UUID/mon.MONNAME) (actually I never did this in an

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Hi Robert, when I ran this command, I got this error (because the database of the osd was on the boot disk)
ceph-objectstore-tool \
> --type bluestore \
> --data-path /var/lib/ceph/c80891ba-55f3-11ed-9389-919f4368965c/osd.9 \
> --op update-mon-db \
> --mon-store-path
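For comparison, the general form of this step in the mon-store recovery docs looks like the following (the data path mirrors the one above; the scratch directory is a hypothetical placeholder):

  ceph-objectstore-tool \
    --type bluestore \
    --data-path /var/lib/ceph/<fsid>/osd.<id> \
    --op update-mon-db \
    --mon-store-path /tmp/mon-store   # scratch dir that accumulates the rebuilt mon store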

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Robert Sander
On 11/2/23 12:48, Mohamed LAMDAOUAR wrote: I reinstalled the OS on a new SSD disk. How can I rebuild my cluster with only one mon? If there is one MON still operating you can try to extract its monmap and remove all the other MONs from it with the monmaptool:
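The usual sequence for that, per the "removing monitors from an unhealthy cluster" procedure (mon IDs below are placeholders; stop the surviving mon daemon first):

  ceph-mon -i <surviving_mon> --extract-monmap /tmp/monmap
  monmaptool /tmp/monmap --rm <dead_mon_1> --rm <dead_mon_2>   # drop the lost mons
  ceph-mon -i <surviving_mon> --inject-monmap /tmp/monmap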

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Thanks Joachim for the clarification ;)

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Thanks Robert, I tried this but I'm stuck. If you have some time, please help me with it; I would be very happy because I'm lost :(

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Hello Boris, I have one monitor server up, and two other servers in the cluster are also up (these two servers are not monitors). I have four other servers down (the boot disk is dead) but the OSD data disks are safe. I reinstalled the OS on a new SSD disk. How can I rebuild my cluster with only

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Joachim Kraftmayer - ceph ambassador
Hi, another short note regarding the documentation: the paths are designed for a package installation. The paths for a container installation look a bit different, e.g.: /var/lib/ceph/<fsid>/osd.y/ Joachim ___ ceph ambassador DACH ceph consultant since 2012 Clyso

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-02 Thread Frank Schilder
Hi all, the problem re-appeared in the following way. After moving the problematic folder out and copying it back, all files showed the correct sizes. Today, we observe that the issue is back in the copy that was fine yesterday:
[user1@host11 h2lib]$ ls -l
total 37198
-rw-rw 1 user1 user1

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Robert Sander
Hi, On 11/2/23 11:28, Mohamed LAMDAOUAR wrote: I have 7 machines in a Ceph cluster; the Ceph service runs in a docker container. Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked). During a reboot, the SSDs bricked on 4 machines; the data are available on the HDD disks but

[ceph-users] Re: Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Boris Behrens
Hi Mohamed, are all mons down, or do you still have at least one that is running? AFAIK the mons save their DB on the normal OS disks, not within the Ceph cluster. So if all mons are dead, meaning the disks which contained the mon data are unrecoverably dead, you might need to bootstrap a

[ceph-users] Emergency, I lost 4 monitors but all osd disk are safe

2023-11-02 Thread Mohamed LAMDAOUAR
Hello, I have 7 machines in a Ceph cluster; the Ceph service runs in a docker container. Each machine has 4 HDDs of data (available) and 2 NVMe SSDs (bricked). During a reboot, the SSDs bricked on 4 machines; the data are available on the HDD disks but the NVMe is bricked and the system is not

[ceph-users] Re: Setting S3 bucket policies with multi-tenants

2023-11-02 Thread Janne Johansson
On Wed 1 Nov 2023 at 17:51, Thomas Bennett wrote:
>
> To update my own question, it would seem that Principal should be
> defined like this:
>
>    - "Principal": {"AWS": ["arn:aws:iam::Tenant1:user/readwrite"]}
>
> And Resource should be:
> "Resource": [ "arn:aws:s3:::backups"]
>
> Is it
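Putting those together, a complete policy for that setup might look like the following sketch (tenant, user, and bucket names are taken from the thread; the Action list and the object ARN are assumptions), applied here with s3cmd:

  cat > policy.json <<'EOF'
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"AWS": ["arn:aws:iam::Tenant1:user/readwrite"]},
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::backups", "arn:aws:s3:::backups/*"]
    }]
  }
  EOF
  s3cmd setpolicy policy.json s3://backups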

[ceph-users] Re: "cephadm version" in reef returns "AttributeError: 'CephadmContext' object has no attribute 'fsid'"

2023-11-02 Thread Eugen Block
There are a couple of examples in the docs [2], so in your case it probably would be something rather simple like:
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
You can apply that config to
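Presumably the spec would then be saved to a file and applied with (the filename is a placeholder):

  ceph orch apply -i osd_spec.yaml   # cephadm creates OSDs on all matching devices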

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread Zakhar Kirpichenko
>1. The calculated IOPS is for the rw operation, right? Total drive IOPS, read or write. Depending on the exact drive models, it may be lower or higher than 200. I took the average for a smaller-sized 7.2k rpm SAS drive. Modern drives usually deliver lower read IOPS and higher write IOPS. >2.

[ceph-users] diskprediction_local module and trained models

2023-11-02 Thread Can Özyurt
Hi everyone, we have recently noticed that the diskprediction_local module only works for a set of manufacturers. Hence we have the following questions: Are there any plans to support more manufacturers in the near future? Can we contribute to the process of training new models, and how? Can the existing

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread V A Prabha
Thanks for your prompt reply. But the queries are: 1. The calculated IOPS is for the rw operation, right? 2. The cluster is very busy? Is there any misconfiguration or missing tuning parameter that makes the cluster busy? 3. The nodes are not balanced? Do you mean to say that the count of OSDs in each server

[ceph-users] CephFS scrub causing MDS OOM-kill

2023-11-02 Thread Denis Polom
Hi, I set up a CephFS forward scrub by executing the command
# ceph tell mds.cephfs:0 scrub start / recursive
{
    "return_code": 0,
    "scrub_tag": "37a67f72-89a3-474e-8f8b-1e55cb979e14",
    "mode": "asynchronous"
}
But immediately after it started, memory usage on the MDS that keeps rank 0 increased
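While investigating, two documented scrub controls may help (shown for the same rank-0 MDS as above):

  ceph tell mds.cephfs:0 scrub status   # progress of the running scrub
  ceph tell mds.cephfs:0 scrub abort    # stop the scrub if the MDS risks another OOM kill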

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread Zakhar Kirpichenko
Sure: it's 36 OSDs at 200 IOPS each (at most, likely lower). I assume size=3 replication, so 1/3 of the total performance, and some 30% OSD overhead. (36 x 200) * 1/3 * 0.7 = 1680. That's how many IOPS you can realistically expect from your cluster. You get more than that, but the cluster is very