[ceph-users] Why is min_size of erasure pools set to k+1
Could someone help me understand why it's a bad idea to set min_size of erasure-coded pools to k?

From what I've read, the argument for k+1 is that if min_size is k and you lose an OSD during recovery after a failure of m OSDs, data will become unavailable. But how does setting min_size to k+1 help? If m=2 and you experience a double failure followed by another failure during recovery, you have still lost 3 OSDs, and therefore your data, because the pool wasn't set up to handle 3 concurrent failures; the value of min_size is irrelevant.

https://github.com/ceph/ceph/pull/8008 mentions an inability to peer if min_size = k, but I don't understand why. Does that mean that if min_size=k and I lose m OSDs, and then an OSD is restarted during recovery, PGs will not peer even after the restarted OSD comes back online?

Vlad
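For reference, min_size is a per-pool setting that can be inspected and changed with standard commands (a sketch; "ecpool" is a hypothetical pool name):

  # Current value
  ceph osd pool get ecpool min_size
  # The commonly recommended value for an EC pool is k+1, e.g. 6 for k=5, m=2
  ceph osd pool set ecpool min_size 6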
[ceph-users] Re: OSD tries (and fails) to scrub the same PGs over and over
> what's the cluster status? Is there recovery or backfilling
> going on?

No. Everything is good except this PG is not getting scrubbed.

Vlad

On 7/21/23 01:41, Eugen Block wrote:
> Hi,
>
> what's the cluster status? Is there recovery or backfilling going on?
>
> Zitat von Vladimir Brik:
>
>> I have a PG that hasn't been scrubbed in over a month and not
>> deep-scrubbed in over two months. I tried forcing with `ceph pg
>> (deep-)scrub`, but with no success.
>>
>> Looking at the logs of that PG's primary OSD, it looks like every once
>> in a while it attempts (and apparently fails) to scrub that PG, along
>> with two others, over and over. For example:
>>
>> 2023-07-19T16:26:07.082 ... 24.3ea scrub starts
>> 2023-07-19T16:26:10.284 ... 27.aae scrub starts
>> 2023-07-19T16:26:11.169 ... 24.aa scrub starts
>> 2023-07-19T16:26:12.153 ... 24.3ea scrub starts
>> 2023-07-19T16:26:13.346 ... 27.aae scrub starts
>> 2023-07-19T16:26:16.239 ... 24.aa scrub starts
>> ...
>>
>> Lines like that are repeated throughout the log file.
>>
>> Has anyone seen something similar? How can I debug this?
>>
>> I am running 17.2.5
>>
>> Vlad
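One way to get more detail on why the scrubs keep restarting (a sketch; osd.X stands for the PG's primary OSD, and 20 is simply a high debug level):

  # Temporarily raise OSD logging on the primary, reproduce, then inspect its log
  ceph tell osd.X config set debug_osd 20
  # Restore the default afterwards
  ceph tell osd.X config set debug_osd 1/5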
[ceph-users] OSD tries (and fails) to scrub the same PGs over and over
I have a PG that hasn't been scrubbed in over a month and not deep-scrubbed in over two months. I tried forcing with `ceph pg (deep-)scrub`, but with no success.

Looking at the logs of that PG's primary OSD, it looks like every once in a while it attempts (and apparently fails) to scrub that PG, along with two others, over and over. For example:

2023-07-19T16:26:07.082 ... 24.3ea scrub starts
2023-07-19T16:26:10.284 ... 27.aae scrub starts
2023-07-19T16:26:11.169 ... 24.aa scrub starts
2023-07-19T16:26:12.153 ... 24.3ea scrub starts
2023-07-19T16:26:13.346 ... 27.aae scrub starts
2023-07-19T16:26:16.239 ... 24.aa scrub starts
...

Lines like that are repeated throughout the log file.

Has anyone seen something similar? How can I debug this?

I am running 17.2.5

Vlad
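A couple of generic checks that can help narrow something like this down (standard commands, not specific to this cluster):

  # Make sure scrubbing isn't disabled cluster-wide (noscrub/nodeep-scrub flags)
  ceph osd dump | grep flags
  # When the PG last completed a (deep-)scrub
  ceph pg 24.3ea query | grep scrub_stamp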
[ceph-users] Re: Enable Centralized Logging in Dashboard.
How do I create a username and password that I could use to log in to Grafana?

Vlad

On 11/16/22 08:42, E Taka wrote:
> Thank you, Nizam. I wasn't aware that the Dashboard login is not the
> same as the Grafana login. Now I have access to the logfiles.
>
> Am Mi., 16. Nov. 2022 um 15:06 Uhr schrieb Nizamudeen A:
>
>> Hi,
>>
>> Did you log in to the Grafana dashboard? For centralized logging you'll
>> need to log in to Grafana using your Grafana username and password. If
>> you do that and refresh the dashboard, I think the Loki page should be
>> visible from the Daemon Logs page.
>>
>> Regards,
>> Nizam
>>
>> On Wed, Nov 16, 2022 at 7:31 PM E Taka <0eta...@gmail.com> wrote:
>>
>>> Ceph: 17.2.5, dockerized with Ubuntu 20.04
>>>
>>> Hi all,
>>>
>>> I am trying to enable centralized logging in the Dashboard as described in
>>> https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#cephadm-monitoring-centralized-logs
>>>
>>> Logging to files is enabled:
>>>
>>> ceph config set global log_to_file true
>>> ceph config set global mon_cluster_log_to_file true
>>>
>>> Loki is deployed on one host, promtail on every host:
>>>
>>> service_type: loki
>>> service_name: loki
>>> placement:
>>>   hosts:
>>>     - ceph00
>>> ---
>>> service_type: promtail
>>> service_name: promtail
>>> placement:
>>>   host_pattern: '*'
>>>
>>> After applying the YAML above, the log messages in »ceph -W cephadm«
>>> look good (deploying loki+promtail and reconfiguring grafana). But the
>>> Dashboard page "Cluster → Logs → Daemon Logs" just shows a standard
>>> Grafana page without any buttons for the Ceph cluster. Its URL is
>>> https://ceph00.[mydomain]:3000/explore?orgId=1=["now-1h","now","Loki",{"refId":"A"}]
>>>
>>> Did I miss something for centralized logging in the Dashboard? Thanks!
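If Grafana was deployed by cephadm, one way to control its admin credentials is through the service spec (a sketch; I believe recent cephadm releases accept an initial_admin_password field in the Grafana spec, but verify that against your version, and "changeme" is a placeholder):

  service_type: grafana
  placement:
    count: 1
  spec:
    initial_admin_password: changeme

Applied with "ceph orch apply -i grafana.yaml".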
[ceph-users] Any issues with podman 4.2 and Quincy?
Has anybody run into issues with Quincy and podman 4.2? The podman 4.x series is not mentioned in https://docs.ceph.com/en/quincy/cephadm/compatibility/ but podman 3.x is no longer available in Alma Linux.

Vlad
[ceph-users] Re: What happens when a DB/WAL device runs out of space?
> The DB uses "fixed" sizes like 3,30,300G for different levels of
> data, and when it needs to start filling a new level and it doesn't fit,
> this level moves over to the data device.

I thought this no longer applied since the changes in Pacific that Nathan mentioned?

Vlad

On 12/13/22 12:46, Janne Johansson wrote:
> Den tis 13 dec. 2022 kl 17:47 skrev Vladimir Brik:
>
>> Hello
>>
>> I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current
>> trends continue, the DB/WAL devices will become full before the HDDs
>> completely fill up (e.g. a 50% full HDD has a DB/WAL device that is
>> about 65% full).
>>
>> Will anything terrible happen when the DB/WAL devices fill up? Will
>> RocksDB just put whatever doesn't fit on the DB/WAL SSD onto the HDD?
>
> The DB uses "fixed" sizes like 3, 30, 300 GB for the different levels of
> data, and when it needs to start filling a new level and it doesn't fit,
> that level moves over to the data device. While one can change DB sizes,
> at least previously the 3, 30, 300 sizes were what RocksDB used, so DB
> sizes in between these would just leave unused space.
>
> WAL data is more temporary in nature and will probably be somewhat small
> compared to a 30-300 GB DB, and will probably just force writes directly
> to the main device in case it runs full.
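Either way, whether DB data has actually spilled over to the slow device can be checked directly (a sketch; osd.0 is just an example id, and the daemon command must be run on that OSD's host):

  # BLUEFS_SPILLOVER appears in health output when DB data lands on the slow device
  ceph health detail | grep -i spillover
  # Nonzero slow_used_bytes means RocksDB data is sitting on the HDD
  ceph daemon osd.0 perf dump | grep slow_used_bytes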
[ceph-users] What happens when a DB/WAL device runs out of space?
Hello

I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current trends continue, the DB/WAL devices will become full before the HDDs completely fill up (e.g. a 50% full HDD has a DB/WAL device that is about 65% full).

Will anything terrible happen when the DB/WAL devices fill up? Will RocksDB just put whatever doesn't fit on the DB/WAL SSD onto the HDD?

Thanks
Vlad
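For monitoring this from the cluster side, two standard views (a sketch; osd.0 is an example id, and exactly which bluefs_* fields appear may vary by release):

  # Sizes of the dedicated DB/WAL devices as the OSD reports them
  ceph osd metadata 0 | grep bluefs
  # The META column of "ceph osd df" reflects BlueFS/DB consumption per OSD
  ceph osd df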
[ceph-users] Re: cephfs-top doesn't work
It looks like my cluster is too old. I am getting "perf stats version mismatch!"

Vlad

On 10/5/22 08:37, Jos Collin wrote:
> This issue is fixed in https://github.com/ceph/ceph/pull/48090. Could
> you please check it out and let me know? Thanks.
>
> On Tue, 19 Apr 2022 at 01:14, Vladimir Brik wrote:
>
>> Does anybody know why cephfs-top may only display the header lines
>> (date, client types, metric names) but no actual data?
>>
>> When I run it, cephfs-top consumes quite a bit of CPU and generates
>> quite a bit of network traffic, but it doesn't actually display the
>> data. I poked around in the source code and it seems like it might be
>> a curses issue, but I am not sure.
>>
>> Vlad
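For anyone hitting the "no data" symptom: cephfs-top gets its numbers from the mgr "stats" module, so a first sanity check is that the module is enabled (standard command; whether this relates to the version mismatch above is not established):

  ceph mgr module enable stats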
[ceph-users] How to report a potential security issue
Hello

I think I may have run into a bug in CephFS that has security implications. I am not sure it's a good idea to send the details to the public mailing list or to create a public ticket for it. How should I proceed?

Thanks
Vlad
[ceph-users] How to orch apply single site rgw with custom front-end
Hello

How can I use "ceph orch apply" to deploy single-site RGW daemons with a custom frontend configuration?

Basically, I have three servers in a DNS round-robin, each running a 15.2.12 rgw daemon with this configuration:

rgw_frontends = civetweb num_threads=5000 port=443s ssl_certificate=/etc/ceph/rgw.crt

I would like to deploy 16.2.4 rgw daemons, but I don't know how to configure them. When I used "ceph orch apply rgw ", it created a new entry in the monitor configuration database instead of using the existing rgw_frontends entry. I am guessing that I need to name the config db entry correctly, but I don't know what name to use. Currently I have:

$ ceph config get client rgw_frontends
civetweb num_threads=5000 port=443s ssl_certificate=/etc/ceph/rgw.crt

Can anybody help?

Thanks,
Vlad
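For comparison, cephadm exposes frontend settings directly in the RGW service spec (a sketch under assumptions: "myrgw" is a made-up service id, and as far as I recall Pacific's radosgw dropped civetweb in favor of beast, so the num_threads option would need translating):

  service_type: rgw
  service_id: myrgw
  placement:
    count: 3
  spec:
    rgw_frontend_port: 443
    ssl: true
    rgw_frontend_ssl_certificate: |
      -----BEGIN CERTIFICATE-----
      ...

Applied with "ceph orch apply -i rgw.yaml".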
[ceph-users] Is it safe to mix Octopus and Pacific mons?
Hello

My attempt to upgrade from Octopus to Pacific ran into issues, and I currently have one 16.2.4 mon and two 15.2.12 mons. Is it safe to run the cluster like this, or should I shut down the 16.2.4 mon until I figure out what to do next with the upgrade?

Thanks,
Vlad
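Incidentally, the version spread across daemons is easy to confirm (standard commands):

  # How many daemons of each type run each release
  ceph versions
  # Quorum membership and mon details
  ceph mon dump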
[ceph-users] Upgrade to 16 failed: wrong /sys/fs/cgroup path
Hello

My upgrade from 15.2.12 to 16.2.4 is stuck because a mon daemon failed to upgrade. Systemctl status of the mon showed this error:

Error: open /sys/fs/cgroup/cpuacct,cpu/system.slice/...

It turns out there is no /sys/fs/cgroup/cpuacct,cpu directory on my system. Instead, I have /sys/fs/cgroup/cpu,cpuacct. Symlinking them appears to have solved the immediate problem, but if I proceed with the upgrade to 16.2.4, after a reboot all ceph daemons will probably fail to start.

Is this an issue with ceph, podman (2.1.1), or the fact that I am running CentOS 7? Is it possible to upgrade from 15.2.12 to 16.2.4 on CentOS 7? I thought that installing a version of podman that is compatible with both would suffice, but apparently not...

Vlad
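For reference, the workaround described above amounts to the following (a sketch of what was done; the symlink will not survive a reboot unless recreated, e.g. by a boot-time unit):

  # Point the path podman expects at the cgroup directory that actually exists
  ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu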
[ceph-users] Stray hosts and daemons
I am not sure how to interpret the CEPHADM_STRAY_HOST and CEPHADM_STRAY_DAEMON warnings. They seem to be inconsistent.

I converted my cluster to be managed by cephadm by adopting the mons and all other daemons, and they show up in "ceph orch ps", but ceph health says the mons are stray:

[WRN] CEPHADM_STRAY_HOST: 6 stray host(s) with 6 daemon(s) not managed by cephadm
    stray host ceph-6.icecube.wisc.edu has 1 stray daemons: ['mon.ceph-6']
    ...

At the same time, mon.ceph-6 is not mentioned in the CEPHADM_STRAY_DAEMON section, which seems to contradict the message about mon.ceph-6 being stray.

Any suggestions how to fix this?

Thanks,
Vlad
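One common cause of stray warnings (an assumption about this case, not a confirmed diagnosis) is a mismatch between the hostname cephadm registered and the name a daemon reports, e.g. short name vs FQDN. Standard commands to compare the two views:

  # Hosts as cephadm knows them
  ceph orch host ls
  # Daemons cephadm manages on the host in question
  ceph orch ps ceph-6.icecube.wisc.edu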
[ceph-users] Re: Balancer not balancing (14.2.7, crush-compat)
One possibly relevant detail: the cluster has 8 nodes, and the new pool I created uses k=5 m=2 erasure coding.

Vlad

On 4/9/20 11:28 AM, Vladimir Brik wrote:
> Hello
>
> I am running ceph 14.2.7 with the balancer in crush-compat mode (needed
> because of old clients), but it doesn't seem to be doing anything. It
> used to work in the past; I am not sure what changed.
>
> I created a big pool, ~285TB stored, and it doesn't look like it ever
> got balanced:
>
> pool 43 'fs-data-k5m2-hdd' erasure size 7 min_size 6 crush_rule 7
> object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn
> last_change 48647 lfor 0/42080/42102 flags
> hashpspool,ec_overwrites,nearfull stripe_width 20480 application cephfs
>
> OSD utilization varies between ~50% and ~80%, with about 60% raw used.
> I am using a mixture of 9TB and 14TB drives. The number of PGs per
> drive varies between 103 and 207.
>
> # ceph osd df | grep hdd | sort -k 17 | (head -n 2; tail -n 2)
> 160 hdd 12.53519 1.0 13 TiB 6.0 TiB 5.9 TiB  74 KiB 12 GiB 6.6 TiB 47.74 0.79 120 up
> 146 hdd 12.53519 1.0 13 TiB 6.0 TiB 6.0 TiB  51 MiB 13 GiB 6.5 TiB 48.17 0.80 119 up
>  79 hdd  8.99799 1.0 9.0 TiB 7.3 TiB 7.2 TiB  42 KiB 16 GiB 1.7 TiB 80.91 1.34 186 up
>  62 hdd  8.99799 1.0 9.0 TiB 7.3 TiB 7.2 TiB 112 KiB 16 GiB 1.7 TiB 81.44 1.35 189 up
>
> # ceph balancer status
> {
>     "last_optimize_duration": "0:00:00.339635",
>     "plans": [],
>     "mode": "crush-compat",
>     "active": true,
>     "optimize_result": "Some osds belong to multiple subtrees: {0: ['default', 'default~hdd'], ...",
>     "last_optimize_started": "Thu Apr 9 11:17:40 2020"
> }
>
> Does anybody know how to debug this?
>
> Thanks,
> Vlad
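The "Some osds belong to multiple subtrees" result points at the device-class shadow hierarchy (default vs default~hdd), which the crush-compat balancer refuses to optimize across. The shadow trees can be inspected with a standard command:

  # Shows the per-device-class shadow hierarchy alongside the real one
  ceph osd crush tree --show-shadow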
[ceph-users] Balancer not balancing (14.2.7, crush-compat)
Hello

I am running ceph 14.2.7 with the balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing anything. It used to work in the past; I am not sure what changed.

I created a big pool, ~285TB stored, and it doesn't look like it ever got balanced:

pool 43 'fs-data-k5m2-hdd' erasure size 7 min_size 6 crush_rule 7 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 48647 lfor 0/42080/42102 flags hashpspool,ec_overwrites,nearfull stripe_width 20480 application cephfs

OSD utilization varies between ~50% and ~80%, with about 60% raw used. I am using a mixture of 9TB and 14TB drives. The number of PGs per drive varies between 103 and 207.

# ceph osd df | grep hdd | sort -k 17 | (head -n 2; tail -n 2)
160 hdd 12.53519 1.0 13 TiB 6.0 TiB 5.9 TiB  74 KiB 12 GiB 6.6 TiB 47.74 0.79 120 up
146 hdd 12.53519 1.0 13 TiB 6.0 TiB 6.0 TiB  51 MiB 13 GiB 6.5 TiB 48.17 0.80 119 up
 79 hdd  8.99799 1.0 9.0 TiB 7.3 TiB 7.2 TiB  42 KiB 16 GiB 1.7 TiB 80.91 1.34 186 up
 62 hdd  8.99799 1.0 9.0 TiB 7.3 TiB 7.2 TiB 112 KiB 16 GiB 1.7 TiB 81.44 1.35 189 up

# ceph balancer status
{
    "last_optimize_duration": "0:00:00.339635",
    "plans": [],
    "mode": "crush-compat",
    "active": true,
    "optimize_result": "Some osds belong to multiple subtrees: {0: ['default', 'default~hdd'], ...",
    "last_optimize_started": "Thu Apr 9 11:17:40 2020"
}

Does anybody know how to debug this?

Thanks,
Vlad
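Since crush-compat was chosen because of old clients, it may be worth verifying what the connected clients actually report before ruling out upmap mode (standard commands; switching modes is only safe if every client is at least luminous):

  # Feature bits and release of currently connected clients
  ceph features
  # Prerequisite for upmap mode: ceph osd set-require-min-compat-client luminous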
[ceph-users] A fast tool to export/copy a pool
I am wondering if there exists a tool, faster than "rados export", that can copy and restore read-only pools (to/from another pool or a file system). It looks like "rados export" is very slow because it is single-threaded (as best I can tell, --workers doesn't make a difference).

Vlad
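Absent a better tool, a crude parallel copy can be improvised with the rados CLI (a sketch with caveats: it assumes object names contain no newlines, and it copies object data only, not omap or xattrs, so it is no substitute for rados export on RGW pools):

  # Stream each object out of srcpool and into dstpool, 8 copies at a time
  rados -p srcpool ls | xargs -P 8 -I OBJ sh -c \
    'rados -p srcpool get "$1" - | rados -p dstpool put "$1" -' _ OBJ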
[ceph-users] Migrating data to a more efficient EC pool
Hello

I have ~300TB of data in a default.rgw.buckets.data k2m2 pool and I would like to move it to a new k5m2 pool.

I found instructions using cache tiering [1], but they come with a vague, scary warning, and it looks like EC-to-EC tiering may not even be possible [2] (is that still the case?).

Can anybody recommend a safe procedure to copy an EC pool's data to another pool with a more efficient erasure coding? Perhaps there is a tool out there that could do it? A few days of downtime would be tolerable if it simplifies things.

Also, I have enough free space to temporarily store the k2m2 data in a replicated pool (in case EC-to-EC tiering is not possible but EC-to-replicated and replicated-to-EC tiering are).

Is there a tool, or some efficient way, to verify that the contents of two pools are the same?

Thanks,
Vlad

[1] https://ceph.io/geen-categorie/ceph-pool-migration/
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016109.html
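On the verification question, one cheap (though weak) check is to compare sorted object listings of the two pools (a sketch; this compares names only, not contents or omap, so a match is necessary rather than sufficient):

  # Any objects present in one pool but not the other?
  diff <(rados -p oldpool ls | sort) <(rados -p newpool ls | sort)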