[ceph-users] Why is min_size of erasure pools set to k+1

2023-11-20 Thread Vladimir Brik
Could someone help me understand why it's a bad idea to set min_size of erasure-coded pools to k? From what I've read, the argument for k+1 is that if min_size is k and you lose an OSD during recovery after a failure of m OSDs, data will become unavailable. But how does setting min_size to k+1 he
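
For reference, min_size is just a pool property; a minimal sketch of inspecting and changing it, where "ecpool" and the k=4/m=2 profile are placeholder assumptions:

    # Check the current value and the pool's EC profile ("ecpool" is a placeholder name)
    ceph osd pool get ecpool min_size
    ceph osd pool get ecpool erasure_code_profile
    # With k=4, m=2 the default is min_size = k + 1 = 5; setting it to k trades
    # safety during recovery for availability and is generally discouraged.
    ceph osd pool set ecpool min_size 4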

[ceph-users] Re: OSD tries (and fails) to scrub the same PGs over and over

2023-07-21 Thread Vladimir Brik
> What's the cluster status? Is there recovery or backfilling going on? No. Everything is good except this PG is not getting scrubbed. Vlad On 7/21/23 01:41, Eugen Block wrote: Hi, what's the cluster status? Is there recovery or backfilling going on? Quoting Vladimir

[ceph-users] OSD tries (and fails) to scrub the same PGs over and over

2023-07-19 Thread Vladimir Brik
I have a PG that hasn't been scrubbed in over a month and not deep-scrubbed in over two months. I tried forcing it with `ceph pg (deep-)scrub`, but without success. Looking at the logs of that PG's primary OSD, it looks like every once in a while it attempts (and apparently fails) to scrub that PG
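
A rough way to confirm the scrub timestamps and retrigger the scrub from the CLI; "1.2f" and "osd.12" are placeholder IDs:

    # Check when the PG was last (deep-)scrubbed
    ceph pg 1.2f query | grep -E 'last_(deep_)?scrub_stamp'
    # Request a deep scrub and watch whether the stamp changes
    ceph pg deep-scrub 1.2f
    # Scrub scheduling on the primary can also be limited by settings such as osd_max_scrubs
    ceph tell osd.12 config get osd_max_scrubs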

[ceph-users] Re: Enable Centralized Logging in Dashboard.

2023-05-17 Thread Vladimir Brik
How do I create a user name and password that I could use to log in to grafana? Vlad On 11/16/22 08:42, E Taka wrote: Thank you, Nizam. I wasn't aware that the Dashboard login is not the same as the grafana login. Now I have access to the logfiles. On Wed, 16 Nov 2022 at 15:06, N
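
With cephadm-deployed Grafana, one option in newer releases is to set the initial admin credentials through a grafana service spec; a sketch, assuming the initial_admin_password field is available in your version (the matching login is Grafana's built-in "admin" account):

    # grafana.yaml -- applied with: ceph orch apply -i grafana.yaml
    service_type: grafana
    placement:
      count: 1
    spec:
      initial_admin_password: "change-me"   # placeholder; pick a real password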

[ceph-users] Any issues with podman 4.2 and Quincy?

2023-02-13 Thread Vladimir Brik
Has anybody run into issues with Quincy and podman 4.2? The podman 4.x series is not mentioned in https://docs.ceph.com/en/quincy/cephadm/compatibility/, but podman 3.x is no longer available in Alma Linux. Vlad

[ceph-users] Re: What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Vladimir Brik
Vlad On 12/13/22 12:46, Janne Johansson wrote: On Tue, 13 Dec 2022 at 17:47, Vladimir Brik wrote: Hello I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current trends continue, the DB/WAL devices will become full before the HDDs completely fill up (e.g. a 50% full HDD has a DB/WAL

[ceph-users] What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Vladimir Brik
Hello I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current trends continue, the DB/WAL devices will become full before the HDDs completely fill up (e.g. a 50% full HDD has a DB/WAL device that is about 65% full). Will anything terrible happen when DB/WAL devices fill up? Will
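
A quick way to watch how full the DB/WAL device is on a given OSD, and whether RocksDB has already spilled over to the slow device; "osd.0" is a placeholder and counter names may vary slightly by release:

    # On the OSD host, via the admin socket
    ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
    # Cluster-wide, spillover shows up as a BLUEFS_SPILLOVER health warning
    ceph health detail | grep -i spillover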

[ceph-users] Re: cephfs-top doesn't work

2022-10-05 Thread Vladimir Brik
w? Thanks. On Tue, 19 Apr 2022 at 01:14, Vladimir Brik wrote: Does anybody know why cephfs-top may only display header lines (date, client types, metric names) but no actual data? When I run it, cephfs-top consumes quite a bit of
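
For anyone hitting the same symptom: cephfs-top depends on the MGR "stats" module and, by default, a client.fstop user; a minimal setup sketch along the lines of the cephfs-top documentation:

    ceph mgr module enable stats
    ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'
    cephfs-top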

[ceph-users] How to report a potential security issue

2022-10-04 Thread Vladimir Brik
Hello I think I may have run into a bug in cephfs that has security implications. I am not sure it's a good idea to send the details to the public mailing list or create a public ticket for it. How should I proceed? Thanks Vlad

[ceph-users] cephadm shell fails to start due to missing config files?

2021-07-02 Thread Vladimir Brik
Hello I am getting an error on one node in my cluster (other nodes are fine) when trying to run "cephadm shell". Historically this machine has been used as the primary Ceph management host, so it would be nice if this could be fixed. ceph-1 ~ # cephadm -v shell container_init=False Inferring
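
As a workaround, cephadm shell can be pointed at an explicit fsid, config and keyring instead of letting it infer them; a sketch with placeholder values:

    # <fsid> and the paths below are placeholders for your cluster
    cephadm shell --fsid <fsid> \
        --config /etc/ceph/ceph.conf \
        --keyring /etc/ceph/ceph.client.admin.keyring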

[ceph-users] How to orch apply single site rgw with custom front-end

2021-06-15 Thread Vladimir Brik
Hello How can I use ceph orch apply to deploy single site rgw daemons with custom frontend configuration? Basically, I have three servers in a DNS round-robin, each running a 15.2.12 rgw daemon with this configuration: rgw_frontends = civetweb num_threads=5000 port=443s ssl_certificate=/etc/
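
Under cephadm, the rough equivalent is an RGW service spec; a sketch with placeholder hosts, service id and certificate (beast is the frontend cephadm configures in recent releases):

    # rgw.yaml -- applied with: ceph orch apply -i rgw.yaml
    service_type: rgw
    service_id: myrgw                       # placeholder id
    placement:
      hosts:
        - host1
        - host2
        - host3
    spec:
      rgw_frontend_port: 443
      ssl: true
      rgw_frontend_ssl_certificate: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----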

[ceph-users] Is it safe to mix Octopus and Pacific mons?

2021-06-09 Thread Vladimir Brik
Hello My attempt to upgrade from Octopus to Pacific ran into issues, and I currently have one 16.2.4 mon and two 15.2.12 mons. Is it safe to run the cluster like this, or should I shut down the 16.2.4 mon until I figure out what to do next with the upgrade? Thanks, Vlad

[ceph-users] Upgrade to 16 failed: wrong /sys/fs/cgroup path

2021-06-09 Thread Vladimir Brik
Hello My upgrade from 15.2.12 to 16.2.4 is stuck because a mon daemon failed to upgrade. Systemctl status of the mon showed this error: Error: open /sys/fs/cgroup/cpuacct,cpu/system.slice/... It turns out there is no /sys/fs/cgroup/cpuacct,cpu directory on my system. Instead, I have /sys/f
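
One possibility is a mismatch between the cgroup layout the container runtime expects and what the host actually exposes (controller directory naming differs between distributions, and unified cgroup v2 hosts have no per-controller directories at all); a quick way to see what the host provides:

    # Prints "cgroup2fs" on a unified (v2) host and "tmpfs" on a legacy v1 host
    stat -fc %T /sys/fs/cgroup/
    ls /sys/fs/cgroup/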

[ceph-users] Stray hosts and daemons

2021-05-20 Thread Vladimir Brik
I am not sure how to interpret CEPHADM_STRAY_HOST and CEPHADM_STRAY_DAEMON warnings. They seem to be inconsistent. I converted my cluster to be managed by cephadm by adopting mon and all other daemons, and they show up in ceph orch ps, but ceph health says mons are stray: [WRN] CEPHADM_STRAY
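
A way to cross-check what cephadm believes it manages against the health warning; a sketch:

    # Hosts and daemons known to the orchestrator
    ceph orch host ls
    ceph orch ps | grep mon
    # Full text of the stray-host/daemon warnings
    ceph health detail | grep -A5 CEPHADM_STRAY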

[ceph-users] Re: Balancer not balancing (14.2.7, crush-compat)

2020-04-09 Thread Vladimir Brik
One possibly relevant detail: the cluster has 8 nodes, and the new pool I created uses k=5, m=2 erasure coding. Vlad On 4/9/20 11:28 AM, Vladimir Brik wrote: Hello I am running ceph 14.2.7 with balancer in crush-compat mode (needed because of old clients), but it doesn't seem t

[ceph-users] Balancer not balancing (14.2.7, crush-compat)

2020-04-09 Thread Vladimir Brik
Hello I am running ceph 14.2.7 with balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing anything. It used to work in the past. I am not sure what changed. I created a big pool, ~285TB stored, and it doesn't look like it ever got balanced: pool 43 'f
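
A few commands that help verify the balancer is actually enabled and how it scores the current distribution; a sketch using the standard balancer CLI:

    ceph balancer status
    ceph balancer mode crush-compat
    ceph balancer on
    # Lower scores mean a more even distribution
    ceph balancer eval
    # Per-OSD utilization spread
    ceph osd df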

[ceph-users] A fast tool to export/copy a pool

2020-03-09 Thread Vladimir Brik
I am wondering if there exists a tool, faster than "rados export", that can copy and restore read-only pools (to/from another pool or file system). It looks like "rados export" is very slow because it is single-threaded (as best I can tell, --workers doesn't make a difference). Vlad
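
One improvised workaround is to parallelize plain per-object copies; a rough sketch with placeholder pool names that copies object data only (it does NOT preserve omap data or xattrs, so it is not a general replacement for rados export):

    rados -p srcpool ls | xargs -P 8 -I{} sh -c \
        'rados -p srcpool get "{}" /tmp/obj.$$ && rados -p dstpool put "{}" /tmp/obj.$$; rm -f /tmp/obj.$$'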

[ceph-users] Migrating data to a more efficient EC pool

2020-02-24 Thread Vladimir Brik
Hello I have ~300TB of data in a k=2, m=2 default.rgw.buckets.data pool and I would like to move it to a new k=5, m=2 pool. I found instructions using cache tiering [1], but they come with a vague, scary warning, and it looks like EC-to-EC may not even be possible [2] (is that still the case?). Can anybody
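
Whatever migration path is used, the destination pool has to exist first; a sketch of creating a k=5, m=2 profile and pool (names, PG counts and failure domain are placeholder assumptions):

    ceph osd erasure-code-profile set k5m2 k=5 m=2 crush-failure-domain=host
    ceph osd pool create default.rgw.buckets.data.new 256 256 erasure k5m2
    ceph osd pool application enable default.rgw.buckets.data.new rgw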