[ceph-users] Re: Monitoring ceph cluster

2022-01-25 Thread Michel Niyoyita
Thank you for your email Szabo, these can be helpful, can you provide links so I can start to work on it. Michel. On Tue, 25 Jan 2022, 18:51 Szabo, Istvan (Agoda), wrote: > Which monitoring tool? Like prometheus or nagios style thing? > We use sensu for keepalive and ceph health reporting +
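
For reference, a minimal sketch of getting Prometheus-style metrics out of Ceph, assuming the mgr prometheus module is an acceptable starting point (9283 is the default exporter port):

  # enable the built-in exporter; metrics are served by the active mgr
  ceph mgr module enable prometheus
  # confirm the module is enabled
  ceph mgr module ls | grep prometheus
  # the scrape target is then http://<active-mgr-host>:9283/metrics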

[ceph-users] problems with snap-schedule on 16.2.7

2022-01-25 Thread Kyriazis, George
Hello Ceph users, I have a problem with scheduled snapshots on ceph 16.2.7 (in a Proxmox install). While trying to understand how snap schedules work, I created more schedules than I needed to: root@vis-mgmt:~# ceph fs snap-schedule list /backups/nassie/NAS /backups/nassie/NAS 1h 24h7d8w12m
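
A hedged sketch of how surplus schedules might be inspected and removed with the snap-schedule subcommands; the /backups/nassie/NAS path is taken from the message above and the 1h repeat passed to remove is only illustrative:

  # list schedules attached to the path
  ceph fs snap-schedule list /backups/nassie/NAS
  # show status and retention for the same path
  ceph fs snap-schedule status /backups/nassie/NAS
  # remove a single schedule, identified by its repeat interval (illustrative)
  ceph fs snap-schedule remove /backups/nassie/NAS 1h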

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-01-25 Thread Sebastian Mazza
Hey Igor, thank you for your response! >> >> Do you suggest to disable the HDD write-caching and / or the >> bluefs_buffered_io for productive clusters? >> > Generally the upstream recommendation is to disable disk write caching, there > were multiple complaints it might negatively impact the
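
If one did decide to follow that recommendation, a minimal sketch might look like this; /dev/sdX is a placeholder, and whether bluefs_buffered_io should be changed is exactly what this thread is discussing, so treat these as assumptions:

  # disable the volatile write cache on a SATA/SAS HDD (placeholder device)
  hdparm -W 0 /dev/sdX
  # inspect the current value on one OSD
  ceph config get osd.0 bluefs_buffered_io
  # change it for all OSDs only if the thread's conclusion supports it
  ceph config set osd bluefs_buffered_io false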

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-25 Thread Marc
Is there also (going to be) something available that works 'offline'?
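
For what it's worth, an offline predictor does exist as a mgr module; a hedged sketch, assuming diskprediction_local is shipped with the running release and <devid> is filled in from ceph device ls:

  # enable the local (no cloud dependency) prediction module
  ceph mgr module enable diskprediction_local
  # tell the mgr to use the local predictor
  ceph config set global device_failure_prediction_mode local
  # query the prediction for one device
  ceph device predict-life-expectancy <devid>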

[ceph-users] Re: Fwd: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-25 Thread Benjamin Staffin
Thank you for your responses! Since yesterday we found that several OSD pods still had memory limits set, and in fact some of them (but far from all) were getting OOM killed, so we have fully removed those limits again. Unfortunately this hasn't helped much and there are still 50ish OSDs down.
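
Since OOM kills and container memory limits are involved, one knob that may be worth cross-checking is osd_memory_target; a hedged sketch, with the 4 GiB value being only an example and not a recommendation for this cluster:

  # show the current target for one OSD
  ceph config get osd.0 osd_memory_target
  # set a cluster-wide target for all OSDs (example: 4 GiB)
  ceph config set osd osd_memory_target 4294967296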

[ceph-users] Re: Disk Failure Predication cloud module?

2022-01-25 Thread Yaarit Hatuka
Hi Jake, Many thanks for contributing the data. Indeed, our data scientists use the data from Backblaze too. Have you found strong correlations between device health metrics (such as reallocated sector count, or any combination of attributes) and read/write errors in /var/log/messages from what

[ceph-users] Re: Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Frédéric Nass
On 25/01/2022 at 18:28, Casey Bodley wrote: On Tue, Jan 25, 2022 at 11:59 AM Frédéric Nass wrote: On 25/01/2022 at 14:48, Casey Bodley wrote: On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass wrote: Hello, I've just heard about storage classes and imagined how we could use them to

[ceph-users] Re: Multipath and cephadm

2022-01-25 Thread Thomas Roth
Would like to know that as well. I have the same setup - cephadm, Pacific, CentOS8, and a host with a number of HDDs which are all connected by 2 paths. No way to use these without multipath > ceph orch daemon add osd serverX:/dev/sdax > Cannot update volume group
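
A workaround sometimes tried here is to hand cephadm the multipath mapper device instead of one of the underlying paths; /dev/mapper/mpatha is a placeholder and there is no guarantee ceph-volume accepts it, so this is only a hedged sketch:

  # point the OSD at the multipath device node rather than /dev/sdax
  ceph orch daemon add osd serverX:/dev/mapper/mpatha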

[ceph-users] Re: Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Casey Bodley
On Tue, Jan 25, 2022 at 11:59 AM Frédéric Nass wrote: > > > On 25/01/2022 at 14:48, Casey Bodley wrote: > > On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass > > wrote: > >> Hello, > >> > >> I've just heard about storage classes and imagined how we could use them > >> to migrate all S3 objects

[ceph-users] Re: Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Frédéric Nass
On 25/01/2022 at 14:48, Casey Bodley wrote: On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass wrote: Hello, I've just heard about storage classes and imagined how we could use them to migrate all S3 objects within a placement pool from an ec pool to a replicated pool (or vice-versa) for data

[ceph-users] Re: Fwd: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-25 Thread Dan van der Ster
On Tue, Jan 25, 2022 at 4:07 PM Frank Schilder wrote: > > Hi Dan, > > in several threads I have now seen statements like "Does your cluster have > the pglog_hardlimit set?". In this context, I would be grateful if you could > shed some light on the following: > > 1) How do I check that? > >

[ceph-users] Monitoring ceph cluster

2022-01-25 Thread Michel Niyoyita
Hello team, I would like to monitor my ceph cluster using one of the monitoring tools; does someone have any advice on that? Michel

[ceph-users] January Ceph Science Virtual User Group

2022-01-25 Thread Kevin Hrpcek
Hey all, Sorry for the late notice. We will be having a Ceph science/research/big cluster call on Wednesday January 26th. If anyone wants to discuss something specific they can add it to the pad linked below. If you have questions or comments you can contact me. This is an informal open

[ceph-users] Re: Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Casey Bodley
On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass wrote: > > Hello, > > I've just heard about storage classes and imagined how we could use them > to migrate all S3 objects within a placement pool from an ec pool to a > replicated pool (or vice-versa) for data resiliency reasons, not to save > space.

[ceph-users] Re: Using s3website with ceph orch?

2022-01-25 Thread Manuel Holtgrewe
Thanks, I had another review of the configuration and it appears that the configuration *is* properly propagated to the daemon (also visible in my second link). I traced down my issues further and it looks like I have first tripped over the following issue again...

[ceph-users] Re: switch restart facilitating cluster/client network.

2022-01-25 Thread Tyler Stachecki
I would still set noout on relevant parts of the cluster in case something goes south and it does take longer than 2 minutes. Otherwise OSDs will start outing themselves after 10 minutes or so by default and then you have a lot of churn going on. The monitors will be fine unless you lose
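
A minimal sketch of wrapping the maintenance window with the flag; the set-group variant for scoping it to specific hosts is mentioned only as a possibility, depending on the release:

  # before the switch restart: stop OSDs from being marked out
  ceph osd set noout
  # optionally scope it, e.g. ceph osd set-group noout <host>, if available
  # ... switch maintenance ...
  # afterwards: clear the flag again
  ceph osd unset noout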

[ceph-users] Re: CephFS keyrings for K8s

2022-01-25 Thread Frédéric Nass
On 25/01/2022 at 12:09, Frédéric Nass wrote: Hello Michal, With cephfs and a single filesystem shared across multiple k8s clusters, you should use subvolumegroups to limit data exposure. You'll find an example of how to use subvolumegroups in the ceph-csi-cephfs helm chart [1]. Essentially

[ceph-users] Re: CephFS keyrings for K8s

2022-01-25 Thread Frédéric Nass
Hello Michal, With cephfs and a single filesystem shared across multiple k8s clusters, you should use subvolumegroups to limit data exposure. You'll find an example of how to use subvolumegroups in the ceph-csi-cephfs helm chart [1]. Essentially you just have to set the subvolumeGroup to whatever
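
A hedged sketch of the Ceph side of that isolation; the filesystem name cephfs and group name csi-k8s-cluster1 are illustrative:

  # create a subvolumegroup dedicated to one k8s cluster
  ceph fs subvolumegroup create cephfs csi-k8s-cluster1
  # create a client key restricted to that group's directory tree
  ceph fs authorize cephfs client.k8s-cluster1 /volumes/csi-k8s-cluster1 rw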

[ceph-users] Re: Fwd: Lots of OSDs crashlooping (DRAFT - feedback?)

2022-01-25 Thread Dan van der Ster
Hi Benjamin, Apologies that I can't help with the bluestore issue. But that huge 100GB OSD consumption could be related to similar reports linked here: https://tracker.ceph.com/issues/53729 Does your cluster have the pglog_hardlimit set? # ceph osd dump | grep pglog flags
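
For completeness, a sketch of what that check looks like; the flags line below is a typical example, not output from this cluster:

  # the flag appears in the osdmap flags line when it is set
  ceph osd dump | grep pglog
  # example of the kind of line this prints when the flag is present:
  #   flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit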

[ceph-users] Re: switch restart facilitating cluster/client network.

2022-01-25 Thread Janne Johansson
If you can stop the VMs it will help. Even if the cluster recovers quickly, VMs take great offense if a write does not finish within 120s, and many will put filesystems in read-only mode if writes are delayed for that long, so if there is a 120s outage of IO the VMs will be stuck/useless anyhow, so you

[ceph-users] Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Frédéric Nass
Hello, I've just heard about storage classes and imagined how we could use them to migrate all S3 objects within a placement pool from an ec pool to a replicated pool (or vice-versa) for data resiliency reasons, not to save space. It looks possible since: 1. data pools are associated to
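
A hedged sketch of the placement/storage-class side of that idea; the placement id, the storage class name REPLICATED and the pool name below are illustrative, and moving existing objects would still need a lifecycle transition or re-copy on top of this:

  # declare the extra storage class in the zonegroup placement target
  radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id default-placement --storage-class REPLICATED
  # map the storage class to the replicated data pool in the zone
  radosgw-admin zone placement add --rgw-zone default \
    --placement-id default-placement --storage-class REPLICATED \
    --data-pool default.rgw.buckets.replicated
  # after restarting the radosgws, objects written with
  # x-amz-storage-class: REPLICATED land in that pool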

[ceph-users] How to remove stuck daemon?

2022-01-25 Thread Fyodor Ustinov
Hi! I have a Ceph cluster, version 16.2.7, with this error: root@s-26-9-19-mon-m1:~# ceph health detail HEALTH_WARN 1 failed cephadm daemon(s) [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s) daemon osd.91 on s-26-8-2-1 is in error state But I don't have that OSD anymore. I deleted it.
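
A hedged sketch of how such a stale record is often cleared; whether it actually helps depends on why cephadm still thinks osd.91 exists on that host:

  # refresh cephadm's view of what is running on the host
  ceph orch ps s-26-8-2-1 --refresh
  # remove the leftover daemon record
  ceph orch daemon rm osd.91 --force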

[ceph-users] switch restart facilitating cluster/client network.

2022-01-25 Thread Marc
If the switch needs an update and needs to be restarted (expected 2 minutes), can I just leave the cluster as it is, because ceph will handle this correctly? Or should I e.g. put some VMs I am running in pause mode, or even stop them? What happens to the monitors? Can they handle this, or