[ceph-users] Re: Help with deep scrub warnings (probably a bug ... set on pool for effect)

2024-03-05 Thread Peter Maloney
I had the same problem as you. The only solution that worked for me is to set it on the pools: for pool in $(ceph osd pool ls); do ceph osd pool set "$pool" scrub_max_interval "$smaxi"; ceph osd pool set "$pool" scrub_min_interval "$smini"; ceph osd pool set "$pool"
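A minimal sketch of the complete per-pool loop described above; the interval values (in seconds) and the third, truncated option name are assumptions to adjust to your own thresholds:

    # Per-pool scrub interval overrides (seconds); 0 means "fall back to the global setting"
    smini=604800     # assumed example: 1 week
    smaxi=1209600    # assumed example: 2 weeks
    sdeep=1209600    # assumed example: 2 weeks
    for pool in $(ceph osd pool ls); do
        ceph osd pool set "$pool" scrub_min_interval  "$smini"
        ceph osd pool set "$pool" scrub_max_interval  "$smaxi"
        ceph osd pool set "$pool" deep_scrub_interval "$sdeep"
    done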

[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Nicola Mori
Hi Anthony, thanks for the tips. I reset all the values but osd_deep_scrub_interval to their defaults as reported at https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ : # ceph config set osd osd_scrub_sleep 0.0 # ceph config set osd osd_scrub_load_threshold 0.5 # ceph
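An equivalent sketch that clears the overrides instead of re-typing each documented default (the exact option list is assumed from this thread; osd_deep_scrub_interval is deliberately left in place):

    # Drop the overrides so the built-in defaults apply again
    ceph config rm osd osd_scrub_sleep
    ceph config rm osd osd_scrub_load_threshold
    ceph config rm osd osd_scrub_interval_randomize_ratio
    ceph config rm osd osd_scrub_min_interval
    ceph config rm osd osd_scrub_max_interval
    # osd_deep_scrub_interval is intentionally kept; verify what is still overridden
    ceph config dump | grep scrub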

[ceph-users] Re: Help with deep scrub warnings

2024-03-05 Thread Anthony D'Atri
* Try applying the settings to global so that mons/mgrs get them. * Set your shallow scrub settings back to the default. Shallow scrubs take very few resources * Set your randomize_ratio back to the default, you’re just bunching them up * Set the load threshold back to the default, I can’t

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Jasper Tan
Hi Anthony and everyone else, We have found the issue. Because the new 20x 14 TiB OSDs were onboarded onto a single node, there was not only an imbalance in the capacity of each OSD but also between the nodes (other nodes each have around 15x 1.7TiB). Furthermore, CRUSH rule sets default failure

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Anthony D'Atri
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had > 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago, > we onboarded an additional 20 OSDs of 14TiB each. That's a big difference in size. I suggest increasing mon_max_pg_per_osd to 1000 --

[ceph-users] Re: Help: Balancing Ceph OSDs with different capacity

2024-02-07 Thread Dan van der Ster
Hi Jasper, I suggest disabling all the crush-compat and reweighting approaches. They rarely work out. The state of the art is: ceph balancer on ceph balancer mode upmap ceph config set mgr mgr/balancer/upmap_max_deviation 1 Cheers, Dan -- Dan van der Ster CTO Clyso GmbH p: +49 89 215252722
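Dan's recipe spelled out as a runnable sequence (a sketch; the min-compat-client step is an assumption that only applies if no pre-Luminous clients connect):

    ceph osd set-require-min-compat-client luminous   # upmap requires Luminous+ clients
    ceph balancer mode upmap
    ceph config set mgr mgr/balancer/upmap_max_deviation 1
    ceph balancer on
    ceph balancer status    # confirm the mode and that plans are being executed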

[ceph-users] Re: Help on rgw metrics (was rgw_user_counters_cache)

2024-01-31 Thread Casey Bodley
On Wed, Jan 31, 2024 at 3:43 AM garcetto wrote: > > good morning, > i was struggling trying to understand why i cannot find this setting on > my reef version, is it because it is only in the latest dev ceph version and not > before? that's right, this new feature will be part of the squid release. we

[ceph-users] Re: Help needed with Grafana password

2023-11-10 Thread Sake Ceph
Thank you Eugen! This worked :) > On 09-11-2023 14:55 CET, Eugen Block wrote: > > > It's the '#' character, everything after (including '#' itself) is cut > off. I tried with single and double quotes which also failed. But as I > already said, use a simple password and then change it

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
It's the '#' character, everything after (including '#' itself) is cut off. I tried with single and double quotes which also failed. But as I already said, use a simple password and then change it within grafana. That way you also don't have the actual password lying around in clear text

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
I just tried it on a 17.2.6 test cluster, although I don't have a stack trace the complicated password doesn't seem to be applied (don't know why yet). But since it's an "initial" password you can choose something simple like "admin", and during the first login you are asked to change it

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
I tried everything at this point, even waited an hour, still no luck. Got it working accidentally one time, but with a placeholder for a password. Tried with the correct password, nothing, and trying again with the placeholder didn't work anymore. So I thought to switch the manager, maybe something

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
Usually, removing the grafana service should be enough. I also have this directory (custom_config_files/grafana.) but it's empty. Can you confirm that after running 'ceph orch rm grafana' the service is actually gone ('ceph orch ls grafana')? The directory underneath

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
Using podman version 4.4.1 on RHEL 8.8, Ceph 17.2.7. I used 'podman system prune -a -f' and 'podman volume prune -f' to clean up files, but this leaves a lot of files over in /var/lib/containers/storage/overlay and an empty folder /var/lib/ceph//custom_config_files/grafana.. Found those files

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Eugen Block
What doesn't work exactly? For me it did... Quoting Sake Ceph: Too bad, that doesn't work :( On 09-11-2023 09:07 CET, Sake Ceph wrote: Hi, Well to get promtail working with Loki, you need to set up a password in Grafana. But promtail wasn't working with the 17.2.6 release, the URL was

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
Too bad, that doesn't work :( > On 09-11-2023 09:07 CET, Sake Ceph wrote: > > > Hi, > > Well to get promtail working with Loki, you need to set up a password in > Grafana. > But promtail wasn't working with the 17.2.6 release, the URL was set to > containers.local. So I stopped using it,

[ceph-users] Re: Help needed with Grafana password

2023-11-09 Thread Sake Ceph
Hi, Well to get promtail working with Loki, you need to set up a password in Grafana. But promtail wasn't working with the 17.2.6 release, the URL was set to containers.local. So I stopped using it, but forgot to click on save in KeePass :( I didn't configure anything special in Grafana, the

[ceph-users] Re: Help needed with Grafana password

2023-11-08 Thread Eugen Block
Hi, you mean you forgot your password? You can remove the service with 'ceph orch rm grafana', then re-apply your grafana.yaml containing the initial password. Note that this would remove all of the grafana configs or custom dashboards etc., you would have to reconfigure them. So before
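A sketch of the full cycle Eugen describes, assuming a cephadm-managed Grafana and that the spec field is named initial_admin_password in your release; the simple password is intentional and should be changed in the Grafana UI after the first login:

    # Save the following as grafana.yaml (shown here as comments):
    #   service_type: grafana
    #   placement:
    #     count: 1
    #   spec:
    #     initial_admin_password: admin   # avoid '#' and other characters that get cut off
    ceph orch rm grafana             # removes the service (custom dashboards/config are lost!)
    ceph orch apply -i grafana.yaml
    ceph orch ls grafana             # confirm the service is redeployed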

[ceph-users] Re: help, ceph fs status stuck with no response

2023-08-14 Thread Patrick Donnelly
On Tue, Aug 8, 2023 at 1:18 AM Zhang Bao wrote: > > Hi, thanks for your help. > > I am using ceph Pacific 16.2.7. > > Before my Ceph stuck at `ceph fs status fsname`, one of my cephfs became > readonly. Probably the ceph-mgr is stuck (the "volumes" plugin) somehow talking to the read-only

[ceph-users] Re: help, ceph fs status stuck with no response

2023-08-07 Thread Patrick Donnelly
On Mon, Aug 7, 2023 at 6:12 AM Zhang Bao wrote: > > Hi, > > I have a ceph cluster stuck at `ceph --verbose stats fs fsname`. And in the > monitor log, I can find something like `audit [DBG] from='client.431973 -' > entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname", > "target":

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-30 Thread Eugen Block
I created a tracker issue, maybe that will get some attention: https://tracker.ceph.com/issues/61861 Quoting Michel Jouvin: Hi Eugen, Thank you very much for these detailed tests that match what I observed and reported earlier. I'm happy to see that we have the same understanding of

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
Hi, adding the dev mailing list, hopefully someone there can chime in. But apparently the LRC code hasn't been maintained for a few years (https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's see... Quoting Michel Jouvin: Hi Eugen, Thank you very much for these

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Michel Jouvin
Hi Eugen, Thank you very much for these detailed tests that match what I observed and reported earlier. I'm happy to see that we have the same understanding of how it should work (based on the documentation). Is there any other way than this list to get in contact with the plugin

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-06-19 Thread Eugen Block
Hi, I have a real hardware cluster for testing available now. I'm not sure whether I'm completely misunderstanding how it's supposed to work or if it's a bug in the LRC plugin. This cluster has 18 HDD nodes available across 3 rooms (or DCs), I intend to use 15 nodes to be able to recover if

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-26 Thread Michel Jouvin
Hi, I realize that the crushmap I attached to one of my emails, probably required to understand the discussion here, has been stripped down by mailman. To avoid polluting the thread with a long output, I put it at https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-26 Thread Justin Li
Hi Patrick, The disaster recovery process with cephfs-data-scan tool didn't fix our MDS issue. It still kept crashing. I've uploaded a detailed MDS log with below ID. The restore procedure below didn't get it working either. Should I set mds_go_bad_corrupt_dentry to false alongside with

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Justin Li
Hi Patrick, Thanks for the instructions. We started the MDS recovery scan with the below cmds following the link below. The first bit, scan_extents, has finished and we're waiting on scan_inodes. Probably we shouldn't interrupt the process. If this procedure fails, I'll follow your steps and

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Patrick Donnelly
Hello Justin, Please do: ceph config set mds debug_mds 20 ceph config set mds debug_ms 1 Then wait for a crash. Please upload the log. To restore your file system: ceph config set mds mds_abort_on_newly_corrupt_dentry false Let the MDS purge the strays and then try: ceph config set mds
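For completeness, the sequence Patrick outlines, with the truncated last command filled in as an assumption (re-enabling the safety check once the strays are purged):

    # capture a verbose MDS log of the next crash
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1
    # after collecting the log, let the MDS proceed past the corrupt dentries
    ceph config set mds mds_abort_on_newly_corrupt_dentry false
    # once the MDS has purged the strays, turn the safety check back on (assumed final step)
    ceph config set mds mds_abort_on_newly_corrupt_dentry true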

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Hi Patrick, Sorry to keep bothering you, but I found that the MDS service kept crashing even though the cluster shows the MDS is up. I attached another log of the MDS server (eowyn) below. Looking forward to hearing more insights. Thanks a lot.

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Sorry Patrick, my last email was rejected due to its attachment size. I attached a link for you to download the log. Thanks. https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?usp=sharing Justin Li Senior Technical Officer School of Information Technology Faculty of Science,

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Thanks Patrick. We're making progress! After issuing the below cmd (ceph config) you gave me, the ceph cluster health shows HEALTH_WARN and the mds is back up. However, cephfs can't be mounted, showing the below error. The Ceph mgr portal also shows a 500 internal error when I try to browse the cephfs folder. I'll be

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Patrick Donnelly
Hello Justin, On Tue, May 23, 2023 at 4:55 PM Justin Li wrote: > > Dear All, > > After an unsuccessful upgrade to pacific, the MDS were offline and could not get > back on. Checked the MDS log and found the below. See cluster info below as > well. Appreciate it if anyone can point me in the right

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Justin Li
Thanks for replying, Greg. I'll give you the detailed sequence I followed for the upgrade below. Step 1: upgrade ceph mgr and Monitor --- reboot. Then mgr and mon are all up and running. Step 2: upgrade one OSD node --- reboot and OSDs are all up. Step 3: upgrade a second OSD node named OSD-node2. I

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Gregory Farnum
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote: > > Dear All, > > After an unsuccessful upgrade to pacific, the MDS were offline and could not get > back on. Checked the MDS log and found the below. See cluster info below as > well. Appreciate it if anyone can point me in the right direction.

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-21 Thread Michel Jouvin
Hi Eugen, My LRC pool is also somewhat experimental so nothing really urgent. If you manage to do some tests that help me to understand the problem I remain interested. I propose to keep this thread for that. Zitat, I shared my crush map in the email you answered if the attachment was not

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-18 Thread Eugen Block
Hi, I don’t have a good explanation for this yet, but I’ll soon get the opportunity to play around with a decommissioned cluster. I’ll try to get a better understanding of the LRC plugin, but it might take some time, especially since my vacation is coming up. :-) I have some thoughts about

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-17 Thread Curt
Hi, I've been following this thread with interest as it seems like a unique use case to expand my knowledge. I don't use LRC or anything outside basic erasure coding. What is your current crush steps rule? I know you made changes since your first post and had some thoughts I wanted to share,

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-16 Thread Michel Jouvin
Hi Eugen, Yes, sure, no problem to share it. I attach it to this email (as it may clutter the discussion if inline). If somebody on the list has some clue about the LRC plugin, I'm still interested in understanding what I'm doing wrong! Cheers, Michel On 04/05/2023 at 15:07, Eugen Block wrote

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Frank Schilder
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin Hi, I don't think you've shared your osd tree yet, could you do that? Apparently nobody else but us reads this thread or nobody reading this uses the LRC plugin. ;-) Thanks, Eugen Quoting Michel Jouvin: > Hi, > &

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Eugen Block
Hi, I don't think you've shared your osd tree yet, could you do that? Apparently nobody else but us reads this thread or nobody reading this uses the LRC plugin. ;-) Thanks, Eugen Quoting Michel Jouvin: Hi, I had to restart one of my OSD servers today and the problem showed up

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Michel Jouvin
Hi, I had to restart one of my OSD servers today and the problem showed up again. This time I managed to capture "ceph health detail" output showing the problem with the 2 PGs: [WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down     pg 56.1 is down, acting

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-03 Thread Eugen Block
I think I got it wrong with the locality setting, I'm still limited by the number of hosts I have available in my test cluster, but as far as I got with failure-domain=osd I believe k=6, m=3, l=3 with locality=datacenter could fit your requirement, at least with regards to the recovery
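For readers following along, this is roughly how such a profile would be declared (a sketch only; the profile and pool names are made up, and the layout is exactly the k=6, m=3, l=3 idea being tested here, not a validated design):

    ceph osd erasure-code-profile set lrc-k6m3l3 \
        plugin=lrc \
        k=6 m=3 l=3 \
        crush-locality=datacenter \
        crush-failure-domain=host
    ceph osd pool create lrc-test 32 32 erasure lrc-k6m3l3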

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-02 Thread Eugen Block
Hi, disclaimer: I haven't used LRC in a real setup yet, so there might be some misunderstandings on my side. But I tried to play around with one of my test clusters (Nautilus). Because I'm limited in the number of hosts (6 across 3 virtual DCs) I tried two different profiles with lower

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Michel Jouvin
Hi, No... our current setup is 3 datacenters with the same configuration, i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each, thus a total of 12 OSD servers. As with the LRC plugin k+m must be a multiple of l, I found that k=9/m=6/l=5 with crush-locality=datacenter was achieving my goal of

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Curt
Hello, What is your current setup, 1 server per data center with 12 OSDs each? What is your current crush rule and LRC crush rule? On Fri, Apr 28, 2023, 12:29 Michel Jouvin wrote: > Hi, > > I think I found a possible cause of my PG down but still don't understand why. > As explained in a previous

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-28 Thread Michel Jouvin
Hi, I think I found a possible cause of my PG down but still don't understand why. As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9, m=6) but I have only 12 OSD servers in the cluster. To work around the problem I defined the failure domain as 'osd' with the reasoning that as I

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-24 Thread Michel Jouvin
Hi, I'm still interested in getting feedback from those using the LRC plugin about the right way to configure it... Last week I upgraded from Pacific to Quincy (17.2.6) with cephadm, which does the upgrade host by host, checking if an OSD is ok to stop before actually upgrading it. I had

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-06 Thread Michel Jouvin
Hi, Is somebody using the LRC plugin? I came to the conclusion that LRC k=9, m=3, l=4 is not the same as jerasure k=9, m=6 in terms of protection against failures and that I should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9, m=6. The example in the documentation (k=4, m=2,

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-04 Thread Michel Jouvin
Answering my own question, I found the reason for 2147483647: it's documented as a failure to find enough OSDs (missing OSDs). And it is normal, as I selected different hosts for the 15 OSDs but I have only 12 hosts! I'm still interested in an "expert" confirming that the LRC k=9, m=3, l=4 configuration

[ceph-users] Re: HELP NEEDED : cephadm adopt osd crash

2022-11-08 Thread Eugen Block
You can either provide an image with the adopt command (--image) or configure it globally with ceph config set (I don’t have the exact command right now). Which image does it fail to pull? You should see that in cephadm.log. Does that node with osd.17 have access to the image repo?
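A sketch of both options (the image tag is an assumption; substitute the release you actually run, and osd.17 is the daemon from this thread):

    # per invocation, while adopting a legacy OSD:
    cephadm --image quay.io/ceph/ceph:v16.2.7 adopt --style legacy --name osd.17
    # or globally, so later cephadm/orchestrator operations use the same image:
    ceph config set global container_image quay.io/ceph/ceph:v16.2.7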

[ceph-users] Re: [Help] Does MSGR2 protocol use openssl for encryption

2022-09-02 Thread Gregory Farnum
We partly rolled our own with AES-GCM. See https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format -Greg On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote: > > Hi, > > I have a question about the MSGR protocol Ceph
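As an illustration of those connection modes, the secure (AES-GCM) framing can be required instead of the default crc mode (a sketch; only do this once all daemons and clients speak msgr2):

    ceph config set global ms_cluster_mode secure   # daemon-to-daemon traffic
    ceph config set global ms_service_mode secure   # daemons accepting client connections
    ceph config set global ms_client_mode secure    # what clients will request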

[ceph-users] Re: Help needed picking the right amount of PGs for (Cephfs) metadata pool

2022-06-02 Thread Ramana Venkatesh Raja
On Thu, Jun 2, 2022 at 11:40 AM Stefan Kooman wrote: > > Hi, > > We have a CephFS filesystem holding 70 TiB of data in ~ 300 M files and > ~ 900 M sub directories. We currently have 180 OSDs in this cluster. > > POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED > (DATA)

[ceph-users] Re: HELP! Upgrading monitors from 14.2.22 to 16.2.7 immediately crashes in FSMap::decode()

2022-03-20 Thread Tyler Stachecki
What does 'ceph mon dump | grep min_mon_release' say? You're running msgrv2 and all Ceph daemons are talking on v2, since you're on Nautilus, right? Was the cluster conceived on Nautilus, or something earlier? Tyler On Sun, Mar 20, 2022 at 10:30 PM Clippinger, Sam wrote: > > Hello! > > I need

[ceph-users] Re: Help Removing Failed Cephadm Daemon(s) - MDS Deployment Issue

2022-01-20 Thread Adam King
Hi Michael, To clarify a bit further "ceph orch rm" works for removing services and "ceph orch daemon rm" works for removing daemons. In the command you ran [ceph: root@osd16 /]# ceph orch rm "mds.cephmon03.local osd16.local osd17.local osd18.local.onl26.drymjr" the name you've given there is
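Putting Adam's distinction into commands, using the names from this thread:

    # remove the whole (oddly named) MDS service and every daemon it manages:
    ceph orch rm "mds.cephmon03.local osd16.local osd17.local osd18.local"
    # or remove just the one broken daemon by its full daemon name:
    ceph orch daemon rm "mds.cephmon03.local osd16.local osd17.local osd18.local.onl26.drymjr"
    ceph orch ls mds    # verify what is left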

[ceph-users] Re: Help Removing Failed Cephadm Daemon(s) - MDS Deployment Issue

2022-01-19 Thread Adam King
Hello Michael, If you're trying to remove all the mds daemons in this mds "cephmon03.local osd16.local osd17.local osd18.local" I think the command would be "ceph orch rm "mds.cephmon03.local osd16.local osd17.local osd18.local"" (note the quotes around that mds.cephmon . . . since cephadm thinks

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Lee
I tried with disk-based swap on a SATA SSD. I think that might be the last option. I have already exported all the down PGs from the OSD that they are waiting for. Kind Regards Lee On Thu, 6 Jan 2022 at 20:00, Alexander E. Patrakov wrote: > Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Alexander E. Patrakov
On Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov wrote: > On Thu, 6 Jan 2022 at 12:21, Lee wrote: > >> I've tried adding swap and that fails also. >> > > How exactly did it fail? Did you put it on some disk, or in zram? > > In the past I had to help a customer who hit memory over-use when > upgrading Ceph

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Alexander E. Patrakov
On Thu, 6 Jan 2022 at 12:21, Lee wrote: > I've tried adding swap and that fails also. > How exactly did it fail? Did you put it on some disk, or in zram? In the past I had to help a customer who hit memory over-use when upgrading Ceph (due to shallow_fsck), and we were able to fix it by adding 64 GB

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Marc
> I assume the huge memory consumption is temporary. Once the OSD is up and > stable, it would release the memory. > > So how about allocating a large swap temporarily just to let the OSD come up. I > remember that someone else on the list has resolved a similar issue with > swap. But is this

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-06 Thread Marc
Running your osd's with resource limitations is not so straightforward. I can guess that if you are running close to full resource utilization on your nodes, it makes more sense to make sure everything stays as much within their specified limits. (Aside from the question if you would even want

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Igor Fedotov
Hi Lee, could you please raise debug-bluestore and debug-osd to 20 (via ceph tell osd.N injectargs command) when OSD starts to eat up the RAM. Then drop it back to defaults after a few seconds (10s is enough) to avoid huge log size and share the resulting OSD log. Also I'm curious if you
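The same request as copy-paste-able commands (osd.N is a placeholder for the affected OSD id; the revert values are the usual defaults):

    # raise verbosity while the OSD's memory is ballooning
    ceph tell osd.N injectargs '--debug-bluestore 20 --debug-osd 20'
    # ~10 seconds later, drop back down so the log stays manageable
    ceph tell osd.N injectargs '--debug-bluestore 1/5 --debug-osd 1/5'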

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
For Example top - 22:53:47 up 1:29, 2 users, load average: 2.23, 2.08, 1.92 Tasks: 255 total, 2 running, 253 sleeping, 0 stopped, 0 zombie %Cpu(s): 4.2 us, 4.5 sy, 0.0 ni, 91.1 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st MiB Mem : 161169.7 total, 23993.9 free, 132036.5 used, 5139.3

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
The first OSD took 156 GB of RAM to boot.. :( Is there an easy way to stop the mempool pulling so much memory? On Wed, 5 Jan 2022 at 22:12, Mazzystr wrote: > and that is exactly why I run osds containerized with limited cpu and > memory as well as "bluestore cache size", "osd memory target", and

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Mazzystr
and that is exactly why I run osds containerized with limited cpu and memory as well as "bluestore cache size", "osd memory target", and "mds cache memory limit". Osd processes have become noisy neighbors in the last few versions. On Wed, Jan 5, 2022 at 1:47 PM Lee wrote: > I'm not rushing,
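A sketch of that kind of capping (the numbers are illustrative, not recommendations):

    # let the OSD's cache autotuning aim for ~2 GiB instead of the 4 GiB default
    ceph config set osd osd_memory_target 2147483648
    # containerized OSDs can additionally get a hard limit from the runtime, e.g.:
    #   podman update --memory 3g --cpus 2 <osd-container>   # exact mechanism depends on the deployment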

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread mhnx
It's nice to hear that. You can also decrease the OSD RAM usage from 4 GB to 2 GB. If you have enough spare RAM, go for it. Good luck. On Thu, 6 Jan 2022 at 00:46, Lee wrote: > > I'm not rushing, > > I have found the issue, I am getting OOM errors as the OSD boots, basically > it starts

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread Lee
I'm not rushing, I have found the issue: I am getting OOM errors as the OSD boots; basically it starts to process the PGs and then the node runs out of memory and the daemon gets killed. 2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51 2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261

[ceph-users] Re: Help - Multiple OSD's Down

2022-01-05 Thread mhnx
First of all, do not rush into bad decisions. Production is down and you want to bring it back online, but you should fix the problem and be sure first. If a second crash occurs in a healing state you will lose metadata. You don't need to debug first! You didn't mention your cluster status and we don't

[ceph-users] Re: Help needed to recover 3node-cluster

2022-01-03 Thread Michael Moyles
You should prioritise recovering quorum of your monitors. Ceph's documentation can help here: https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/ Check whether the failed mon is still part of the monmap on the other nodes; if it is, you might need to remove it manually
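A condensed sketch of the documented offline procedure for dropping a dead mon from the monmap (<id> is a surviving mon, <dead-mon> the failed one; stop the surviving mon daemon before touching its store):

    ceph-mon -i <id> --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap          # confirm the dead mon is still listed
    monmaptool --rm <dead-mon> /tmp/monmap
    ceph-mon -i <id> --inject-monmap /tmp/monmap
    # restart the surviving mon(s), then re-check quorum with: ceph -s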

[ceph-users] Re: Help

2020-08-17 Thread DHilsbos
Randy; Nextcloud is easy, it has a "standard" S3 client capability, though it also has Swift client capability. As an S3 client, it does look for the older path style (host/bucket), rather than Amazon's newer DNS style (bucket.host). You can find information on configuring Nextcloud's primary

[ceph-users] Re: Help

2020-08-17 Thread Jarett DeAngelis
Configuring it with respect to what about these applications? What are you trying to do? Do you have existing installations of any of these? We need a little more about your requirements. > On Apr 17, 2020, at 1:14 PM, Randy Morgan wrote: > > We are seeking information on configuring Ceph to

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-06 Thread Ricardo Marques
())' From: Sharad Mehrotra Sent: Thursday, August 6, 2020 1:03 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus Adding some additional context for my question below. I am following the directions here: https

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-05 Thread Sharad Mehrotra
Adding some additional context for my question below. I am following the directions here: https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, but am getting stuck on step #3 of the "Configuring" section, similar to the issue reported above that you worked on. FYI, I installed my ceph-iscsi

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-05 Thread Sharad Mehrotra
Sebastian et al: How did you solve the "The first gateway defined must be the local machine" issue that I asked about on another thread? I am deploying ceph-iscsi manually as described in the link that you sent out (https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/). Thank you! On Wed,

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-05 Thread Sebastian Wagner
Until iSCSI is fully working in cephadm, you can install ceph-iscsi manually as described here: https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/ On 05.08.20 at 11:44, Hoài Thương wrote: > hello swagner, > can you give me the document? i use cephadm -- SUSE Software Solutions Germany

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-05 Thread Hoài Thương
Hello swagner, can you give me the document? I use cephadm.

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-05 Thread Sebastian Wagner
hi David, hi Ricardo, I think we first have to clarify, if that was actually a cephadm deployment (and not ceph-ansible). If you install Ceph using ceph-ansible, then please refer to the ceph-ansible docs. If we're actually talking about cephadm here (which is not clear to me): iSCSI for

[ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus

2020-08-05 Thread Ricardo Marques
Hi David, I was able to configure iSCSI gateways on my local test environment using the following spec: ``` # tail -14 service_spec_gw.yml --- service_type: iscsi service_id: iscsi_service placement: hosts: - 'node1' - 'node2' spec: pool: rbd trusted_ip_list:
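A hedged reconstruction of the complete spec Ricardo quotes (field values are placeholders; adjust pool, hosts and credentials to your environment):

    # Save the following as service_spec_gw.yml (shown here as comments):
    #   service_type: iscsi
    #   service_id: iscsi_service
    #   placement:
    #     hosts:
    #       - 'node1'
    #       - 'node2'
    #   spec:
    #     pool: rbd
    #     trusted_ip_list: "192.168.0.10,192.168.0.11"
    #     api_user: admin
    #     api_password: admin
    ceph orch apply -i service_spec_gw.yml
    ceph orch ls iscsi      # check that the gateways get deployed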

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-22 Thread Hoài Thương
It's working. On Wed, 22 Jul 2020 at 14:41, David Thuong <davidthuong2...@gmail.com> wrote: > thank you, after installing docker on the new node, I can add the node

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-22 Thread David Thuong
Thank you, after installing docker on the new node, I can add the node.

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-22 Thread Hoài Thương
Will do, thanks! On Wed, 22 Jul 2020 at 12:27, steven prothero <ste...@marimo-tech.com> wrote: > Hello, > > Yes, make sure docker & ntp are set up on the new node first. > Also, make sure the public key is added on the new node and the firewall > is allowing it through >

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread steven prothero
Hello, Yes, make sure docker & ntp are set up on the new node first. Also, make sure the public key is added on the new node and the firewall is allowing it through.
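The usual preparation steps, sketched (hostname, IP and time-sync service are placeholders for your environment):

    # from a node that has the cluster's SSH key (created at bootstrap):
    ssh-copy-id -f -i /etc/ceph/ceph.pub root@newnode
    # on the new node: a container runtime and time sync must be running
    ssh root@newnode 'command -v docker || command -v podman'
    ssh root@newnode 'systemctl enable --now chronyd'
    # then add it to the orchestrator:
    ceph orch host add newnode 192.168.0.50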

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread davidthuong2424
Hello, I use docker, I will check ntp. Does the new node need to be installed?

[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread steven prothero
Hello, is podman installed on the new node? Also make sure NTP time sync is on for the new node. ceph orch checks those on the new node and then dies with an error like you see if they're not ready.

[ceph-users] Re: help with failed osds after reboot

2020-06-15 Thread Paul Emmerich
On Mon, Jun 15, 2020 at 7:01 PM wrote: > Ceph version 10.2.7 > > ceph.conf > [global] > fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8 > (...) > mount_activate: Failed to activate > ceph-disk: Error: No cluster conf found in /etc/ceph with fsid > e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9 > -- Paul

[ceph-users] Re: help with failed osds after reboot

2020-06-15 Thread seth . duncan2
Ceph version 10.2.7 ceph.conf [global] fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8 mon_initial_members = chad, jesse, seth mon_host = 192.168.10.41,192.168.10.40,192.168.10.39 mon warn on legacy crush tunables = false auth_cluster_required = cephx auth_service_required = cephx

[ceph-users] Re: help with failed osds after reboot

2020-06-12 Thread Marc Roos
Maybe you have the same issue? https://tracker.ceph.com/issues/44102#change-167531 In my case an update(?) disabled osd runlevels. systemctl is-enabled ceph-osd@0 -Original Message- To: ceph-users@ceph.io Subject: [ceph-users] Re: help with failed osds after reboot Hi, which ceph
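If the unit really was disabled, re-enabling it is the counterpart to that check (osd id 0 is just the example from above):

    systemctl is-enabled ceph-osd@0
    systemctl enable --now ceph-osd@0    # start it now and at every boot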

[ceph-users] Re: help with failed osds after reboot

2020-06-12 Thread Eugen Block
Hi, which ceph release are you using? You mention ceph-disk so your OSDs are not LVM based, I assume? I've seen these messages a lot when testing in my virtual lab environment although I don't believe it's the cluster's fsid but the OSD's fsid that's in the error message (the OSDs have

[ceph-users] Re: Help! ceph-mon is blocked after shutting down and ip address changed

2020-06-03 Thread Zhenshi Zhou
Did you change mon_host in ceph.conf when you set the IP back to 192.168.0.104? I did a monitor IP change in a live cluster, but I had 3 mons and I modified only 1 IP and then submitted the monmap. On Fri, 29 May 2020 at 23:55: > ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)

[ceph-users] Re: HELP! Ceph (v 14.2.8) bucket notification does not work!

2020-04-17 Thread gwampole
Hello, Has the resolution for this issue been released in Nautilus? I'm still experiencing this on 14.2.9 though I noticed the PR (https://github.com/ceph/ceph/pull/33978) seemed to be merged in. Thanks! -Garrett

[ceph-users] Re: Help: corrupt pg

2020-03-27 Thread Jake Grimmett
Hi Greg, Yes, this was caused by a chain of events. As a cautionary tale, the main ones were: 1) a minor nautilus release upgrade, followed by a rolling node restart script that mistakenly relied on "ceph -s" for cluster health info, i.e. it didn't wait for the cluster to return to health

[ceph-users] Re: Help: corrupt pg

2020-03-26 Thread Gregory Farnum
On Wed, Mar 25, 2020 at 5:19 AM Jake Grimmett wrote: > > Dear All, > > We are "in a bit of a pickle"... > > No reply to my message (23/03/2020), subject "OSD: FAILED > ceph_assert(clone_size.count(clone))" > > So I'm presuming it's not possible to recover the crashed OSD From your later email

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Jake Grimmett
Hi Eugen, Many thanks for your reply. The other two OSD's are up and running, and being used by other pgs with no problem, for some reason this pg refuses to use these OSD's. The other two OSDs that are missing from this pg crashed at different times last month, each OSD crashed when we

[ceph-users] Re: Help: corrupt pg

2020-03-25 Thread Eugen Block
Hi, is there any chance to recover the other failing OSDs that seem to have one chunk of this PG? Do the other OSDs fail with the same error? Quoting Jake Grimmett: Dear All, We are "in a bit of a pickle"... No reply to my message (23/03/2020), subject "OSD: FAILED

[ceph-users] Re: HELP! Ceph (v 14.2.8) bucket notification does not work!

2020-03-15 Thread Yuval Lifshitz
yes, this is a regression issue with the new version: https://tracker.ceph.com/issues/44614 On Thu, Mar 12, 2020 at 8:44 PM 曹 海旺 wrote: > I think it is a bug . I reinstall the cluster . The response of create > topic still 405 .methodnotallowed, anynoe konw why? Thank you very much ! > >

[ceph-users] Re: HELP! Ceph (v 14.2.8) bucket notification does not work!

2020-03-12 Thread 曹 海旺
I think it is a bug. I reinstalled the cluster. The response to create topic is still 405 MethodNotAllowed, anyone know why? Thank you very much! On 12 Mar 2020 at 18:53, 曹 海旺 <caohaiw...@hotmail.com> wrote: Hi, I upgraded ceph from 14.2.7 to the new version 14.2.8. The bucket

[ceph-users] Re: help

2019-08-30 Thread Amudhan P
My cluster health status went to warning mode only after running mkdir for 1000s of folders with multiple subdirectories. If this has made an OSD crash, does it really take that long to heal empty directories? On Fri, Aug 30, 2019 at 3:12 PM Janne Johansson wrote: > On Fri, 30 Aug 2019 at 10:49

[ceph-users] Re: help

2019-08-30 Thread Janne Johansson
On Fri, 30 Aug 2019 at 10:49, Amudhan P wrote: > After leaving it for 12 hours the cluster status is now healthy, but why did it > take such a long time to backfill? > How do I fine-tune it in case the same kind of error pops up again? > > The backfilling is taking a while because max_backfills = 1 and
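If faster backfill is acceptable for the cluster's workload, the knob can be raised temporarily and reverted afterwards (a sketch for this pre-mClock era; the revert line restores the defaults):

    ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'
    # watch ceph -s, then put the conservative defaults back once HEALTH_OK:
    ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'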

[ceph-users] Re: help

2019-08-30 Thread Amudhan P
After leaving it for 12 hours the cluster status is now healthy, but why did it take such a long time to backfill? How do I fine-tune it in case the same kind of error pops up again? On Thu, Aug 29, 2019 at 6:52 PM Caspar Smit wrote: > Hi, > > This output doesn't show anything 'wrong' with the

[ceph-users] Re: help

2019-08-29 Thread Caspar Smit
Hi, This output doesn't show anything 'wrong' with the cluster. It's just still recovering (backfilling) from what seems like one of your OSDs having crashed and restarted. The backfilling is taking a while because max_backfills = 1 and you only have 3 OSDs total, so the backfilling per PG has to have

[ceph-users] Re: help

2019-08-29 Thread Burkhard Linke
Hi, ceph uses a pseudo random distribution within crush to select the target hosts. As a result, the algorithm might not be able to select three different hosts out of three hosts in the configured number of tries. The affected PGs will be shown as undersized and only list two OSDs instead

[ceph-users] Re: help

2019-08-29 Thread Amudhan P
output from "ceph osd pool ls detail" pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 74 lfor 0/64 flags hashpspool stripe_width 0 application cephfs pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0
