I had the same problem as you
The only solution that worked for me is to set it on the pools:
for pool in $(ceph osd pool ls); do
    ceph osd pool set "$pool" scrub_max_interval "$smaxi"
    ceph osd pool set "$pool" scrub_min_interval "$smini"
    ceph osd pool set "$pool" deep_scrub_interval "$sdeepi"   # assumed completion; the last line of the original was cut off
done
Hi Anthony,
thanks for the tips. I reset all the values but osd_deep_scrub_interval
to their defaults as reported at
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ :
# ceph config set osd osd_scrub_sleep 0.0
# ceph config set osd osd_scrub_load_threshold 0.5
# ceph
* Try applying the settings to global so that mons/mgrs get them.
* Set your shallow scrub settings back to the defaults; shallow scrubs take
very few resources (a sketch of reverting options to their defaults follows this list)
* Set your randomize_ratio back to the default, you’re just bunching them up
* Set the load threshold back to the default, I can’t
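For reference, a minimal sketch of reverting such centrally-set options to their built-in defaults (assuming they were set at the osd level with "ceph config set"; "ceph config rm" simply drops the override):
ceph config rm osd osd_scrub_sleep
ceph config rm osd osd_scrub_load_threshold
ceph config rm osd osd_scrub_interval_randomize_ratio
ceph config dump | grep scrub   # verify what is still set centrally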
Hi Anthony and everyone else
We have found the issue. Because the new 20x 14 TiB OSDs were onboarded
onto a single node, there was not only an imbalance in the capacity of each
OSD but also between the nodes (other nodes each have around 15x 1.7TiB).
Furthermore, CRUSH rule sets default failure
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had
> 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago,
> we onboarded an additional 20 OSDs of 14TiB each.
That's a big difference in size. I suggest increasing mon_max_pg_per_osd to
1000 --
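As a sketch, that change would be (assuming you want it cluster-wide rather than per daemon):
ceph config set global mon_max_pg_per_osd 1000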
Hi Jasper,
I suggest disabling all the crush-compat and reweighting approaches.
They rarely work out.
The state of the art is:
ceph balancer on
ceph balancer mode upmap
ceph config set mgr mgr/balancer/upmap_max_deviation 1
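To check that the balancer actually picked that up (a quick sketch; the output format varies by release):
ceph balancer status
ceph config get mgr mgr/balancer/upmap_max_deviation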
Cheers, Dan
--
Dan van der Ster
CTO
Clyso GmbH
p: +49 89 215252722
On Wed, Jan 31, 2024 at 3:43 AM garcetto wrote:
>
> Good morning,
> I was struggling to understand why I cannot find this setting on
> my Reef version. Is it because it is only in the latest dev Ceph version and not
> before?
That's right, this new feature will be part of the Squid release. We
Thank you Eugen! This worked :)
> On 09-11-2023 14:55 CET, Eugen Block wrote:
>
>
> It's the '#' character, everything after (including '#' itself) is cut
> off. I tried with single and double quotes which also failed. But as I
> already said, use a simple password and then change it
It's the '#' character, everything after (including '#' itself) is cut
off. I tried with single and double quotes which also failed. But as I
already said, use a simple password and then change it within grafana.
That way you also don't have the actual password lying around in clear
text
I just tried it on a 17.2.6 test cluster; although I don't have a
stack trace, the complicated password doesn't seem to be applied (don't
know why yet). But since it's an "initial" password you can choose
something simple like "admin", and during the first login you are
asked to change it
I tried everything at this point, even waited an hour, still no luck. Got it
working once, accidentally, but with a placeholder for a password. Tried with the
correct password, nothing, and trying again with the placeholder didn't work
anymore.
So I thought I'd switch the manager, maybe something
Usually, removing the grafana service should be enough. I also have
this directory (custom_config_files/grafana.) but it's
empty. Can you confirm that after running 'ceph orch rm grafana' the
service is actually gone ('ceph orch ls grafana')? The directory
underneath
Using podman version 4.4.1 on RHEL 8.8, Ceph 17.2.7
I used 'podman system prune -a -f' and 'podman volume prune -f' to clean up
files, but this leaves a lot of files behind in
/var/lib/containers/storage/overlay and an empty folder
/var/lib/ceph//custom_config_files/grafana..
Found those files
What doesn't work exactly? For me it did...
Quoting Sake Ceph:
Too bad, that doesn't work :(
On 09-11-2023 09:07 CET, Sake Ceph wrote:
Hi,
Well, to get Promtail working with Loki, you need to set up a
password in Grafana.
But Promtail wasn't working with the 17.2.6 release; the URL was
Too bad, that doesn't work :(
> On 09-11-2023 09:07 CET, Sake Ceph wrote:
>
>
> Hi,
>
> Well, to get Promtail working with Loki, you need to set up a password in
> Grafana.
> But Promtail wasn't working with the 17.2.6 release; the URL was set to
> containers.local. So I stopped using it,
Hi,
Well, to get Promtail working with Loki, you need to set up a password in
Grafana.
But Promtail wasn't working with the 17.2.6 release; the URL was set to
containers.local. So I stopped using it, but forgot to click on save in KeePass
:(
I didn't configure anything special in Grafana, the
Hi,
you mean you forgot your password? You can remove the service with
'ceph orch rm grafana', then re-apply your grafana.yaml containing the
initial password. Note that this would remove all of the Grafana
configs, custom dashboards, etc.; you would have to reconfigure them.
So before
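For reference, a minimal grafana spec sketch with an initial password (the values here are placeholders, not taken from the thread):
cat > grafana.yaml <<'EOF'
service_type: grafana
placement:
  count: 1
spec:
  initial_admin_password: admin
EOF
ceph orch apply -i grafana.yaml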
On Tue, Aug 8, 2023 at 1:18 AM Zhang Bao wrote:
>
> Hi, thanks for your help.
>
> I am using ceph Pacific 16.2.7.
>
> Before my Ceph got stuck at `ceph fs status fsname`, one of my CephFS filesystems became
> read-only.
Probably the ceph-mgr is stuck (the "volumes" plugin) somehow talking
to the read-only
On Mon, Aug 7, 2023 at 6:12 AM Zhang Bao wrote:
>
> Hi,
>
> I have a Ceph cluster stuck at `ceph --verbose stats fs fsname`. And in the
> monitor log, I can found something like `audit [DBG] from='client.431973 -'
> entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname",
> "target":
I created a tracker issue, maybe that will get some attention:
https://tracker.ceph.com/issues/61861
Quoting Michel Jouvin:
Hi Eugen,
Thank you very much for these detailed tests that match what I
observed and reported earlier. I'm happy to see that we have the
same understanding of
Hi,
adding the dev mailing list, hopefully someone there can chime in. But
apparently the LRC code hasn't been maintained for a few years
(https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's
see...
Quoting Michel Jouvin:
Hi Eugen,
Thank you very much for these
Hi Eugen,
Thank you very much for these detailed tests that match what I observed
and reported earlier. I'm happy to see that we have the same
understanding of how it should work (based on the documentation). Is
there any other way than this list to get in contact with the plugin
Hi, I have a real hardware cluster for testing available now. I'm not
sure whether I'm completely misunderstanding how it's supposed to work
or if it's a bug in the LRC plugin.
This cluster has 18 HDD nodes available across 3 rooms (or DCs), I
intend to use 15 nodes to be able to recover if
Hi,
I realize that the crushmap I attached to one of my emails, probably
required to understand the discussion here, has been stripped by
mailman. To avoid polluting the thread with a long output, I put it at
https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are
Hi Patrick,
The disaster recovery process with the cephfs-data-scan tool didn't fix our MDS
issue. It still kept crashing. I've uploaded a detailed MDS log with the ID below.
The restore procedure below didn't get it working either. Should I set
mds_go_bad_corrupt_dentry to false alongside
Hi Patrick,
Thanks for the instructions. We started the MDS recovery scan with the commands below,
following the link below. The first bit, scan_extents, has finished and we're
waiting on scan_inodes. Probably we shouldn't interrupt the process. If this
procedure fails, I'll follow your steps and
Hello Justin,
Please do:
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
Then wait for a crash. Please upload the log.
To restore your file system:
ceph config set mds mds_abort_on_newly_corrupt_dentry false
Let the MDS purge the strays and then try:
ceph config set mds
Hi Patrick,
Sorry to keep bothering you, but I found that the MDS service kept crashing even
though the cluster shows the MDS is up. I attached another log of the MDS server (eowyn) below.
Looking forward to hearing more insights. Thanks a lot.
Sorry Patrick, the last email was rejected due to attachment size. I attached a link
for you to download the log. Thanks.
https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?usp=sharing
Justin Li
Senior Technical Officer
School of Information Technology
Faculty of Science,
Thanks Patrick. We're making progress! After issuing the ceph config command
you gave me, the cluster health shows HEALTH_WARN and the MDS is back up. However,
CephFS can't be mounted, showing the error below. The Ceph mgr portal also shows a 500
internal error when I try to browse the CephFS folder. I'll be
Hello Justin,
On Tue, May 23, 2023 at 4:55 PM Justin Li wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to Pacific, the MDS were offline and could not get
> back on. Checked the MDS log and found the output below. See cluster info below as
> well. Appreciate it if anyone can point me to the right
Thanks for replying, Greg. I'll give you the detailed sequence of the
upgrade below.
Step 1: upgrade ceph mgr and monitor --- reboot. Then mgr and mon are all up and
running.
Step 2: upgrade one OSD node --- reboot and OSDs are all up.
Step 3: upgrade a second OSD node named OSD-node2. I
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to Pacific, the MDS were offline and could not get
> back on. Checked the MDS log and found the output below. See cluster info below as
> well. Appreciate it if anyone can point me to the right direction.
Hi Eugen,
My LRC pool is also somewhat experimental, so nothing is really urgent. If you
manage to do some tests that help me understand the problem, I remain
interested. I propose to keep this thread for that.
Zitat, I shared my crush map in the email you answered if the attachment
was not
Hi, I don’t have a good explanation for this yet, but I’ll soon get
the opportunity to play around with a decommissioned cluster. I’ll try
to get a better understanding of the LRC plugin, but it might take
some time, especially since my vacation is coming up. :-)
I have some thoughts about
Hi,
I've been following this thread with interest as it seems like a unique use
case to expand my knowledge. I don't use LRC or anything outside basic
erasure coding.
What is your current crush steps rule? I know you made changes since your
first post and had some thoughts I wanted to share,
Hi Eugen,
Yes, sure, no problem to share it. I attach it to this email (as it may
clutter the discussion if inline).
If somebody on the list has some clue about the LRC plugin, I'm still
interested in understanding what I'm doing wrong!
Cheers,
Michel
On 04/05/2023 at 15:07, Eugen Block wrote:
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin
Hi,
I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)
Thanks,
Eugen
Quoting Michel Jouvin:
> Hi,
>
Hi,
I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)
Thanks,
Eugen
Quoting Michel Jouvin:
Hi,
I had to restart one of my OSD server today and the problem showed
up
Hi,
I had to restart one of my OSD server today and the problem showed up
again. This time I managed to capture "ceph health detail" output
showing the problem with the 2 PGs:
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
pg 56.1 is down, acting
I think I got it wrong with the locality setting, I'm still limited by
the number of hosts I have available in my test cluster, but as far as
I got with failure-domain=osd I believe k=6, m=3, l=3 with
locality=datacenter could fit your requirement, at least with regards
to the recovery
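For reference, a profile along those lines could be created like this (a sketch; the profile and pool names are made up, and crush-failure-domain=osd matches the test described above):
ceph osd erasure-code-profile set lrc63l3 plugin=lrc k=6 m=3 l=3 crush-locality=datacenter crush-failure-domain=osd
ceph osd pool create testlrc erasure lrc63l3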
Hi,
disclaimer: I haven't used LRC in a real setup yet, so there might be
some misunderstandings on my side. But I tried to play around with one
of my test clusters (Nautilus). Because I'm limited in the number of
hosts (6 across 3 virtual DCs) I tried two different profiles with
lower
Hi,
No... our current setup is 3 datacenters with the same configuration,
i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each, thus a total of 12
OSD servers. As with the LRC plugin k+m must be a multiple of l, I found
that k=9/m=6/l=5 with crush-locality=datacenter was achieving my goal
of
Hello,
What is your current setup, 1 server per data center with 12 OSDs each? What
is your current crush rule and LRC crush rule?
On Fri, Apr 28, 2023, 12:29 Michel Jouvin
wrote:
> Hi,
>
> I think I found a possible cause of my PG down but still don't understand why.
> As explained in a previous
Hi,
I think I found a possible cause of my PG down but still don't understand why.
As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9,
m=6) but I have only 12 OSD servers in the cluster. To work around the
problem I defined the failure domain as 'osd' with the reasoning that as
I
Hi,
I'm still interested in getting feedback from those using the LRC
plugin about the right way to configure it... Last week I upgraded from
Pacific to Quincy (17.2.6) with cephadm which is doing the upgrade host
by host, checking if an OSD is ok to stop before actually upgrading it.
I had
Hi,
Is somebody using the LRC plugin?
I came to the conclusion that LRC k=9, m=3, l=4 is not the same as
jerasure k=9, m=6 in terms of protection against failures and that I
should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9,
m=6. The example in the documentation (k=4, m=2,
Answering myself, I found the reason for 2147483647: it's documented
as a failure to find enough OSDs (missing OSDs). And it is normal as I
selected different hosts for the 15 OSDs but I have only 12 hosts!
I'm still interested in an "expert" confirming that the LRC k=9, m=3, l=4
configuration
You can either provide an image with the adopt command (--image) or you
can configure it globally with ceph config set (I don't have the exact
command right now). Which image does it fail to pull? You should see
that in cephadm.log. Does that node with osd.17 have access to the
image repo?
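Both variants as a sketch (the image tag below is only an example, use the one matching your cluster):
cephadm --image quay.io/ceph/ceph:v16.2.15 adopt --style legacy --name osd.17
ceph config set global container_image quay.io/ceph/ceph:v16.2.15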
We partly rolled our own with AES-GCM. See
https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes
and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format
-Greg
On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote:
>
> Hi,
>
> I have a question about the MSGR protocol Ceph
On Thu, Jun 2, 2022 at 11:40 AM Stefan Kooman wrote:
>
> Hi,
>
> We have a CephFS filesystem holding 70 TiB of data in ~ 300 M files and
> ~ 900 M sub directories. We currently have 180 OSDs in this cluster.
>
> POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED
> (DATA)
What does 'ceph mon dump | grep min_mon_release' say? You're running
msgrv2 and all Ceph daemons are talking on v2, since you're on
Nautilus, right?
Was the cluster conceived on Nautilus, or something earlier?
Tyler
On Sun, Mar 20, 2022 at 10:30 PM Clippinger, Sam
wrote:
>
> Hello!
>
> I need
Hi Michael,
To clarify a bit further, "ceph orch rm" works for removing services and
"ceph orch daemon rm" works for removing daemons. In the command you ran
[ceph: root@osd16 /]# ceph orch rm "mds.cephmon03.local osd16.local
osd17.local osd18.local.onl26.drymjr"
the name you've given there is
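In other words, something like this (a sketch re-using the names from the thread, which may not match your exact service/daemon names):
ceph orch rm "mds.cephmon03.local osd16.local osd17.local osd18.local"                       # removes the whole service
ceph orch daemon rm "mds.cephmon03.local osd16.local osd17.local osd18.local.onl26.drymjr"   # removes a single daemon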
Hello Michael,
If you're trying to remove all the MDS daemons in this MDS service "cephmon03.local
osd16.local osd17.local osd18.local", I think the command would be "ceph
orch rm "mds.cephmon03.local osd16.local osd17.local osd18.local"" (note
the quotes around that mds.cephmon . . . since cephadm thinks
I tried with disk-based swap on a SATA SSD.
I think that might be the last option. I have already exported all the down
PGs from the OSD that they are waiting for.
Kind Regards
Lee
On Thu, 6 Jan 2022 at 20:00, Alexander E. Patrakov
wrote:
> On Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov
On Fri, 7 Jan 2022 at 00:50, Alexander E. Patrakov:
> On Thu, 6 Jan 2022 at 12:21, Lee:
>
>> I've tried adding swap and that fails also.
>>
>
> How exactly did it fail? Did you put it on some disk, or in zram?
>
> In the past I had to help a customer who hit memory over-use when
> upgrading Ceph
On Thu, 6 Jan 2022 at 12:21, Lee:
> I've tried adding swap and that fails also.
>
How exactly did it fail? Did you put it on some disk, or in zram?
In the past I had to help a customer who hit memory over-use when upgrading
Ceph (due to shallow_fsck), and we were able to fix it by adding 64 GB
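For reference, a temporary swap file can be added roughly like this (sizes and paths are examples only; remove it again once the OSD is stable):
fallocate -l 64G /var/swapfile
chmod 600 /var/swapfile
mkswap /var/swapfile
swapon /var/swapfile
swapoff /var/swapfile && rm /var/swapfile   # later, once the OSD has settled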
> I assume the huge memory consumption is temporary. Once the OSD is up and
> stable, it would release the memory.
>
> So how about allocating a large swap temporarily just to let the OSD come up? I
> remember that someone else on the list has resolved a similar issue with
> swap.
But is this
Running your OSDs with resource limitations is not so straightforward. I can
guess that if you are running close to full resource utilization on your nodes,
it makes more sense to make sure everything stays within its
specified limits. (Aside from the question of whether you would even want
Hi Lee,
could you please raise debug-bluestore and debug-osd to 20 (via the ceph
tell osd.N injectargs command) when the OSD starts to eat up the RAM. Then
drop it back to defaults after a few seconds (10s is enough) to avoid
huge log size and share the resulting OSD log.
Also I'm curious if you
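The commands would look roughly like this (a sketch; 1/5 is the usual default level, adjust if yours differs):
ceph tell osd.N injectargs '--debug-bluestore 20 --debug-osd 20'
ceph tell osd.N injectargs '--debug-bluestore 1/5 --debug-osd 1/5'   # ~10s later, back to defaults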
For Example
top - 22:53:47 up 1:29, 2 users, load average: 2.23, 2.08, 1.92
Tasks: 255 total, 2 running, 253 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.2 us, 4.5 sy, 0.0 ni, 91.1 id, 0.1 wa, 0.0 hi, 0.1 si,
0.0 st
MiB Mem : 161169.7 total, 23993.9 free, 132036.5 used, 5139.3
The first OSD took 156 GB of RAM to boot... :(
Is there an easy way to stop the mempool pulling so much memory?
On Wed, 5 Jan 2022 at 22:12, Mazzystr wrote:
> and that is exactly why I run osds containerized with limited cpu and
> memory as well as "bluestore cache size", "osd memory target", and
and that is exactly why I run osds containerized with limited cpu and
memory as well as "bluestore cache size", "osd memory target", and "mds
cache memory limit". Osd processes have become noisy neighbors in the last
few versions.
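As a sketch, the Ceph-side limits mentioned above can be set centrally (the 2 GiB values are only examples; the container CPU/memory limits themselves depend on your runtime):
ceph config set osd osd_memory_target 2147483648
ceph config set mds mds_cache_memory_limit 2147483648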
On Wed, Jan 5, 2022 at 1:47 PM Lee wrote:
> I'm not rushing,
It's nice to hear that. You can also decrease the OSD RAM usage from
4 GB to 2 GB. If you have enough spare RAM, go for it.
Good luck.
Lee wrote the following on Thu, 6 Jan 2022 at 00:46:
>
> I'm not rushing,
>
> I have found the issue. I'm getting OOM errors as the OSD boots; basically
> it starts
I'm not rushing,
I have found the issue: I'm getting OOM errors as the OSD boots;
basically it starts to process the PGs and then the node runs out of
memory and the daemon is killed
2022-01-05 20:09:08 bb-ceph-enc-rm63-osd03-31 osd.51
2022-01-05T20:09:01.024+ 7fce3c6bc700 10 osd.51 24448261
First of all, do not rush into bad decisions.
Production is down and you want to bring it back online, but you should fix the
problem and be sure first. If a second crash occurs in a healing state
you will lose metadata.
You don't need to debug first!
You didn't mention your cluster status and we don't
You should prioritise recovering quorum of your monitors. Ceph's
documentation can help here:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/
Check to see if the failed mon is still part of the monmap on the other
nodes; if it is, you might need to remove it manually
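The manual removal is roughly the following (a sketch of the documented procedure; <id> and <failed-mon> are placeholders, and the surviving mon must be stopped while you do this):
ceph-mon -i <id> --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm <failed-mon>
ceph-mon -i <id> --inject-monmap /tmp/monmap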
Randy;
Nextcloud is easy, it has a "standard" S3 client capability, though it also has
Swift client capability. As an S3 client, it does look for the older path style
(host/bucket), rather than Amazon's newer DNS style (bucket.host).
You can find information on configuring Nextcloud's primary
Configuring it with respect to what about these applications? What are you
trying to do? Do you have existing installations of any of these? We need a
little more about your requirements.
> On Apr 17, 2020, at 1:14 PM, Randy Morgan wrote:
>
> We are seeking information on configuring Ceph to
From: Sharad Mehrotra
Sent: Thursday, August 6, 2020 1:03 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: help me enable ceph iscsi gateway in ceph octopus
Adding some additional context for my question below.
I am following the directions here:
https
Adding some additional context for my question below.
I am following the directions here:
https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, but am getting
stuck on step #3 of the "Configuring" section, similar to the issue
reported above that you worked on.
FYI, I installed my ceph-iscsi
Sebastian et al:
How did you solve the "The first gateway
defined must be the local machine" issue that I asked about on another
thread?
I am deploying ceph-iscsi manually as described in the link that you sent
out (https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/).
Thank you!
On Wed,
Until iSCSI is fully working in cephadm, you can install ceph-iscsi
manually as described here:
https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/
On 05.08.20 at 11:44, Hoài Thương wrote:
> Hello swagner,
> Can you give me a document? I use cephadm
--
SUSE Software Solutions Germany
Hello swagner,
Can you give me a document? I use cephadm
Hi David, hi Ricardo,
I think we first have to clarify if that was actually a cephadm
deployment (and not ceph-ansible).
If you install Ceph using ceph-ansible, then please refer to the
ceph-ansible docs.
If we're actually talking about cephadm here (which is not clear to me):
iSCSI for
Hi David,
I was able to configure iSCSI gateways on my local test environment using the
following spec:
```
# tail -14 service_spec_gw.yml
---
service_type: iscsi
service_id: iscsi_service
placement:
  hosts:
    - 'node1'
    - 'node2'
spec:
  pool: rbd
  trusted_ip_list:
it working
On Wed, 22 Jul 2020 at 14:41, David Thuong <
davidthuong2...@gmail.com> wrote:
> Thank you, after installing docker on the new node, I can add the node
Thank you, after installing docker on the new node, I can add the node
Will do, thanks!
On Wed, 22 Jul 2020 at 12:27, steven prothero <
ste...@marimo-tech.com> wrote:
> Hello,
>
> Yes, make sure docker & NTP are set up on the new node first.
> Also, make sure the public key is added on the new node and the firewall
> is allowing it through
>
Hello,
Yes, make sure docker & NTP are set up on the new node first.
Also, make sure the public key is added on the new node and the firewall
is allowing it through
Hello,
I use docker, and I will check NTP.
Does anything else need to be installed on the new node?
Hello,
Is podman installed on the new node? Also make sure NTP time sync
is on for the new node. ceph orch checks those on the new node and
then dies with an error like you see if it's not ready.
On Mon, Jun 15, 2020 at 7:01 PM wrote:
> Ceph version 10.2.7
>
> ceph.conf
> [global]
> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>
(...)
> mount_activate: Failed to activate
> ceph-disk: Error: No cluster conf found in /etc/ceph with fsid
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>
--
Paul
Ceph version 10.2.7
ceph.conf
[global]
fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
mon_initial_members = chad, jesse, seth
mon_host = 192.168.10.41,192.168.10.40,192.168.10.39
mon warn on legacy crush tunables = false
auth_cluster_required = cephx
auth_service_required = cephx
Maybe you have the same issue?
https://tracker.ceph.com/issues/44102#change-167531
In my case an update(?) disabled osd runlevels.
systemctl is-enabled ceph-osd@0
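If that reports "disabled", re-enabling the unit should be enough (a sketch):
systemctl enable --now ceph-osd@0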
-Original Message-
To: ceph-users@ceph.io
Subject: [ceph-users] Re: help with failed osds after reboot
Hi,
which ceph
Hi,
which ceph release are you using? You mention ceph-disk so your OSDs
are not LVM based, I assume?
I've seen these messages a lot when testing in my virtual lab
environment although I don't believe it's the cluster's fsid but the
OSD's fsid that's in the error message (the OSDs have
Did you change mon_host in ceph.conf while you set the IP back to
192.168.0.104?
I did a monitor IP change in a live cluster, but I had 3 mons and I
modified only 1 IP and then submitted the monmap.
On Fri, 29 May 2020 at 23:55, wrote:
> ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
Hello,
Has the resolution for this issue been released in Nautilus?
I'm still experiencing this on 14.2.9 though I noticed the PR
(https://github.com/ceph/ceph/pull/33978) seemed to be merged in.
Thanks!
-Garrett
Hi Greg,
Yes, this was caused by a chain of events. As a cautionary tale, the main
ones were:
1) a minor Nautilus release upgrade, followed by a rolling node restart
script that mistakenly relied on "ceph -s" for cluster health info,
i.e. it didn't wait for the cluster to return to health
On Wed, Mar 25, 2020 at 5:19 AM Jake Grimmett wrote:
>
> Dear All,
>
> We are "in a bit of a pickle"...
>
> No reply to my message (23/03/2020), subject "OSD: FAILED
> ceph_assert(clone_size.count(clone))"
>
> So I'm presuming it's not possible to recover the crashed OSD
From your later email
Hi Eugen,
Many thanks for your reply.
The other two OSDs are up and running, and being used by other PGs with
no problem; for some reason this PG refuses to use these OSDs.
The other two OSDs that are missing from this PG crashed at different
times last month; each OSD crashed when we
Hi,
is there any chance to recover the other failing OSDs that seem to
have one chunk of this PG? Do the other OSDs fail with the same error?
Quoting Jake Grimmett:
Dear All,
We are "in a bit of a pickle"...
No reply to my message (23/03/2020), subject "OSD: FAILED
yes, this is a regression issue with the new version:
https://tracker.ceph.com/issues/44614
On Thu, Mar 12, 2020 at 8:44 PM 曹 海旺 wrote:
> I think it is a bug. I reinstalled the cluster. The response to creating a
> topic is still 405 MethodNotAllowed, does anyone know why? Thank you very much!
>
>
I think it is a bug. I reinstalled the cluster. The response to creating a topic
is still 405 MethodNotAllowed, does anyone know why? Thank you very much!
On 12 Mar 2020 at 6:53 PM, 曹 海旺 <caohaiw...@hotmail.com>
wrote:
Hi,
I upgraded Ceph from 14.2.7 to the new version 14.2.8. The bucket
My cluster health status went to warning mode only after running mkdir for
thousands of folders with multiple subdirectories. If this has made an OSD crash,
does it really take that long to heal empty directories?
On Fri, Aug 30, 2019 at 3:12 PM Janne Johansson wrote:
> On Fri, 30 Aug 2019 at 10:49
On Fri, 30 Aug 2019 at 10:49, Amudhan P wrote:
> After leaving it for 12 hours the cluster status is now healthy, but why did it
> take such a long time to backfill?
> How do I fine-tune it in case the same kind of error pops up again?
>
> The backfilling is taking a while because max_backfills = 1 and
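If you do want to speed it up temporarily, the usual knob is osd_max_backfills (a sketch; raising it puts more recovery load on the cluster, so set it back afterwards):
ceph tell 'osd.*' injectargs '--osd-max-backfills 4'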
After leaving it for 12 hours the cluster status is now healthy, but why did it
take such a long time to backfill?
How do I fine-tune it in case the same kind of error pops up again?
On Thu, Aug 29, 2019 at 6:52 PM Caspar Smit wrote:
> Hi,
>
> This output doesn't show anything 'wrong' with the
Hi,
This output doesn't show anything 'wrong' with the cluster. It's just still
recovering (backfilling) from what seems like one of your OSDs having crashed and
restarted.
The backfilling is taking a while because max_backfills = 1 and you only
have 3 OSDs in total, so the backfilling per PG has to have
Hi,
Ceph uses a pseudo-random distribution within CRUSH to select the target
hosts. As a result, the algorithm might not be able to select three
different hosts out of three hosts in the configured number of tries.
The affected PGs will be shown as undersized and only list two OSDs
instead
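One common workaround is to raise the number of placement attempts in the crushmap (a sketch of the decompile/edit/recompile cycle; the value 100 is just an example):
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: raise "tunable choose_total_tries 50" to e.g. 100
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new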
output from "ceph osd pool ls detail"
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 32 pgp_num 32 last_change 74 lfor 0/64 flags hashpspool
stripe_width 0 application cephfs
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0