If you ask me or Joachim, we'll tell you to disable the autoscaler. ;-) It
doesn't seem mature enough yet, especially with many pools. There have
been multiple threads in the past discussing this topic, so I'd suggest
leaving it disabled. Or you could help improve it, maybe create a
tracker issue.
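For reference, disabling it per pool, or as the default for new pools,
looks like this (the pool name is just an example):
ceph osd pool set testpool pg_autoscale_mode off
ceph config set global osd_pool_default_pg_autoscale_mode off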
Eugen Block wrote on Thu, Jan 25, 2024 at 19:06:
I understand that your MDS shows a high CPU usage, but other than that
what is your performance issue? Do users complain? Do some operations
take longer than expected? Are OSDs saturated during those phases?
Because the cache pressure
the dashboard right now, that I will definitely do tomorrow.
Good night!
Quoting Eugen Block:
Yeah, it's mentioned in the upgrade docs [2]:
Monitoring & Alerting
Ceph-exporter: Now the performance metrics for Ceph daemons
are exported by ceph-exporter, which deploys on each
https://docs.ceph.com/en/latest/releases/reef/#major-changes-from-quincy
Quoting Eugen Block:
Hi,
I got those metrics back after setting:
reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false
reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
... GiB  531 GiB  1.1 GiB  3.3 GiB  1.3 TiB  28.76  0.93  53  up  osd.79
TOTAL  146 TiB  45 TiB  44 TiB  119 GiB  333 GiB  101 TiB  30.81
MIN/MAX VAR: 0.91/1.08 STDDEV: 1.90
Eugen Block wrote on Thu, Jan 25, 2024 at 16:52:
There is no definitive answer wrt MDS tuning. As mentioned everywhere,
it's about finding the right setup for your specific workload. If you
can synthesize your workload (maybe scaled down a bit), try optimizing
it in a test cluster without interrupting your developers too much.
But
ttr layout/parent on the data pool).
Regards,
*David CASIER*
On Thu, Jan 25, 2024 at 13:03, Eugen Block wrote:
I'm not sure if using EC as default data pool for cephfs
CC'ing Zac here to hopefully clear that up.
Zitat von "David C." :
Albert,
Never used EC for (root) data pool.
On Thu, Jan 25, 2024 at 12:08, Albert Shih wrote:
On 25/01/2024 at 08:42:19+0000, Eugen Block wrote:
> Hi,
>
> it's really as easy as it sounds (fresh tes
(and pgp_num) while you create a pool:
ceph:~ # ceph osd pool create testpool 16 16 replicated
pool 'testpool' created
Quoting Albert Shih:
On 25/01/2024 at 08:42:19+0000, Eugen Block wrote:
Hi,
it's really as easy as it sounds (fresh test cluster on 18.2.1 without
any pools yet):
ceph:~ # ceph fs volume create cephfs
(wait a minute or two)
ceph:~ # ceph fs status
cephfs - 0 clients
====
RANK  STATE  MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0
at 05:52, Ilya Dryomov wrote:
On Wed, Jan 24, 2024 at 7:31 PM Eugen Block wrote:
We do like the separation of nova pools as well, and we also heavily
use ephemeral disks instead of boot-from-volume instances. One of the
reasons being that you can't detach a root volume from an instance.
It helps in specific maintenance cases, so +1 for keeping it in the
docs.
Quoting
Hi,
this topic pops up every now and then, and although I don't have
definitive proof for my assumptions I still stand with them. ;-)
As the docs [2] already state, it's expected that PGs become degraded
after some sort of failure (setting an OSD "out" falls into that
category IMO):
It
Hi,
ceph.conf is not used anymore the way it was before cephadm. Just add
the config to the config store (see my previous example) and it should
be applied to all OSDs.
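As a sketch, with a placeholder option (not one from this thread):
ceph config set osd osd_memory_target 4294967296
ceph config get osd osd_memory_target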
Regards
Eugen
Quoting Alam Mohammad:
Hi Eugen,
We are planning to build a cluster with an erasure-coded (EC) pool
Oh that does sound strange indeed. I don't have a good idea right now,
hopefully someone from the dev team can shed some light on this.
Quoting Robert Sander:
Hi,
more strange behaviour:
When I issue "ceph mgr fail" a backup MGR takes over and updates
all config files on all hosts
Hi,
I checked two production clusters which don't use RGW too heavily,
both on Pacific though. There's no latency increase visible there. How
is the data growth in your cluster? Is the pool size rather stable or
is it constantly growing?
Thanks,
Eugen
Quoting Roman Pashin:
Hello
Hi,
I checked the behaviour on Octopus, Pacific and Quincy, I can confirm.
I don't have the time to dig deeper right now, but I'd suggest opening
a tracker issue.
Thanks,
Eugen
Quoting Jan Kasprzak:
Hello, Ceph users,
what is the correct location of keyring for ceph-crash?
I tried
-conf
Quoting Robert Sander:
Hi,
On 1/18/24 14:07, Eugen Block wrote:
I just tried that in my test cluster, removed the ceph.conf and
admin keyring from /etc/ceph and then added the _admin label to the
host via 'ceph orch' and both were created immediately.
This is strange, I only get
Hi,
I just tried that in my test cluster, removed the ceph.conf and admin
keyring from /etc/ceph and then added the _admin label to the host via
'ceph orch' and both were created immediately.
# no label for quincy-2
quincy-1:~ # ceph orch host ls
HOST ADDR LABELS STATUS
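For reference, adding the label afterwards would be:
quincy-1:~ # ceph orch host label add quincy-2 _admin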
Oh, I missed that line with the mclock profile, sorry.
Quoting Eugen Block:
Hi,
what is your current mclock profile? The default is "balanced":
quincy-1:~ # ceph config get osd osd_mclock_profile
balanced
You could try setting it to high_recovery_ops [1], or disable it
altogether [2]:
quincy-1:~ # ceph config set osd osd_op_queue wpq
[1]
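The profile switch mentioned above would be:
ceph config set osd osd_mclock_profile high_recovery_ops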
I'm glad to hear (or read) that it worked for you as well. :-)
Quoting Torkil Svensgaard:
On 18/01/2024 09:30, Eugen Block wrote:
Hi,
[ceph: root@lazy /]# ceph-conf --show-config | egrep osd_max_pg_per_osd_hard_ratio
osd_max_pg_per_osd_hard_ratio = 3.00
I don't think
depth have you tried?
Quoting Torkil Svensgaard:
On 18/01/2024 07:48, Eugen Block wrote:
Hi,
-3281> 2024-01-17T14:57:54.611+ 7f2c6f7ef540 0 osd.431 2154828 load_pgs opened 750 pgs <---
I'd say that's close enough to what I suspected. ;-) Not sure why
the "maybe
Hi,
the mgr caps for profile-rbd were introduced in Nautilus:
The MGR now accepts profile rbd and profile rbd-read-only user caps.
These caps can be used to provide users access to MGR-based RBD
functionality such as rbd perf image iostat and rbd perf image iotop.
So if you don't need that
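A sketch of caps including the mgr profile (client name and pool are
placeholders):
ceph auth caps client.rbduser mon 'profile rbd' osd 'profile rbd pool=rbd' mgr 'profile rbd pool=rbd'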
r_osd_hard_ratio.
Quoting Torkil Svensgaard:
On 17-01-2024 22:20, Eugen Block wrote:
Hi,
this sounds a bit like a customer issue we had almost two years ago.
Basically, it was about mon_max_pg_per_osd (default 250) which was
exceeded during the first activating OSD (and the last remaining
stopping OSD). You can read all the details in the lengthy thread [1].
But if this
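If you do hit that limit, raising it temporarily is one way out (the
value is just an example):
ceph config set global mon_max_pg_per_osd 400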
ure the system
for this case.
I have 80+ clients, and through all of these clients my users read a
range of objects, compare them on the GPU, generate new data, and
write the new data back to the cluster.
So it means my clients usually read objects only once and do not read
the sa
Hi,
there have been a few threads with this topic, one of them is this one
[1]. The issue there was that different ceph container images were in
use. Can you check your container versions? If you don't configure a
global image for all ceph daemons, e.g.:
quincy-1:~ # ceph config set
Hi,
I don't really have an answer, I just wanted to mention that I created
a tracker issue [1] because I believe there's a bug in the LRC plugin.
But there hasn't been any response yet.
[1] https://tracker.ceph.com/issues/61861
Quoting Ansgar Jazdzewski:
hi folks,
I currently test
Hi,
could you provide more details what exactly you tried and which
configs you set? Which compression mode are you running?
In a small Pacific test cluster I just set the mode to "force"
(default "none"):
storage:~ # ceph config set osd bluestore_compression_mode force
And then after a
Hi,
I have dealt with this topic multiple times, the SUSE team helped
understanding what's going on under the hood. The summary can be found
in this thread [1].
What helped in our case was to reduce the mds_recall_max_caps from 30k
(default) to 3k. We tried it in steps of 1k IIRC. So I
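As a sketch, the change described above:
ceph config set mds mds_recall_max_caps 3000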
sd.1/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /etc/hosts:/etc/hosts:ro quay.io/ceph/ceph:v18.2.1 activate --osd-id 1 --osd-uuid cdd02721-6876-4db8-bdb2-12ac6c70127c --no-systemd --no-tmpfs
/usr/bin/pod
Hi,
I don't really have any solution, but it appears to require rwx
permissions at least for the rgw tag:
caps osd = "allow rwx tag rgw *=*"
This was the only way I got the radosgw-admin commands to work in my
limited test attempts. Maybe someone else has more insights. My
interpretation
Hi,
I don't really have any advice but I'm curious how the LV tags look
like (lvs -o lv_tags). Do they point to the correct LVs for the
block.db? Does 'ceph osd metadata <osd-id>' show anything weird? Is
there something useful in the ceph-volume.log
(/var/log/ceph/{FSID}/ceph-volume.log)?
Hi,
I don't use rook but I haven't seen this issue yet in any of my test
clusters (from octopus to reef). Although I don't redeploy OSDs all
the time, I do set up fresh (single-node) clusters once or twice a
week with different releases without any ceph-volume issues. Just to
confirm I
n the logs but the osds were down all the time. Actually I was looking for
a fast fail procedure for this kind of situation because any manual action
would take time and can cause major incidents.
Best Regards,
Mahnoosh
On Mon, 8 Jan 2024, 11:47 Eugen Block, wrote:
Hi,
just to get a better und
Hi,
you probably have empty OSD nodes in your crush tree. Can you send the
output of 'ceph osd tree'?
Thanks,
Eugen
Quoting Jan Kasprzak:
Hello, Ceph users!
I have recently noticed that when I reboot a single ceph node,
ceph -s reports "5 hosts down" instead of one. The following
is
Hi,
just to get a better understanding, when you write
Although the OSDs were correctly marked as down in the monitor, slow
ops persisted until we resolved the network issue.
do you mean that the MONs marked the OSDs as down (temporarily) or did
you do that? Because if the OSDs "flap"
Hi,
I just did the same in my lab environment and the config got applied
to the daemon after a restart:
pacific:~ # ceph tell osd.0 config show | grep
bluestore_volume_selection_policy
"bluestore_volume_selection_policy": "rocksdb_original",
This is also a (tiny single-node) cluster
Hi,
we need more information about your cluster (ceph osd tree) and the
applied crush rule for this pool. What ceph version is this?
Regards,
Eugen
Quoting Phong Tran Thanh:
Hi community.
I am running a ceph cluster with 10 nodes and 180 OSDs, and I created
an erasure code 4+2 pool with
Hi,
you can skip Quincy (17.2.X) entirely, Ceph supports upgrading over
two versions. Check out the upgrade docs [1] for more details.
It also shouldn't be necessary to upgrade to the latest Pacific first,
and you can go directly to latest Reef (18.2.1).
Regards,
Eugen
[1]
Hi,
just omit the ".*" from your command:
ceph config set osd osd_deep_scrub_interval 1209600
The asterisk (*) can be used with the 'ceph tell' command. Check out
the docs [1] for more info about the runtime configuration.
Regards,
Eugen
[1]
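For comparison, the runtime injection with the asterisk would be:
ceph tell osd.* config set osd_deep_scrub_interval 1209600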
Hi,
in such a setup I also prefer option 2, we've done this since lvm came
into play with OSDs, just not with cephadm yet. But we have a similar
configuration and one OSD is starting to fail as well. I'm just waiting for
the replacement drive to arrive. ;-)
Regards,
Eugen
Zitat von "Robert
Thanks! Nevertheless, IMO the "Reef" branch should also contain the
latest Reef release notes (18.2.1).
Quoting Robert Sander:
Hi Eugen,
the release info is current only in the latest branch of the
documentation: https://docs.ceph.com/en/latest/releases/
Regards
--
Robert Sander
Hello and a happy new year!
I'm wondering if there are some structural changes or something
regarding the release page [1]. It still doesn't contain version
18.2.1 (Reef) and the latest two Quincy releases (17.2.6, 17.2.7) are
missing as well. And for Pacific it's even worse, the latest
Hi,
I'm not entirely sure if it's the same thing I debugged a few weeks
ago, but I found an error in the frontend parsing for the rgw_client
which was fixed with 18.2.1. I suggest upgrading and seeing if it's
resolved.
Regards,
Eugen
Quoting Владимир Клеусов:
Hi,
When trying to log
Hi,
you can switch back to the old dashboard by issuing:
ceph dashboard feature disable dashboard
This will bring back the previous landing page where you can select
your desired refresh interval.
Regards,
Eugen
Quoting Alam Mohammad:
Hello,
We've been using Ceph for managing our
Hi,
I'm not sure for how long your iscsi gateways will work as it has been
deprecated [1]:
The iSCSI gateway is in maintenance as of November 2022. This means
that it is no longer in active development and will not be updated
to add new features.
Some more information was provided in
That's good to know, I have the same in mind for one of the clusters
but didn't have the time to test it yet.
Quoting Robert Sander:
On 21.12.23 22:27, Anthony D'Atri wrote:
It's been claimed to me that almost nobody uses podman in
production, but I have no empirical data.
I even
Just to add a bit more information, the 'ceph daemon' command is still
valid, it just has to be issued inside of the containers:
quincy-1:~ # cephadm enter --name osd.0
Inferring fsid 1e6e5cb6-73e8-11ee-b195-fa163ee43e22
[ceph: root@quincy-1 /]# ceph daemon osd.0 config diff | head
{
Right, that makes sense.
Quoting Matthew Vernon:
On 19/12/2023 06:37, Eugen Block wrote:
Hi,
I thought the fix for that would have made it into 18.2.1. It was
marked as resolved two months ago
(https://tracker.ceph.com/issues/63150,
https://github.com/ceph/ceph/pull/53922).
Hi,
first, I'd recommend to use drivegroups [1] to apply OSD
specifications to entire hosts instead of manually adding an OSD
daemon. If you run 'ceph orch daemon add osd hostname:/dev/nvme0n1'
then the OSD is already fully deployed, meaning wal, db and data
device are all on the same
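As a sketch, a minimal drivegroup spec (the host pattern and device
filters are assumptions, adjust them to your hardware):
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
Apply it with 'ceph orch apply -i osd_spec.yaml'.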
Hi,
I don't have an answer for the SNMP part, I guess you could just bring
up your own snmp daemon and configure it to your needs. As for the
orchestrator backend you have these three options (I don't know what
"test_orchestrator" does but it doesn't sound like it should be used
in
Hi,
I thought the fix for that would have made it into 18.2.1. It was
marked as resolved two months ago
(https://tracker.ceph.com/issues/63150,
https://github.com/ceph/ceph/pull/53922).
Zitat von "Robert W. Eckert" :
Hi- I tried to start the upgrade using
ceph orch upgrade
The option '--image' is to be used for the cephadm command, not the
bootstrap command. So it should be like this:
cephadm --image <image> bootstrap …
This is also covered by the link I provided (isolated environment).
Quoting farhad kh:
hi, thank you for the guidance
There is no ability to change the
Hi,
you don't need to change the cephadm file to point to your registry,
just set the respective config value:
ceph config set global container_image
or per daemon:
ceph config set mon container_image
ceph config set rgw container_image
which usually wouldn't be necessary since it's
Ah of course, thanks for pointing that out, I somehow didn't think of
the remaining clones.
Thanks a lot!
Quoting Ilya Dryomov:
On Fri, Dec 15, 2023 at 12:52 PM Eugen Block wrote:
Hi,
I've been searching and trying things but to no avail yet.
This is uncritical because it's a test
Hi,
I've been searching and trying things but to no avail yet.
This is uncritical because it's a test cluster only, but I'd still
like to have a solution in case this somehow will make it into our
production clusters.
It's an Openstack Victoria Cloud with Ceph backend. If one tries to
Your disk is most likely slowly failing, you should check smart values
and dmesg output for read/write errors and probably replace the disk.
Quoting zxcs:
Also, the OSD frequently reports these ERROR logs, which lead to slow
requests on this OSD. How can we stop these logs?
“full object read crc *** !=
2023-12-08T15:35:57.691+0200 7f331621e0c0 -1 Missing object 500.0dc6
Overall journal integrity: DAMAGED
Objects missing:
0xdc6
Corrupt regions:
0x3718522e9-
---snip---
Quoting Patrick Donnelly:
On Mon, Dec 11, 2023 at 6:38 AM Eugen Block wrote:
Hi,
I'm trying
Can you restart the primary MDS (not sure which one it currently is,
should be visible from the mds daemon log) and see if this resolves at
least temporarily? Because after we recovered the cluster and cephfs
we did have output in 'ceph fs status' and I can't remember seeing
these error
cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset
Quoting Eugen Block:
Hi,
I'm trying to help someone with a broken CephFS. We managed to
recover basic ceph functionality but the CephFS is still
inaccessible (currently read-only). We went through the disaster
recovery steps
in, Mykola!
Eugen
Quoting Eugen Block:
So we did walk through the advanced recovery page but didn't really
succeed. The CephFS is still going to readonly because of the
purge_queue error. Is there any chance to recover from that or
should we try to recover with an empty metadata pool next?
Hi,
I'm trying to help someone with a broken CephFS. We managed to recover
basic ceph functionality but the CephFS is still inaccessible
(currently read-only). We went through the disaster recovery steps but
to no avail. Here's a snippet from the startup logs:
---snip---
mds.0.41
. ;-)
Quoting Eugen Block:
Some more information on the damaged CephFS, apparently the journal
is damaged:
---snip---
# cephfs-journal-tool --rank=storage:0 --journal=mdlog journal inspect
2023-12-08T15:35:22.922+0200 7f834d0320c0 -1 Missing object 200.000527c4
2023-12-08T15:35:22.938+0200
-recovery-experts
Quoting Eugen Block:
I was able to (almost) reproduce the issue in a (Pacific) test
cluster. I rebuilt the monmap from the OSDs, brought everything back
up, started the mds recovery like described in [1]:
ceph fs new <fs name> <metadata pool> <data pool> --force --recover
Then I added two mds daemons which
n't necessary,
but are they in this case? I'm still trying to find an explanation for
the purge_queue errors.
Quoting Eugen Block:
Hi,
following up on the previous thread (After hardware failure tried to
recover ceph and followed instructions for recovery using OSDS), we
were able to get ceph back into a healthy state (including the unfound
object). Now the CephFS needs to be recovered and I'm having trouble
to
Hi, did you unmount your clients after the cluster poweroff? You could
also enable debug logs in mds to see more information. Are there any
blocked requests? You can query the mds daemon via cephadm shell or
with an admin keyring like this:
# ceph tell mds.cephfs.storage.lgmyqv
pool(s) nearfull
pool '.mgr' is nearfull
pool 'cephfs.storage.meta' is nearfull
pool 'cephfs.storage.data' is nearfull
Any ideas ?
Thanks,
Manolis Daramas
-Original Message-
From: Eugen Block
Sent: Tuesday, November 21, 2023 1:10 PM
To: ceph-users@ceph.io
Subject: [ceph-users
And the second issue is with k4 m2 you'll have min_size = 5 which
means if one host is down your PGs become inactive, which is what you
most likely experienced.
Quoting David Rivera:
First problem here is you are using crush-failure-domain=osd when you
should use
Hi,
I'm not familiar with Cloudstack, I was just wondering if it tries to
query the pool "rbd"? Some tools refer to a default pool "rbd" if no
pool is specified. Do you have an "rbd" pool in that cluster?
Another thought are namespaces, do you have those defined? Can you
increase the debug
Hi,
I can't comment on why #3 shouldn't be used, but a quick test shows
that the image is not really usable in that case. I created a
partition on the src-image (1 GB), filled it up with around 500 MB of
data and then did the same export you did:
rbd export --export-format 2 src-image -
Maybe I misunderstand, but isn't 'rbd du' what you're looking for?
Quoting Tony Liu:
Hi,
Other than get all objects of the pool and filter by image ID,
is there any easier way to get the number of allocated objects for
a RBD image?
What I really want to know is the actual usage of an
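For reference (pool and image names are assumptions):
rbd du rbd/myimage
It reports the provisioned versus actual used size per image; with the
fast-diff feature enabled it is quick, otherwise it has to scan the
objects.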
Hi,
I don't have an idea yet why that happens, but could you increase the
debug level to see why it stops? What is the current ceph status?
Thanks,
Eugen
Quoting Denis Polom:
Hi
running Ceph Pacific 16.2.13.
we had a full CephFS filesystem and after adding new HW we tried to
start it
Hi,
basically, with EC pools you usually have a min_size of k + 1 to
prevent data loss. There was a thread about that just a few days ago
on this list. So in your case your min_size is probably 9, which makes
IO pause in case two chunks become unavailable. If your crush failure
domain is
These numbers you have seem relatively small.
Quoting Zakhar Kirpichenko:
> I've disabled the progress module entirely and will see how it goes.
> Otherwise, mgr memory usage keeps increasing slowly, from past experience
> it will stabilize at around 1.5-1.6 GB. Other than this event warni
usage keeps increasing slowly, from past experience
it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's
unclear what could have caused random memory ballooning.
/Z
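Disabling the progress module, for reference:
ceph progress off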
On Wed, 22 Nov 2023 at 13:07, Eugen Block wrote:
I see these progress messages all the time, I don't think
Hi,
we've seen this a year ago in a Nautilus cluster with multi-active MDS
as well. It turned up only once within several years and we decided
not to look too closely at that time. How often do you see it? Is it
reproducible? In that case I'd recommend creating a tracker issue.
Regards,
t exist
I tried clearing them but they keep showing up. I am wondering if these
missing events can cause memory leaks over time.
/Z
On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote:
Do you have the full stack trace? The pastebin only contains the
"tcmalloc: large alloc" messages (sa
Thanks, Eugen. It is similar in the sense that the mgr is getting
OOM-killed.
It started happening in our cluster after the upgrade to 16.2.14. We
haven't had this issue with earlier Pacific releases.
/Z
On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote:
Just checking it on the phone, but isn’t this quite similar?
https://tracker.ceph.com/issues/45136
Quoting Zakhar Kirpichenko:
Hi,
I'm facing a rather new issue with our Ceph cluster: from time to time
ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
100 GB RAM:
Hi,
were you able to resolve that situation in the meantime? If not, I
would probably try 'umount -l' and see if that helps. If it doesn't,
you can check if the client is still blocklisted:
ceph osd blocklist ls (or blacklist)
If it's still blocklisted, you could try to remove it:
ceph osd
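The removal would look like this (the address is just an example, take
it from the ls output):
ceph osd blocklist rm 192.168.1.10:0/1234567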
Hi,
I guess you could just redeploy the third MON which fails to start
(after the orchestrator is responding again) unless you figured it out
already. What is it logging?
1 osds exist in the crush map but not in the osdmap
This could be due to the input/output error, but it's just a
"bluefs_check_for_zeros": "false",
"bluefs_compact_log_sync": "false",
"bluefs_log_compact_min_ratio": "5.00",
"bluefs_log_compact_min_size": "16777216",
"bluefs_max_log_runway": "4194304",
Do you have a large block.db size defined in the ceph.conf (or config store)?
Quoting Debian:
Thanks for your reply, it shows nothing... there are no PGs on the OSD...
best regards
On 17.11.23 23:09, Eugen Block wrote:
After you create the OSD, run 'ceph pg ls-by-osd {OSD}', it should
show you which PGs are created there and then you’ll know which pool
they belong to, then check again the crush rule for that pool. You can
paste the outputs here.
Quoting Debian:
Hi,
after a massive
to be sure:
quincy-1:~ # ceph mgr fail
quincy-1:~ # ceph config-key get mgr/dashboard/crt
Error ENOENT:
And then I configured the previous key, did a mgr fail and now the
dashboard is working again.
Quoting Eugen Block:
Hi,
did you get your dashboard back in the meantime? I don't have an
answer regarding the certificate based on elliptic curves but since
you wrote:
So we tried to go back to the original state by removing CRT and KEY but
without success. The new key seems to be stuck in the config
how
Hi,
can you share the auth caps for your k8s client?
ceph auth get client.
And maybe share the yaml files as well (redact sensitive data) so we
can get a full picture.
Quoting Kushagr Gupta:
Hi Team,
Components:
Kubernetes, Ceph
Problem statement:
We are trying to integrate Ceph
Hi,
if you could share some more info about your cluster you might get a
better response. For example, 'ceph osd df tree' could be helpful to
get an impression how many PGs you currently have. You can inspect the
'ceph pg dump' output and look for the column "BYTES" which tells you
how
Hi,
I don't have a solution for you, I just wanted to make you aware of
this note in the docs:
Warning
The iSCSI gateway is in maintenance as of November 2022. This means
that it is no longer in active development and will not be updated
to add new features.
Here's some more
AM, Eugen Block wrote:
Hi,
AFAIU, you can’t migrate back to the slow device. It’s either
migrating from the slow device to a fast device or moving between
fast devices. I’m not aware that your scenario was considered in
that tool. The docs don’t specifically say that, but they also
don’t
No it’s not too late, it will take some time till we get there. So
thanks for the additional input, I am aware of the MON communication.
Quoting Sake Ceph:
Don't forget with stretch mode, osds only communicate with mons in
the same DC and the tiebreaker only communicate with the other
Hi,
AFAIU, you can’t migrate back to the slow device. It’s either
migrating from the slow device to a fast device or moving between fast
devices. I’m not aware that your scenario was considered in that tool.
The docs don’t specifically say that, but they also don’t mention
going back to
re working on it or want to work on it to revert
from a stretched cluster, because of the reason you mention: if the
other datacenter is totally burned down, you may want to switch to a
one-datacenter setup for the time being.
Best regards,
Sake
On 09-11-2023 at 11:18 CET, Eugen Block wrote:
Is this the same cluster as the one your reported down OSDs for? Can
you share the logs from before the "probing" status? You may have to
increase the log level to something like debug_mon = 20. But be
cautious and monitor the used disk space, it can increase quite a lot.
Did you have any
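Raising the mon debug level, as a sketch (remember to remove it again
afterwards):
ceph config set mon debug_mon 20
ceph config rm mon debug_mon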
Hi,
can you share the following output:
ceph -s
ceph health detail
ceph versions
ceph osd df tree
ceph osd dump
I see this line in the logs:
check_osdmap_features require_osd_release unknown -> octopus
which makes me wonder if you really run a Nautilus cluster.
Are your OSDs saturated?
Is this thread related:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/
Does it have something to do with mds_max_caps_per_client? Given the volume
contains 100K dir/files.
Thanks
Eugen Block wrote on Thu, Nov 9, 2023 at 19:02:
Do you see a high disk utilization