Hi all,
The single active MDS on one of our Ceph clusters is close to running out of
RAM.
MDS total system RAM = 528GB
MDS current free system RAM = 4GB
mds_cache_memory_limit = 451GB
current mds cache usage = 426GB
Presumably we need to reduce our mds_cache_memory_limit and/or
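If it helps, the cache limit can be lowered at runtime; a sketch below, where the ~300 GiB target is my assumption — pick whatever leaves the MDS real headroom, since actual RSS usually sits well above the configured cache limit:

```shell
# Hypothetical target: ~300 GiB instead of 451 GiB. The limit is in bytes.
NEW_LIMIT=$((300 * 1024 * 1024 * 1024))

# Apply at runtime to all MDS daemons (no restart needed):
ceph config set mds mds_cache_memory_limit "$NEW_LIMIT"

# Verify:
ceph config get mds mds_cache_memory_limit
```

On releases predating the centralized config store, `ceph tell mds.* injectargs '--mds_cache_memory_limit <bytes>'` should do the same, if I remember the syntax right.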
On 5/27/20 8:43 PM, Andreas Schiefer wrote:
if I understand correctly:
if we upgrade a running Nautilus cluster to Octopus, we will have
downtime while the MDS is updated.
Is this correct?
That is always the case when upgrading an MDS across a major or minor
version: the MDS has to restart, during which it hangs; clients
FYI. Hope to see some awesome CephFS submissions for our virtual IO500 BoF!
Thanks,
John
---------- Forwarded message ----------
From: committee--- via IO-500
Date: Fri, May 22, 2020 at 1:53 PM
Subject: [IO-500] IO500 ISC20 Call for Submission
To:
*Deadline*: 08 June 2020 AoE
The IO500
I noticed the LUKS volumes were open even though luksOpen hung. I killed
cryptsetup (once per disk) and ceph-volume continued, eventually creating the
OSDs for the host (yes, this node is slated for another reinstall once
cephadm stabilizes).
Is there a way to remove an osd service
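For what it's worth, on our cephadm/Octopus setup the OSD service specs can be listed and removed with `ceph orch` — a sketch, with a made-up spec name, and note that removing the spec is distinct from removing the OSDs themselves:

```shell
# List the OSD services/specs cephadm knows about:
ceph orch ls --service-type osd

# Remove a service spec by name (name below is hypothetical):
ceph orch rm osd.my-drive-group

# Draining and removing an individual OSD is a separate step:
ceph orch osd rm 5
```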
Hello,
if I understand correctly:
if we upgrade a running Nautilus cluster to Octopus, we will have
downtime while the MDS is updated.
Is this correct?
Mit freundlichen Grüßen / Kind regards
Andreas Schiefer
Leiter Systemadministration / Head of System Administration
---
HOME OF LOYALTY
CRM-
Hi, trying to migrate a second Ceph cluster to cephadm. All the hosts
migrated successfully from "legacy" except one of the OSD hosts (cephadm kept
duplicating OSD ids, e.g. two "osd.5"; still not sure why). To make things
easier, we re-provisioned the node (reinstalled from netinstall, applied
On Wed, May 27, 2020 at 5:28 AM Daniel Aberger - Profihost AG <
d.aber...@profihost.ag> wrote:
> Hi,
>
> (un)fortunately I can't test it because I managed to repair the pg.
>
> snaptrim and snaptrim_wait have been a part of this particular pg's
> status. As I was trying to look deeper into the
Hi,
I'm not sure if the repair waits for snaptrim; but it does need a
scrub reservation on all the related OSDs, hence our script. And I've
also observed that the repair request isn't queued up: if the OSDs are
busy with other scrubs, the repair request is forgotten.
-- Dan
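Given that behaviour, a crude workaround is to keep re-issuing the repair until the PG actually reports a repair state — a sketch, with a made-up pg id:

```shell
PG=2.1f   # hypothetical placement group id

# Re-issue the repair until the PG's state actually mentions "repair";
# otherwise the request can be dropped while the OSDs are busy scrubbing.
until ceph pg "$PG" query | grep -q 'repair'; do
    ceph pg repair "$PG"
    sleep 60
done
```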
On Wed, May 27, 2020 at
Hallo Dan, all.
My attempt with ceph-bluestore-tool did not lead to a working OSD.
So I decided to re-create all OSDs, as there were quite a few of them
and my cluster was rather unbalanced.
Too bad I could not get any insight as to what caused the issue on the
OSDs for object storage: however, I will
Common problem for FileStore and really no point in debugging this: upgrade
everything to a recent version and migrate to BlueStore.
99% of random latency spikes are just fixed by doing that.
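The per-OSD migration is roughly the documented replace-in-place loop; a sketch, where the OSD id and device name are placeholders:

```shell
ID=12            # hypothetical OSD id
DEV=/dev/sdX     # hypothetical data device

# Drain the OSD and wait until it is safe to destroy:
ceph osd out "$ID"
while ! ceph osd safe-to-destroy "osd.$ID"; do sleep 60; done

# Stop it, destroy it (keeping the id), and wipe the device:
systemctl stop "ceph-osd@$ID"
ceph osd destroy "$ID" --yes-i-really-mean-it
ceph-volume lvm zap "$DEV"

# Re-create it as BlueStore under the same id:
ceph-volume lvm create --bluestore --data "$DEV" --osd-id "$ID"
```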
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit
Hi,
since this bug may lead to data loss when several OSDs crash at the same
time (e.g., after a power outage): can we pull the release from the mirrors
and docker hub?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
Hi,
We have experienced random, relatively high latency spikes (around 0.5-10 sec)
in our Ceph cluster, which consists of 6 OSD nodes; each node has 6 OSDs.
Each OSD is built from one spinning disk and two NVMe devices.
We use a bcache device as the OSD back end (combining the HDD and an NVMe
partition as
hi there -
On 5/19/20 3:11 PM, thoralf schulze wrote:
> […] and report back …
i tried to reproduce the issue with osds each using 37gb of ssd storage
for db and wal. everything went fine - so yes, spillovers are to be avoided.
thank you very much & with kind regards,
thoralf.
Anyone can share their table with other MTU values?
Also interested into Switch CPU load
KR,
Manuel
-----Original Message-----
From: Marc Roos
Sent: Wednesday, May 27, 2020 12:01
To: chris.palmer ; paul.emmerich
CC: amudhan83 ; anthony.datri ;
ceph-users ; doustar ; kdhall
;
Interesting table. I have this on a production cluster 10gbit at a
datacenter (obviously doing not that much).
[@]# iperf3 -c 10.0.0.13 -P 1 -M 9000
Connecting to host 10.0.0.13, port 5201
[ 4] local 10.0.0.14 port 52788 connected to 10.0.0.13 port 5201
[ ID] Interval Transfer
Hi,
(un)fortunately I can't test it because I managed to repair the pg.
snaptrim and snaptrim_wait have been a part of this particular pg's
status. As I was trying to look deeper into the case I had a watch on
ceph health detail and noticed that snaptrim/snaptrim_wait was suddenly
not a part of
To elaborate on some aspects that have been mentioned already and add
some others:
* Test using iperf3.
* Don't try to use jumbos on networks where you don't have complete
control over every host. This usually includes the main ceph
network. It's just too much grief. You can consider
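Before trusting jumbos end-to-end, it's worth verifying that unfragmented 9000-byte frames actually pass every hop; the host address below is hypothetical, and 8972 = 9000 minus 20 bytes of IP header and 8 of ICMP header:

```shell
# Send don't-fragment pings at the maximum payload size for a 9000 MTU:
ping -M do -s 8972 -c 3 10.0.0.13

# If that succeeds, compare throughput before/after the MTU change:
iperf3 -c 10.0.0.13 -t 10
```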
I would not call a Ceph page a random tuning tip; at least I hope they
are not. NVMe-only with 100Gbit is not really a standard setup. I assume
that with such a setup you have the luxury of not noticing many optimizations.
What I mostly read is that changing to MTU 9000 will allow you to better
Awesome, thanks!
Martin Verges wrote on Wed, May 27, 2020 at 2:04 PM:
> Hello,
>
> as I find it a good idea and couldn't find another, I just created
> https://t.me/ceph_users.
> Please feel free to join, and let's see if we can get this channel started ;)
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49
Hello,
as I find it a good idea and couldn't find another, I just created
https://t.me/ceph_users.
Please feel free to join, and let's see if we can get this channel started ;)
--
Martin Verges
Managing director
Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges