On 2020-10-20 17:57, Ing. Luis Felipe Domínguez Vega wrote:
Hi, today my infra provider had a blackout, and Ceph then tried to recover but
is stuck in an inconsistent state because many OSDs cannot recover on their
own: the kernel kills them via OOM. Even now, one OSD that was OK went down,
OOM-killed.
Peter;
Look into bucket sharding.
Thank you,
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com
From: Peter Eisch [mailto:peter.ei...@virginpulse.com]
Sent: Wednesday, October 21, 2020 12:39 PM
To: ceph-users@
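A minimal sketch of the manual resharding Dominic suggests, assuming a
hypothetical bucket named "mybucket" and a target of 64 index shards:

# Reshard the bucket's index (this rewrites the index, so do it off-peak):
radosgw-admin bucket reshard --bucket=mybucket --num-shards=64
# On recent releases, dynamic resharding (rgw_dynamic_resharding) can also
# be left to do this automatically.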
As an example, here's the acting and up set of one of the PGs:
up:     0: 113, 1: 138, 2: 30, 3: 132, 4: 105, 5: 57, 6: 106, 7: 140, 8: 161
acting: 0: 72, 1: 150, 2: 2147483647, 3: 2147483647, 4: 24, 5: 48, 6: 32, 7: 157, 8: 103
(2147483647 is CRUSH_ITEM_NONE, i.e. no OSD is currently mapped to that slot.)
So obviously there's a lot of backfilling there... but it seems it's not
making an
We recently did some work on the Ceph cluster, and a few disks ended up
offline at the same time. There are now 6 PGs that are stuck in a
"remapped" state, and this is all of their recovery states:
recovery_state: 0:
  name: Started/Primary/WaitActingChange
  enter_time: 2020-10-21 18:48:02.0
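For reference, both the up/acting sets and the recovery_state above come out
of pg query; a quick way to pull just those fields (the PG id 1.0 is a
placeholder, and jq is assumed to be installed):

ceph pg 1.0 query | jq '.up, .acting, .recovery_state'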
Hi,
My rgw.buckets.index has the cluster in WARN. I'm either not understanding the
real issue or I'm making it worse, or both.
OMAP_BYTES: 70461524
OMAP_KEYS: 250874
I thought I'd head this off by deleting rgw objects which would normally get
deleted in the near future, but this only seemed to
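A hedged way to see which object is tripping the warning and how full each
bucket's index is (output formats vary a little by release):

ceph health detail                  # names the pool/PG with the large omap object
radosgw-admin bucket limit check    # per-bucket object counts vs. shard limits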
Hello,
I am struggling to integrate Ceph radosgw as an object store in OpenStack
Swift via Keystone. Could someone please have a look at my configs and help
find the issue?
Many thanks in advance.
ceph version 14.2.11 nautilus (stable)
[root@ciosmon06 ~]# cat /etc/ceph/ceph.conf
[global]
fsid
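For comparison, a minimal sketch of the rgw/keystone settings that usually
matter on nautilus; every value below is a placeholder, not taken from the
poster's configs:

[client.rgw.gateway1]
rgw keystone url = http://keystone.example:5000
rgw keystone api version = 3
rgw keystone admin user = swift
rgw keystone admin password = SECRET
rgw keystone admin domain = Default
rgw keystone admin project = service
rgw keystone accepted roles = admin,member
rgw swift account in url = true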
On 2020-10-21 10:08, Mark Nelson wrote:
On 10/21/20 7:54 AM, Ing. Luis Felipe Domínguez Vega wrote:
On 2020-10-21 08:43, Mark Nelson wrote:
Theoretically we shouldn't be spiking memory as much these days during
recovery, but the code is complicated and it's tough to reproduce these kinds
There have been threads on exactly this. Might depend a bit on your ceph
version. We are running mimic and have no issues doing:
- set noout, norebalance, nobackfill
- add all OSDs (with weight 1)
- wait for peering to complete
- unset all flags and let the rebalance loose
Starting with nautilus
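Spelled out as commands, the mimic-era sequence above looks roughly like this
(the OSD-creation step itself depends on your deployment tooling):

ceph osd set noout
ceph osd set norebalance
ceph osd set nobackfill
# ... create and start all new OSDs here, with crush weight 1 ...
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset noout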
Hi Michael,
some quick thoughts.
That you can create a pool with 1 PG is a good sign, the crush rule is OK. That
pg query says it doesn't have PG 1.0 points in the right direction. There is an
inconsistency in the cluster. This is also indicated by the fact that no upmaps
seem to exist (the c
We are performing file maintenance (deletes, essentially) and when the
process gets to a certain point, all four rados gateways crash with the
following:
Log output:
-5> 2020-10-20 06:09:53.996 7f15f1543700 2 req 7 0.000s s3:delete_obj
verifying op params
-4> 2020-10-20 06:09:53.996 7
Hi all,
There is a huge difference between node exporter and ceph exporter
(prometheus mgr module) data. For example, I see a 120 MB/s write on my disk
from node exporter, but ceph exporter says it is 22 MB/s! The same goes for
latency, IOPS, and so on.
Which one is reliable?
Thanks.
On 2020-10-20 17:57, Ing. Luis Felipe Domínguez Vega wrote:
Hi, today my infra provider had a blackout, and Ceph then tried to recover but
is stuck in an inconsistent state because many OSDs cannot recover on their
own: the kernel kills them via OOM. Even now, one OSD that was OK went down,
OOM-killed.
Theoretically we shouldn't be spiking memory as much these days during
recovery, but the code is complicated and it's tough to reproduce these
kinds of issues in-house. If you happen to catch it in the act, do you
see the pglog mempool stats also spiking up?
Mark
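One way to check, assuming osd.0 as a placeholder id and access to the OSD's
admin socket (field names can vary slightly between releases):

ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'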
On 10/21/20 2:34 AM, Dan v
Hi Mac
We've also tweaked
osd-recovery-max-single-start => 2
osd-recovery-sleep-hdd => 0.05
to speed things up.
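As config commands, that would look roughly like (applied cluster-wide to the
osd section):

ceph config set osd osd_recovery_max_single_start 2
ceph config set osd osd_recovery_sleep_hdd 0.05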
On 2020-10-20 16:04, Mac Wynkoop wrote:
OK, so for interventions, I've pushed these configs out:
ceph config set mon.* target_max_misplaced_ratio 0.20 (raised from the default 0.05)
ceph config get o
The best F/OSS conference in the southern hemisphere is back again,
virtualized, January 23-25. The CFP is open until November 6. Submit
early, submit often! ;-)
Forwarded Message
Subject: [lca-announce] linux.conf.au 2021 - Call for Sessions and
Miniconfs Open
Date: Thu, 15 O
Hi,
There are many dprintk calls in crush/mapper.c and crush/builder.c, and I want
to debug the CRUSH algorithm.
How do I see the output of dprintk?
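A hedged sketch of one approach: in the Ceph tree, dprintk in the crush code
is normally compiled out as a no-op macro, so the usual trick is to redefine
it to printf and rebuild:

# Find where the macro is defined (exact location varies by version):
grep -rn "define dprintk" src/crush/
# Then change the definition to something like
#   #define dprintk(args...) printf(args)
# and rebuild the binary or unit test you are running.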
Hi,
You can make use of upmap so that you do not need to rebalance the entire
CRUSH map every time you change a weight.
https://docs.ceph.com/en/latest/rados/operations/upmap/
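If you would rather let Ceph compute the upmaps, the balancer's upmap mode is
one route (it requires all clients to be luminous or newer):

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on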
Hope it helps,
Ansgar
Kristof Coucke wrote on Wed., 21 Oct 2020, 13:29:
> Hi,
>
> I have a cluster with 182 OS
Hi,
I have a cluster with 182 OSDs; this has been expanded to 282 OSDs.
Some disks were near full.
The new disks have been added with an initial weight of 0.
The original plan was to increase their weight slowly towards full using the
gentle reweight script. However, this is going way too sl
On 2020-10-20 23:57, Ing. Luis Felipe Domínguez Vega wrote:
> Hi, today my infra provider had a blackout, and Ceph then tried to recover
> but is stuck in an inconsistent state because many OSDs cannot recover on
> their own: the kernel kills them via OOM. Even now, one OSD that was OK
> went down, OOM-killed.
Hi,
This might be the pglog issue which has been coming up a few times on the list.
If the OSD cannot boot without going OOM, you might have success by
trimming the pglog, e.g. search this list for "ceph-objectstore-tool
--op trim-pg-log" for some recipes. The thread "OSDs taking too much
memory,
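The hedged shape of such a recipe (stop the OSD first; the id, path, and PG id
below are placeholders):

systemctl stop ceph-osd@7
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
    --op trim-pg-log --pgid 2.a5
systemctl start ceph-osd@7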
On 2020-10-20 23:17, Anthony D'Atri wrote:
On Oct 20, 2020, at 6:23 PM, Ing. Luis Felipe Domínguez Vega
wrote:
On 2020-10-20 19:33, Anthony D'Atri wrote:
You have a *lot* of peering and recovery going on.
Write a script that monitors available memory on the system and
restarts the OSD p
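A hypothetical sketch of such a watchdog, with the OSD id and the 1 GiB
threshold as assumptions:

#!/bin/bash
# Restart the OSD before the kernel OOM-killer gets to it.
OSD_ID=0
THRESHOLD_KB=$((1024 * 1024))   # restart below 1 GiB of MemAvailable
while sleep 10; do
    avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
    if [ "$avail_kb" -lt "$THRESHOLD_KB" ]; then
        systemctl restart "ceph-osd@${OSD_ID}"
    fi
done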