Re: [ceph-users] problem returning mon back to cluster

2019-10-15 Thread Harald Staub
On 14.10.19 16:31, Nikola Ciprich wrote: On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote: Probably same problem here. When I try to add another MON, "ceph health" becomes mostly unresponsive. One of the existing ceph-mon processes uses 100% CPU for several minutes. Tried …

Re: [ceph-users] problem returning mon back to cluster

2019-10-14 Thread Harald Staub
Probably same problem here. When I try to add another MON, "ceph health" becomes mostly unresponsive. One of the existing ceph-mon processes uses 100% CPU for several minutes. Tried it on 2 test clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 hdd osds each). To avoid errors like "lease …
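
For what it's worth, the surviving MONs can still be probed through their local admin sockets while cluster-wide commands stall; a sketch, assuming the mon name matches the short hostname:

  # The admin socket usually still answers while "ceph health" hangs
  ceph daemon mon.$(hostname -s) mon_status
  # CPU-bound work often shows up in the perf counters
  ceph daemon mon.$(hostname -s) perf dump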

Re: [ceph-users] Adventures with large RGW buckets

2019-08-02 Thread Harald Staub
Right now our main focus is on the Veeam use case (VMWare backup), used with an S3 storage tier. Currently we host a bucket with 125M objects and one with 100M objects. As Paul stated, searching common prefixes can be painful. We had some cases that did not work (taking too much time, radosgw …
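
The expensive pattern here is delimiter-based listing over a huge bucket. A hypothetical example with the aws CLI (endpoint, bucket and prefix are made up):

  # Emulating a "directory" listing; with 125M objects this forces
  # radosgw to walk and skip large ranges of the bucket index
  aws s3api list-objects-v2 \
    --endpoint-url https://rgw.example.com \
    --bucket veeam-backup \
    --prefix "Veeam/Archive/" \
    --delimiter "/"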

Re: [ceph-users] memory usage of: radosgw-admin bucket rm

2019-07-11 Thread Harald Staub
… compaction.) Matt On Tue, Jul 9, 2019 at 7:12 AM Harald Staub wrote: Currently removing a bucket with a lot of objects: radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects This process was killed by the out-of-memory killer. Then looking at the graphs, we see a continuous …

[ceph-users] memory usage of: radosgw-admin bucket rm

2019-07-09 Thread Harald Staub
Currently removing a bucket with a lot of objects: radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects This process was killed by the out-of-memory killer. Then looking at the graphs, we see a continuous increase of memory usage for this process, about +24 GB per day. Removal …
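
One way to keep such a run from exhausting the host is to cap its memory; a sketch, assuming systemd with cgroups v2 (the 8G cap is arbitrary; on cgroups v1 the property is MemoryLimit instead):

  # radosgw-admin deletes objects as it goes, so after an OOM kill a
  # restarted run continues roughly where the previous one stopped
  systemd-run --scope -p MemoryMax=8G \
    radosgw-admin bucket rm --bucket=$BUCKET --bypass-gc --purge-objects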

[ceph-users] Even more objects in a single bucket?

2019-06-17 Thread Harald Staub
There are customers asking for 500 million objects in a single object storage bucket (i.e. 5000 shards), and even for more. But we found some places saying that there is a limit on the number of shards per bucket, e.g. …
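
The shard count follows from the usual rule of thumb of roughly 100k objects per shard: 500M / 100k = 5000 shards. Current shard fill levels can be checked per bucket; a sketch (output fields may differ between releases):

  # Reports objects per shard and a fill_status per bucket
  radosgw-admin bucket limit check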

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-17 Thread Harald Staub
… free space for the compaction after the large omaps were removed? -- dan On Mon, Jun 17, 2019 at 11:14 AM Harald Staub wrote: We received the large omap warning before, but for some reason we could not react quickly. We accepted the risk of the bucket becoming slow, but had not thought of further …
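
For reference, the compaction in question can also be triggered offline, which is when the extra free space is needed; a sketch with a placeholder OSD id, run while the OSD is stopped:

  systemctl stop ceph-osd@NNN
  # Compact the RocksDB embedded in the BlueStore OSD
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-NNN compact
  systemctl start ceph-osd@NNN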

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-17 Thread Harald Staub
… to the beginning -- is it clear to anyone what the root cause was and how other users can prevent this from happening? Maybe some better default configs to warn users earlier about too-large omaps? Cheers, Dan On Thu, Jun 13, 2019 at 7:36 PM Harald Staub wrote: Looks fine (at least so far), thank you all …
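
The warning thresholds Dan alludes to do exist and can be tightened cluster-wide; a sketch (the values are examples, defaults vary by release):

  # Deep scrub flags an omap object above this many keys / bytes
  ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 200000
  ceph config set osd osd_deep_scrub_large_omap_object_value_sum_threshold 1073741824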

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Harald Staub
… gone after deep-scrubbing the PG. Then we set the 3 OSDs out. Soon after, one after the other went down (maybe for 2 minutes) and we got degraded PGs, but only once. Thank you! Harry On 13.06.19 16:14, Sage Weil wrote: On Thu, 13 Jun 2019, Harald Staub wrote: On 13.06.19 15:52, Sage …
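
The commands behind those two steps are presumably something like the following (PG and OSD ids are placeholders):

  # Re-verify the PG after the cleanup, then drain the suspect OSDs
  ceph pg deep-scrub $PGID
  ceph osd out 11 12 13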

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Harald Staub
On 13.06.19 15:52, Sage Weil wrote: On Thu, 13 Jun 2019, Harald Staub wrote: [...] I think that increasing the various suicide timeout options will allow it to stay up long enough to clean up the ginormous objects: ceph config set osd.NNN osd_op_thread_suicide_timeout 2h ok It looks …
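
Presumably the ordinary op thread timeout wants raising alongside the suicide timeout, so the worker is not flagged long before the grace matters; an assumption, not from the thread:

  # 7200 s = the "2h" above; both settings are per-OSD overrides
  ceph config set osd.NNN osd_op_thread_suicide_timeout 7200
  ceph config set osd.NNN osd_op_thread_timeout 600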

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Harald Staub
ot;somehow"? In case of success, we would bring back the other OSDs as well? OTOH we could try to continue with the key dump from earlier today. Any opinions? Thanks! Harry On 13.06.19 09:32, Harald Staub wrote: On 13.06.19 00:33, Sage Weil wrote: [...] One other thing to try before ta

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Harald Staub
On 13.06.19 00:33, Sage Weil wrote: [...] One other thing to try before taking any drastic steps (as described below): ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-NNN fsck This gives: fsck success and the large alloc warnings: tcmalloc: large alloc 2145263616 bytes == 0x562412e1 …
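
fsck came back clean here, but for the case where it does not, the same tool has a repair mode; a sketch, again with the OSD stopped and NNN a placeholder:

  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-NNN repair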

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Harald Staub
On 13.06.19 00:29, Sage Weil wrote: On Thu, 13 Jun 2019, Simon Leinen wrote: Sage Weil writes: 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families: [default] Unrecognized command: stats ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/version_set.cc:356: …
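
So the store opened, "stats" was evidently not a recognized command in 14.2.1, and the tool then hit a RocksDB assertion anyway. Listing keys is one way to probe whether a store is readable at all; a sketch with a placeholder path:

  # Succeeds quickly on a healthy store, crashes the same way on a
  # corrupted one
  ceph-kvstore-tool rocksdb /path/to/db list > /dev/null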

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Harald Staub
On 12.06.19 17:40, Sage Weil wrote: On Wed, 12 Jun 2019, Harald Staub wrote: Also opened an issue about the rocksdb problem: https://tracker.ceph.com/issues/40300 Thanks! The 'rocksdb: Corruption: file is too short' is the root of the problem here. Can you try starting the OSD …

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Harald Staub
Also opened an issue about the rocksdb problem: https://tracker.ceph.com/issues/40300 On 12.06.19 16:06, Harald Staub wrote: We ended in a bad situation with our RadosGW (Cluster is Nautilus 14.2.1, 350 OSDs with BlueStore): 1. There is a bucket with about 60 million objects, without shards. …

[ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Harald Staub
We ended in a bad situation with our RadosGW (Cluster is Nautilus 14.2.1, 350 OSDs with BlueStore): 1. There is a bucket with about 60 million objects, without shards. 2. radosgw-admin bucket reshard --bucket $BIG_BUCKET --num-shards 1024 3. Resharding looked fine first, it counted up to the …
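
For reference, the progress and state of such a reshard can be followed from a second terminal; a sketch using the same bucket variable:

  radosgw-admin reshard status --bucket=$BIG_BUCKET
  # Shard fill levels before/after the reshard:
  radosgw-admin bucket limit check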

[ceph-users] BlueStore sizing

2018-08-20 Thread Harald Staub
As mentioned here recently, the sizing recommendations for BlueStore have been updated: http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing In our ceph cluster, we have some ratios that are much lower, like 20GB of SSD (WAL and DB) per 7TB of spinning space. This …
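
Against the often-quoted figure from that page of about 4% of the data size for block.db, the gap is large:

  recommended:  7 TB x 0.04 = 280 GB of SSD per OSD
  deployed:     20 GB of SSD per OSD  (about 0.3%)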

Re: [ceph-users] pg inconsistent

2018-03-08 Thread Harald Staub
Hi Brad Thank you very much for your attention. On 07.03.2018 23:46, Brad Hubbard wrote: On Thu, Mar 8, 2018 at 1:22 AM, Harald Staub <harald.st...@switch.ch> wrote: "ceph pg repair" leads to: 5.7bd repair 2 errors, 0 fixed Only an empty list from: rados list-inconsistent-obj …

[ceph-users] pg inconsistent

2018-03-07 Thread Harald Staub
"ceph pg repair" leads to: 5.7bd repair 2 errors, 0 fixed Only an empty list from: rados list-inconsistent-obj 5.7bd --format=json-pretty Inspired by http://tracker.ceph.com/issues/12577 , I tried again with more verbose logging and searched the osd logs e.g. for "!=", "mismatch", could not