[ceph-users] Garbage collection growing and db_compaction with small file uploads

2019-01-09 Thread Chris Sarginson
Hi all, I'm seeing some behaviour I wish to check on a Luminous (12.2.10) cluster that I'm running for rbd and rgw (mostly SATA filestore with NVMe journals, with a few SATA-only bluestore). There's a set of dedicated SSD OSDs running bluestore for the .rgw.buckets.index pool and also holding the .r…
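For reference, a minimal sketch of how the garbage-collection backlog described here can be inspected and drained by hand; these are stock radosgw-admin subcommands, though the grep pattern used for counting entries is only an approximation:
    # Approximate count of pending GC entries, including ones not yet due for processing
    radosgw-admin gc list --include-all | grep -c '"tag"'
    # Run GC immediately rather than waiting for the next scheduled cycle
    radosgw-admin gc process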

Re: [ceph-users] Resolving Large omap objects in RGW index pool

2018-10-18 Thread Chris Sarginson
> …loop through until there is no longer a "new_bucket_instance_id". After letting this complete, this suggests that I have over 5000 indexes for 74 buckets; some of these buckets have > 100 indexes apparently. :~# awk '{print $1}' buckets_with_multiple_r…
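For reference, one way to reproduce that kind of per-bucket instance count; the working file referenced above is truncated, so this sketch counts instances straight from the metadata listing instead (jq and the "<bucket>:<instance_id>" key format are assumptions):
    # Dump every bucket.instance metadata key
    radosgw-admin metadata list bucket.instance > instances.json
    # Count instances per bucket name and show the worst offenders
    jq -r '.[]' instances.json | awk -F: '{print $1}' | sort | uniq -c | sort -rn | head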

Re: [ceph-users] Resolving Large omap objects in RGW index pool

2018-10-16 Thread Chris Sarginson
…m/issues/24603 Should I be OK to loop through these indexes and remove any with a reshard_status of 2 and a new_bucket_instance_id that does not match the bucket_instance_id returned by the command: radosgw-admin bucket stats --bucket ${bucket}? I'd ideally like to get to a point where I can turn…
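For reference, a sketch of the per-bucket comparison being described, limited to inspection only (the bucket name is a placeholder):
    bucket=mybucket   # hypothetical
    # Shows reshard_status and new_bucket_instance_id for each shard
    radosgw-admin reshard status --bucket "$bucket"
    # The "id" field here is the bucket instance currently in use
    radosgw-admin bucket stats --bucket "$bucket" | grep '"id"'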

Re: [ceph-users] Resolving Large omap objects in RGW index pool

2018-10-04 Thread Chris Sarginson
Hi, Thanks for the response - I am still unsure as to what will happen to the "marker" reference in the bucket metadata, as this is the object that is being detected as Large. Will the bucket generate a new "marker" reference in the bucket metadata? I've been reading this page to try and get a b…
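For reference, both fields can be read straight out of the bucket metadata (the bucket name below is a placeholder); whether "marker" is regenerated is exactly the open question here, so this only shows where to look:
    # Shows the "marker" and "bucket_id" fields for the bucket entrypoint
    radosgw-admin metadata get bucket:mybucket
    # The full instance record behind the current bucket_id
    radosgw-admin metadata get bucket.instance:mybucket:<bucket_id>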

[ceph-users] Resolving Large omap objects in RGW index pool

2018-10-04 Thread Chris Sarginson
Hi, Ceph version: Luminous 12.2.7. Following the upgrade from Jewel to Luminous we have been stuck with a cluster in HEALTH_WARN state that is complaining about large omap objects. These all seem to be located in our .rgw.buckets.index pool. We've disabled auto resharding on bucket indexes due to s…
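For reference, a rough sketch of ranking index objects by omap key count to find the offenders; the pool name is taken from the message, and this walk can be slow on a busy index pool:
    for obj in $(rados -p .rgw.buckets.index ls); do
        printf '%s %s\n' "$(rados -p .rgw.buckets.index listomapkeys "$obj" | wc -l)" "$obj"
    done | sort -rn | head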

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-22 Thread Chris Sarginson
Hi Caspar, Sean and I replaced the problematic DC S4600 disks (after all but one had failed) in our cluster with Samsung SM863a disks. There was an NDA for new Intel firmware (as mentioned earlier in the thread by David) but, given the problems we were experiencing, we moved all Intel disks to a sin…

Re: [ceph-users] Increase recovery / backfilling speed (with many small objects)

2018-01-05 Thread Chris Sarginson
You probably want to consider increasing osd max backfills. You should be able to inject this online: http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/ You might want to drop your osd recovery max active setting back down to around 2 or 3, although with it being SSD your perf…
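For reference, a sketch of injecting those values at runtime; the numbers are illustrative, not a recommendation:
    # Raise backfill concurrency across all OSDs without a restart
    ceph tell osd.* injectargs '--osd-max-backfills 4'
    # Dial recovery activity back down as suggested above
    ceph tell osd.* injectargs '--osd-recovery-max-active 3'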

Re: [ceph-users] remove require_jewel_osds flag after upgrade to kraken

2017-07-13 Thread Chris Sarginson
The flag is fine; it's just to ensure that OSDs from a release before Jewel can't be added to the cluster. See http://ceph.com/geen-categorie/v10-2-4-jewel-released/ under "Upgrading from hammer". On Thu, 13 Jul 2017 at 07:59 Jan Krcmar wrote: > hi, is it possible to remove the require_jewel…
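For context, the flag is set (not removed) once every OSD is running Jewel or newer, per the release notes linked above:
    ceph osd set require_jewel_osds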

Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-13 Thread Chris Sarginson
Is it related to the recovery behaviour of vim creating a swap file, which I think nano does not do? http://vimdoc.sourceforge.net/htmldoc/recover.html A sync into CephFS I think needs the write to be confirmed all the way down from the OSDs performing the write before it returns the confir…
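For reference, a rough way to test that hypothesis is to time a save with vim's swap file disabled versus enabled; a sketch only, assuming a test file on the CephFS mount at a made-up path:
    # Save with the swap file disabled (vimrc skipped so defaults apply)
    time vim -u NONE -c 'set noswapfile' -c 'wq' /mnt/cephfs/testfile
    # Save with the default swap-file behaviour for comparison
    time vim -u NONE -c 'wq' /mnt/cephfs/testfile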

Re: [ceph-users] civetweb deamon dies on https port

2017-01-19 Thread Chris Sarginson
You look to have a typo in this line: rgw_frontends = "civetweb port=8080s ssl_certificate=/etc/pki/tls/ cephrgw01.crt". It would seem from the error that it should be port=8080, not port=8080s. On Thu, 19 Jan 2017 at 08:59 Iban Cabrillo wrote: > Dear cephers, I just finish the integration betw…

Re: [ceph-users] CephFS FAILED assert(dn->get_linkage()->is_null())

2016-12-09 Thread Chris Sarginson
Hi Goncarlo, In the end we ascertained that the assert was coming from reading corrupt data in the MDS journal. We have followed the sections at the following link (http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/), in order, down to (and including) the MDS table wipes (only wiping the "sessio…
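For reference, the documented sequence from that page, as far as the session-table wipe mentioned here; a sketch only, to be run with the MDS stopped and after taking a journal backup:
    # Back up the journal before touching anything
    cephfs-journal-tool journal export backup.bin
    # Recover whatever dentries can be salvaged from the journal
    cephfs-journal-tool event recover_dentries summary
    # Truncate the (corrupt) journal
    cephfs-journal-tool journal reset
    # Wipe only the session table, as described above
    cephfs-table-tool all reset session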