[ceph-users] mds servers in endless segfault loop

2019-10-10 Thread Pickett, Neale T
Hello, ceph-users. Our mds servers keep segfaulting from a failed assertion, and for the first time I can't find anyone else who's posted about this problem. None of them are able to stay up, so our cephfs is down. We recently had to truncate the journal log after an upgrade to nautilus, and

Re: [ceph-users] Pool statistics via API

2019-10-10 Thread Konstantin Shalygin
Currently I am getting the pool statistics (especially USED/MAX AVAIL) via the command line: ceph df -f json-pretty| jq '.pools[] | select(.name == "poolname") | .stats.max_avail' ceph df -f json-pretty| jq '.pools[] | select(.name == "poolname") | .stats.bytes_used' Command "ceph df" does not

Re: [ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-10 Thread Brad Hubbard
On Fri, Oct 11, 2019 at 12:27 AM Kenneth Waegeman wrote: > > Hi Brad, all, > > Pool 6 has min_size 2: > > pool 6 'metadata' replicated size 3 min_size 2 crush_rule 1 object_hash > rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 172476 > flags hashpspool stripe_width 0

[ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-10 Thread huxia...@horebdata.cn
Hi, folks, I have a middle-size Ceph cluster as cinder backup for openstack (queens). During testing, one Ceph node went down unexpectedly and powered up again ca. 10 minutes later, and the Ceph cluster started PG recovery. To my surprise, VM IOPS dropped dramatically during Ceph recovery, from ca. 13K IOPS
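A common mitigation for this (not from the thread itself, but standard Ceph tuning; the option names are real Nautilus-era settings, the values below are only illustrative) is to throttle recovery so client I/O keeps priority. A minimal ceph.conf sketch:

```ini
[osd]
# Limit concurrent backfills per OSD
osd_max_backfills = 1
# Limit concurrent recovery ops per OSD
osd_recovery_max_active = 1
# Sleep between recovery ops to yield to client I/O (seconds, HDD OSDs)
osd_recovery_sleep_hdd = 0.1
```

The same settings can also be applied at runtime with `ceph config set osd <option> <value>` on Nautilus, which avoids restarting OSDs; the trade-off is longer total recovery time in exchange for steadier client IOPS.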

Re: [ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-10 Thread Kenneth Waegeman
Hi Brad, all, Pool 6 has min_size 2: pool 6 'metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 172476 flags hashpspool stripe_width 0 application cephfs The output for all the inconsistent pgs is this: {

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-10-10 Thread Paul Emmerich
I've also encountered this issue on a cluster yesterday; one CPU got stuck in an infinite loop in get_obj_data::flush and it stopped serving requests. I've updated the tracker issue accordingly. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io

[ceph-users] Pool statistics via API

2019-10-10 Thread Sinan Polat
Hi, Currently I am getting the pool statistics (especially USED/MAX AVAIL) via the command line: ceph df -f json-pretty| jq '.pools[] | select(.name == "poolname") | .stats.max_avail' ceph df -f json-pretty| jq '.pools[] | select(.name == "poolname") | .stats.bytes_used' Command "ceph df" does
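The same values can be pulled programmatically by parsing the `ceph df -f json` output directly instead of shelling out to jq. A minimal sketch (the field paths mirror the `.pools[].stats` paths used in the jq commands above; the sample JSON at the bottom is illustrative, not real cluster output):

```python
import json
import subprocess

def pool_stats(pool_name, raw=None):
    """Return (bytes_used, max_avail) for a pool from `ceph df -f json` output.

    If `raw` is given, parse it instead of invoking the ceph CLI
    (useful for testing without a cluster).
    """
    if raw is None:
        raw = subprocess.check_output(["ceph", "df", "-f", "json"])
    data = json.loads(raw)
    for pool in data["pools"]:
        if pool["name"] == pool_name:
            return pool["stats"]["bytes_used"], pool["stats"]["max_avail"]
    raise KeyError(pool_name)

# Illustrative sample shaped like the fields queried with jq above:
sample = json.dumps({"pools": [
    {"name": "poolname", "stats": {"bytes_used": 1024, "max_avail": 2048}}
]})
print(pool_stats("poolname", raw=sample))  # → (1024, 2048)
```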

Re: [ceph-users] mon sudden crash loop - pinned map

2019-10-10 Thread Philippe D'Anjou
After trying to disable the paxos service trim temporarily (since that seemed to trigger it initially), we now see this: "assert_condition": "from != to", "assert_func": "void PaxosService::trim(MonitorDBStore::TransactionRef, version_t, version_t)", "assert_file":

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-10 Thread David C
Thanks, Patrick. Looks like the fix is awaiting review, so I guess my options are to hold tight for 14.2.5 or patch it myself if I get desperate. I've seen this crash about 4 times over the past 96 hours; is there anything I can do to mitigate the issue in the meantime? On Wed, Oct 9, 2019 at 9:23 PM
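One stopgap until the fix ships (not suggested in the thread; this is a generic systemd override, and the unit name and path may differ per distro) is to have systemd restart ganesha automatically after a crash:

```ini
# /etc/systemd/system/nfs-ganesha.service.d/restart.conf (path illustrative)
[Service]
Restart=on-failure
RestartSec=5
```

After creating the drop-in, run `systemctl daemon-reload` and restart the service. This only shortens the outage window per crash; it does not address the underlying FSAL bug.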

Re: [ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-10 Thread Brad Hubbard
Does pool 6 have min_size = 1 set? https://tracker.ceph.com/issues/24994#note-5 would possibly be helpful here, depending on what the output of the following command looks like. # rados list-inconsistent-obj [pgid] --format=json-pretty On Thu, Oct 10, 2019 at 8:16 PM Kenneth Waegeman wrote: >
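When the JSON from `rados list-inconsistent-obj` gets large, it can help to summarize which shard on which OSD carries errors. A minimal sketch; the field names (`inconsistents`, `object.name`, `shards[].osd`, `shards[].errors`) follow the output format documented for recent Ceph releases, but treat them as an assumption and check against your own output:

```python
import json

def shard_errors(report):
    """Flatten a list-inconsistent-obj report into (object, osd, errors) tuples."""
    out = []
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        for shard in item["shards"]:
            out.append((name, shard["osd"], shard.get("errors", [])))
    return out

# Illustrative sample, not real cluster output:
sample = {"inconsistents": [{
    "object": {"name": "10000000000.0000"},
    "shards": [{"osd": 1, "errors": []},
               {"osd": 2, "errors": ["omap_digest_mismatch"]}]
}]}
print(shard_errors(sample))
```

A shard with an empty error list is typically a candidate authoritative copy, which is exactly what "failed to pick suitable auth object" says the OSD could not find.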

[ceph-users] lot of inconsistent+failed_repair - failed to pick suitable auth object (14.2.3)

2019-10-10 Thread Kenneth Waegeman
Hi all, After some node failures and rebalancing, we have a lot of PGs in an inconsistent state. I tried to repair, but it didn't work. This is also in the logs: 2019-10-10 11:23:27.221 7ff54c9b0700  0 log_channel(cluster) log [DBG] : 6.327 repair starts 2019-10-10 11:23:27.431 7ff5509b8700 -1

Re: [ceph-users] mon sudden crash loop - pinned map

2019-10-10 Thread Philippe D'Anjou
How do I import an osdmap in Nautilus? I saw documentation for older versions, but it seems one can now only export, not import? On Thursday, October 10, 2019, 08:52:03 OESZ, Philippe D'Anjou wrote: I don't think this has anything to do with CephFS, the
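For reference: Nautilus still exports an osdmap epoch from the monitors with `ceph osd getmap`, while injecting a map happens against an offline OSD's store via ceph-objectstore-tool rather than a mon command. A command sketch under those assumptions (epoch, OSD id, and data path are illustrative; the OSD must be stopped first, and the `set-osdmap` op and flags should be verified against the ceph-objectstore-tool man page for your version):

```shell
# Export a specific osdmap epoch from the monitors
ceph osd getmap 172476 -o osdmap.172476

# Inject it into a stopped OSD's store
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op set-osdmap --file osdmap.172476
```

Rewriting the monitors' own map history is a different, far more invasive procedure (mon store rebuild), so be sure which of the two you actually need.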