[ceph-users] Fwd: radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-22 Thread Eric Ivancich
Thank you for providing the profiling data, Vladimir. There are 5078 threads and most of them are waiting. Here is a list of the deepest call of each thread with duplicates removed:
    + 100.00% epoll_wait
    + 100.00% get_obj_data::flush(rgw::OwningList&&)
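
A quick way to capture such a per-thread dump yourself, as a hedged sketch (assumes gdb and radosgw debug symbols are installed on the gateway host; the output file name is arbitrary):

    gdb -p $(pidof radosgw) -batch -ex 'thread apply all bt' > radosgw-threads.txt
    grep -c '^Thread' radosgw-threads.txt    # count of captured threads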

Re: [ceph-users] Watch a RADOS object for changes, specifically iscsi gateway.conf object

2019-08-22 Thread Lenz Grimmer
On 8/22/19 9:38 PM, Wesley Dillingham wrote: > I am interested in keeping a revision history of ceph-iscsi's > gateway.conf object for any and all changes. It seems to me this may > come in handy to revert the environment to a previous state. My question > is are there any existing tools which do

[ceph-users] Watch a RADOS object for changes, specifically iscsi gateway.conf object

2019-08-22 Thread Wesley Dillingham
I am interested in keeping a revision history of ceph-iscsi's gateway.conf object for any and all changes. It seems to me this may come in handy to revert the environment to a previous state. My question is: are there any existing tools which do something similar, or could someone please suggest, if they
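
A low-tech sketch of one possible approach (not an existing tool; the pool name "rbd" and the local paths are assumptions): periodically pull the object with the rados CLI and commit it to a git repository.

    cd /var/lib/gateway-conf-history          # pre-existing git repo (assumed path)
    rados -p rbd get gateway.conf gateway.conf.json
    git add gateway.conf.json
    git diff --cached --quiet || git commit -m "gateway.conf snapshot $(date -Is)"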

Re: [ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-22 Thread Jason Dillaman
On Thu, Aug 22, 2019 at 11:29 AM Wido den Hollander wrote: > > > > On 8/22/19 3:59 PM, Jason Dillaman wrote: > > On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander wrote: > >> > >> Hi, > >> > >> In a couple of situations I have encountered that Virtual Machines > >> running on RBD had a high

Re: [ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-22 Thread Wido den Hollander
On 8/22/19 3:59 PM, Jason Dillaman wrote: > On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander wrote: >> >> Hi, >> >> In a couple of situations I have encountered that Virtual Machines >> running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO) >> or sdX (Virtio-SCSI) devices

[ceph-users] Increase pg_num while backfilling

2019-08-22 Thread Lukáš Kubín
Hello, yesterday I added a 4th OSD node (an increase from 39 to 52 OSDs) to our Jewel cluster. Backfilling of remapped pgs is still running and it seems it will run for another day until complete. I know the pg_num of the largest pool is undersized and I should increase it from 512 to 2048. The question is
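
For reference, the increase itself is two commands per step; on Jewel it is usually safer to wait until the current backfill finishes and to raise the value in stages rather than jumping straight to 2048 (pool name is a placeholder):

    ceph osd pool set <pool> pg_num 1024
    ceph osd pool set <pool> pgp_num 1024
    # repeat towards 2048 once the cluster is back to HEALTH_OK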

Re: [ceph-users] MDSs report damaged metadata

2019-08-22 Thread Robert LeBlanc
We just had metadata damage show up on our Jewel cluster. I tried a few things like renaming directories and scanning, but the damage would just show up again in less than 24 hours. I finally just copied the directories with the damage to a tmp location on CephFS, then swapped it with the damaged
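
Before moving directories around, the MDS damage table can be inspected via the admin socket; a minimal sketch, assuming access to the active MDS host, with <name> and <id> as placeholders:

    ceph daemon mds.<name> damage ls          # list damage entries with their ids
    ceph daemon mds.<name> damage rm <id>     # clear an entry once the affected dir is repaired/replaced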

[ceph-users] hsbench 0.2 released

2019-08-22 Thread Mark Nelson
Hi Folks, I've updated hsbench (new S3 benchmark) to 0.2. Notable changes since 0.1:
 - Can now output CSV results
 - Can now output JSON results
 - Fix for poor read performance with low thread counts
 - New bucket listing benchmark with a new "mk" flag that lets you control the number of

Re: [ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-22 Thread Jason Dillaman
On Thu, Aug 22, 2019 at 9:23 AM Wido den Hollander wrote: > > Hi, > > In a couple of situations I have encountered that Virtual Machines > running on RBD had a high I/O-wait, nearly 100%, on their vdX (VirtIO) > or sdX (Virtio-SCSI) devices while they were performing CPU intensive tasks. > >

[ceph-users] Tunables client support

2019-08-22 Thread Lukáš Kubín
Hello, I am considering enabling optimal crush tunables in our Jewel cluster (4 nodes, 52 OSDs, used as OpenStack Cinder+Nova backend = RBD images). I've got two questions: 1. Do I understand correctly that having the optimal tunables on can be considered best practice and should be applied in most
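
The switch itself is a single command, but on Jewel it triggers substantial data movement and requires clients recent enough for the chosen profile, so it is worth checking first; a sketch, with <id> as a placeholder mon name:

    ceph osd crush show-tunables        # current profile and individual tunables
    ceph daemon mon.<id> sessions       # connected clients and their feature bits (run on a mon host)
    ceph osd crush tunables optimal     # expect a large rebalance afterwards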

[ceph-users] Theory: High I/O-wait inside VM with RBD due to CPU throttling

2019-08-22 Thread Wido den Hollander
Hi, In a couple of situations I have encountered Virtual Machines running on RBD that had a high I/O-wait, nearly 100%, on their vdX (VirtIO) or sdX (Virtio-SCSI) devices while they were performing CPU-intensive tasks. These servers would be running a very CPU-intensive application while *not*
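
One way to test the CPU-throttling theory on the hypervisor is to watch the CPU cgroup throttling counters for the qemu process while the guest reports high iowait; a sketch assuming cgroup v1 and a systemd/libvirt layout (the exact path varies by distro and deployment):

    cat /sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu*.scope/cpu.stat
    # growing nr_throttled / throttled_time while the guest sits at ~100% iowait
    # would point at starved vCPUs rather than slow RBD I/O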

Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-22 Thread Brad Hubbard
https://tracker.ceph.com/issues/41255 is probably reporting the same issue. On Thu, Aug 22, 2019 at 6:31 PM Lars Täuber wrote: > > Hi there! > > We also experience this behaviour of our cluster while it is moving pgs. > > # ceph health detail > HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced

Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-22 Thread Lars Täuber
Hi there! We also experience this behaviour of our cluster while it is moving pgs.
 # ceph health detail
 HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive; Degraded data redundancy (low space): 1 pg backfill_toofull
 MDS_SLOW_METADATA_IO 1 MDSs report slow
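
When every OSD appears to have enough space, it helps to compare actual utilisation against the backfillfull threshold; a sketch (the 0.91 value is only an example):

    ceph osd df                             # %USE per OSD
    ceph osd dump | grep ratio              # nearfull / backfillfull / full ratios
    ceph osd set-backfillfull-ratio 0.91    # example only, if the default is too tight for the rebalance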

Re: [ceph-users] pg 21.1f9 is stuck inactive for 53316.902820, current state remapped

2019-08-22 Thread Lars Täuber
All OSDs are up. I manually marked one out of 30 "out", not "down". The primary OSDs of the stuck pgs are neither marked as out nor as down. Thanks Lars Thu, 22 Aug 2019 15:01:12 +0700 wahyu.muqs...@gmail.com ==> wahyu.muqs...@gmail.com, Lars Täuber : > I think you use too few osd. when you use

Re: [ceph-users] pg 21.1f9 is stuck inactive for 53316.902820, current state remapped

2019-08-22 Thread Lars Täuber
There are 30 osds. Thu, 22 Aug 2019 14:38:10 +0700 wahyu.muqs...@gmail.com ==> ceph-users@lists.ceph.com, Lars Täuber : > how many osd do you use ?

[ceph-users] pg 21.1f9 is stuck inactive for 53316.902820, current state remapped

2019-08-22 Thread Lars Täuber
Hi all, we are using Ceph version 14.2.2 from https://mirror.croit.io/debian-nautilus/ on Debian buster and are experiencing problems with CephFS. The mounted file system produces hanging processes due to pgs stuck inactive. This often happens after I mark single OSDs out manually. A typical
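
For a pg that stays "remapped" after an OSD is marked out, the usual starting points are the pg query output and the pool's replication settings relative to the CRUSH topology; a sketch using the pg id from the subject and a placeholder pool name:

    ceph pg 21.1f9 query            # inspect recovery_state and the up/acting sets
    ceph pg dump_stuck inactive     # all pgs currently stuck inactive
    ceph osd pool get <pool> size
    ceph osd pool get <pool> min_size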

Re: [ceph-users] mon db change from rocksdb to leveldb

2019-08-22 Thread nokia ceph
Thank you Paul. On Wed, Aug 21, 2019 at 5:36 PM Paul Emmerich wrote: > You can't downgrade from Luminous to Kraken, well, officially at least. > > I guess it maybe could somehow work but you'd need to re-create all > the services. For the mon example: delete a mon, create a new old one, > let
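
Recreating a monitor the way Paul describes corresponds roughly to the manual monitor replacement steps; a sketch with placeholder names (keyring and monmap handling shown only minimally, and the old store must be removed on the mon host first):

    ceph mon remove <mon-id>
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i <mon-id> --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    systemctl start ceph-mon@<mon-id>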