Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-21 Thread Reed Dier
Just chiming in to say that I too had some issues with backfill_toofull PGs, despite no OSDs being in a backfillfull state, although there were some nearfull OSDs. I was able to get through it by reweighting down the OSD that was the target reported by ceph pg dump | grep 'backfill_toofull'.
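A minimal sketch of that workaround, assuming the reported target turns out to be osd.12 (a hypothetical id):

    ceph pg dump pgs_brief | grep backfill_toofull   # find the PG and the OSDs it is backfilling to
    ceph osd df tree                                 # check how full those OSDs actually are
    ceph osd reweight 12 0.95                        # temporarily lower the weight of the nearfull target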

Re: [ceph-users] Multiple CephFS Filesystems Nautilus (14.2.2)

2019-08-21 Thread Patrick Donnelly
On Wed, Aug 21, 2019 at 2:02 PM wrote: > How experimental is the multiple CephFS filesystems per cluster feature? We > plan to use different sets of pools (meta / data) per filesystem. > > Are there any known issues? No. It will likely work fine but some things may change in a future version

[ceph-users] Multiple CephFS Filesystems Nautilus (14.2.2)

2019-08-21 Thread DHilsbos
All; How experimental is the multiple CephFS filesystems per cluster feature? We plan to use different sets of pools (meta / data) per filesystem. Are there any known issues? While we're on the subject, is it possible to assign a different active MDS to each filesystem? Thank you, Dominic
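Roughly what that setup could look like on Nautilus (pool names and PG counts below are examples only, not a recommendation):

    ceph fs flag set enable_multiple true --yes-i-really-mean-it
    ceph osd pool create cephfs2_metadata 32
    ceph osd pool create cephfs2_data 128
    ceph fs new cephfs2 cephfs2_metadata cephfs2_data
    ceph fs status      # each filesystem is served by its own active MDS, taken from the standby pool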

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
> Are you running multisite? No > Do you have dynamic bucket resharding turned on? Yes. "radosgw-admin reshard list" prints "[]" > Are you using lifecycle? I am not sure. How can I check? "radosgw-admin lc list" says "[]" > And just to be clear -- sometimes all 3 of your rados gateways are >
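One way to double-check whether dynamic resharding is actually enabled on a running gateway (the admin socket name is an example; adjust it to your rgw instance):

    radosgw-admin reshard list                                               # pending reshard operations
    radosgw-admin lc list                                                    # configured lifecycle rules
    ceph daemon client.rgw.<instance> config show | grep rgw_dynamic_resharding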

[ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-21 Thread Vladimir Brik
Hello After increasing the number of PGs in a pool, ceph status is reporting "Degraded data redundancy (low space): 1 pg backfill_toofull", but I don't understand why, because all OSDs seem to have enough space. ceph health detail says: pg 40.155 is active+remapped+backfill_toofull, acting
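A few commands that may help narrow this down; the ratios shown by the last one are what backfill_toofull is evaluated against (defaults: nearfull 0.85, backfillfull 0.90, full 0.95):

    ceph pg 40.155 query          # up/acting OSDs and backfill state for that PG
    ceph osd df tree              # per-OSD utilisation
    ceph osd dump | grep ratio    # nearfull/backfillfull/full ratios currently in effect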

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread J. Eric Ivancich
On 8/21/19 10:22 AM, Mark Nelson wrote: > Hi Vladimir, > > > On 8/21/19 8:54 AM, Vladimir Brik wrote: >> Hello >> [much elided] > You might want to try grabbing a call graph from perf instead of just > running perf top or using my wallclock profiler to see if you can drill > down and find out
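A possible way to grab that call graph with perf (assumes perf and radosgw debug symbols are installed; the 30-second window is arbitrary):

    perf record -g -p $(pidof radosgw) -- sleep 30
    perf report --stdio | head -50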

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Correction: the number of threads stuck using 100% of a CPU core varies from 1 to 5 (it's not always 5) Vlad On 8/21/19 8:54 AM, Vladimir Brik wrote: Hello I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Paul Emmerich
On Wed, Aug 21, 2019 at 3:55 PM Vladimir Brik wrote: > > Hello > > I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, > radosgw process on those machines starts consuming 100% of 5 CPU cores > for days at a time, even though the machine is not being used for data > transfers

Re: [ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Mark Nelson
Hi Vladimir, On 8/21/19 8:54 AM, Vladimir Brik wrote: Hello I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5 CPU cores for days at a time, even though the machine is not being used for data transfers

[ceph-users] radosgw pegging down 5 CPU cores when no data is being transferred

2019-08-21 Thread Vladimir Brik
Hello I am running a Ceph 14.2.1 cluster with 3 rados gateways. Periodically, radosgw process on those machines starts consuming 100% of 5 CPU cores for days at a time, even though the machine is not being used for data transfers (nothing in radosgw logs, couple of KB/s of network). This
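For what it's worth, a quick way to see which radosgw threads are the busy ones (standard Linux tooling, not Ceph-specific):

    top -H -p $(pidof radosgw)                                          # live per-thread CPU usage
    ps -L -o tid,pcpu,comm -p $(pidof radosgw) | sort -k2 -rn | head    # one-shot, sorted by CPU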

Re: [ceph-users] Applications slow in VMs running RBD disks

2019-08-21 Thread EDH - Manuel Rios Fernandez
Use a 100% flash setup and avoid rotational disks to get decent performance with Windows. Windows is very sensitive to disk latency, and a high-latency interface sometimes gives customers a bad impression. You can check your avg read/write in your Grafana for Ceph when in Windows it goes up
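If you want numbers rather than a feeling, Nautilus can show per-image and per-OSD latency directly (a sketch, assuming the rbd_support mgr module is enabled, which it is by default):

    rbd perf image iostat       # per-RBD-image IOPS, throughput and latency
    ceph osd perf               # per-OSD commit/apply latency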

Re: [ceph-users] Applications slow in VMs running RBD disks

2019-08-21 Thread Gesiel Galvão Bernardes
Hi Eliza, On Wed, Aug 21, 2019 at 09:30, Eliza wrote: > Hi > > on 2019/8/21 20:25, Gesiel Galvão Bernardes wrote: > > I'm using Qemu/KVM (OpenNebula) with Ceph/RBD to run VMs, and I'm > > having problems with slowness in applications that often are not > > consuming much CPU or RAM.

Re: [ceph-users] Applications slow in VMs running RBD disks

2019-08-21 Thread Eliza
Hi on 2019/8/21 20:25, Gesiel Galvão Bernardes wrote: I'm using Qemu/KVM (OpenNebula) with Ceph/RBD to run VMs, and I'm having problems with slowness in applications that often are not consuming much CPU or RAM. This problem affects mostly Windows. Apparently the problem is that normally the

[ceph-users] Applications slow in VMs running RBD disks

2019-08-21 Thread Gesiel Galvão Bernardes
Hi, I'm using Qemu/KVM (OpenNebula) with Ceph/RBD to run VMs, and I'm having problems with slowness in applications that often are not consuming much CPU or RAM. This problem affects mostly Windows. Apparently the problem is that normally the application loads many small files (ex: DLLs) and these
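One thing that sometimes helps with bursts of small reads from RBD-backed VMs is the librbd client cache and readahead; a hedged example of client-side settings in ceph.conf (values are examples only, tune for your workload):

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true
    rbd readahead max bytes = 4194304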

Re: [ceph-users] mon db change from rocksdb to leveldb

2019-08-21 Thread Paul Emmerich
You can't downgrade from Luminous to Kraken, officially at least. I guess it maybe could somehow work, but you'd need to re-create all the services. For the mons, for example: delete a mon, create a new old-version one, let it sync, etc. Still a bad idea. Paul -- Paul Emmerich Looking for help with
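A rough sketch of that recreate-a-mon step (ids and paths are examples, and doing this as part of a downgrade is unsupported; treat it as illustration only):

    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon remove mon1
    ceph-mon --mkfs -i mon1 --monmap /tmp/monmap --keyring /tmp/mon.keyring
    systemctl start ceph-mon@mon1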

[ceph-users] mon db change from rocksdb to leveldb

2019-08-21 Thread nokia ceph
Hi Team, One of our old customers has Kraken and they are going to upgrade to Luminous. As part of the process they are also requesting a downgrade procedure. Kraken used leveldb for ceph-mon data; from Luminous it changed to rocksdb, and the upgrade works without any issues. When we downgrade, the ceph-mon

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-21 Thread thoralf schulze
hi zheng, On 8/21/19 4:32 AM, Yan, Zheng wrote: > Please enable debug mds (debug_mds=10), and try reproducing it again. we will get back with the logs on monday. thank you & with kind regards, t.
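If it helps, raising the MDS debug level can be done centrally on Nautilus (a sketch; remember to revert it afterwards, 1/5 is the default):

    ceph config set mds debug_mds 10
    ceph tell mds.* injectargs '--debug_mds=10'    # apply to already-running daemons
    ceph config set mds debug_mds 1/5              # revert when done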

Re: [ceph-users] fixing a bad PG per OSD decision with pg-autoscaling?

2019-08-21 Thread EDH - Manuel Rios Fernandez
Hi Nigel, In Nautilus you can decrease PGs, but it takes weeks; for example, going from 4096 to 2048 took us more than 2 weeks. First of all, pg-autoscaling can be enabled per pool. And you're going to get a lot of warnings, but it works. Normally it is recommended to upgrade a cluster with
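The per-pool commands look roughly like this (pool name and target pg_num are placeholders; the pg_autoscaler mgr module must be enabled first):

    ceph mgr module enable pg_autoscaler
    ceph osd pool set <pool> pg_autoscale_mode on    # or "warn" to only report
    ceph osd pool set <pool> pg_num 2048             # manual target; merges happen gradually
    ceph osd pool autoscale-status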