Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-31 Thread Konstantinos Tompoulidis
Hi all, We have been working closely with Kostis on this and we have some results we thought we should share. Increasing the PGs was mandatory for us since we have been noticing fragmentation* issues on many OSDs. Also, we were below the recommended number for our main pool for quite some time
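
(Aside, not from the original mail: if the OSD filestores sit on XFS, which is an assumption here, the fragmentation mentioned above can be checked per device, and the current pg_num compared against the recommendation; the device and pool names below are placeholders.)

    # report overall file fragmentation on an XFS-backed OSD data partition (read-only mode)
    xfs_db -r -c frag /dev/sdb1

    # check the pool's current pg_num against the recommended value
    ceph osd pool get <poolname> pg_num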

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-09 Thread Craig Lewis
FWIW, I'm beginning to think that SSD journals are a requirement. Even with minimal recovery/backfilling settings, it's very easy to kick off an operation that will bring a cluster to its knees. Increasing PG/PGP, increasing replication, adding too many new OSDs, etc. These operations can cause
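
(For reference, a conservative version of the "minimal recovery/backfilling settings" mentioned above might look like the following; this is a sketch, not taken from the original mail, and the injected values revert on OSD restart unless also set in ceph.conf.)

    # throttle recovery and backfill on all OSDs at runtime
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'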

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-09 Thread Gregory Farnum
You're physically moving (lots of) data around between most of your disks. There's going to be an IO impact from that, although we are always working on ways to make it more controllable and to minimize its impact. Your average latency increase sounds a little high to me, but I don't have much
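
(One of the knobs that makes this "controllable", assuming it is the lever meant here, is the recovery op priority; lowering it favours client I/O over recovery traffic. The value shown is illustrative.)

    # de-prioritise recovery work relative to client I/O (the default priority is 10)
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'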

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Kostis Fardelas
Hi Greg, thanks for your immediate feedback. My comments follow. Initially we thought that the 248 PG (15%) increment we used was really small, but it seems that we should increase PGs in even smaller increments. I think that the term "multiples" is not the appropriate term here; I fear someone would

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Gregory Farnum
On Tue, Jul 8, 2014 at 10:14 AM, Dan Van Der Ster wrote: > Hi Greg, > We're also due for a similar splitting exercise in the not too distant > future, and will also need to minimize the impact on latency. > > In addition to increasing pg_num in small steps and using a minimal > max_backfills/recoveries

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Dan Van Der Ster
Hi Greg, We're also due for a similar splitting exercise in the not too distant future, and will also need to minimize the impact on latency. In addition to increasing pg_num in small steps and using a minimal max_backfills/recoveries configuration, I was planning to increase pgp_num very slowly
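
(A minimal sketch of that plan; the pool name and step sizes are hypothetical, and pgp_num can never be raised above pg_num: bump pgp_num a little, wait for the cluster to return to active+clean, then repeat.)

    ceph osd pool set <poolname> pgp_num 1100
    ceph -w      # wait until backfilling finishes and the cluster is active+clean again
    ceph osd pool set <poolname> pgp_num 1200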

Re: [ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Gregory Farnum
The impact won't be 300 times bigger, but it will be bigger. There are two things impacting your cluster here: 1) the initial "split" of the affected PGs into multiple child PGs. You can mitigate this by stepping through pg_num at small multiples. 2) the movement of data to its new location (when you increase pgp_num)
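
(In command terms, the two phases map onto the two settings; a rough sketch with hypothetical numbers, not from the original mail.)

    # 1) split existing PGs into child PGs; mostly local work, little data movement
    ceph osd pool set <poolname> pg_num 2048

    # 2) let CRUSH place the new PGs; this is what actually moves data between OSDs
    ceph osd pool set <poolname> pgp_num 2048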

[ceph-users] Throttle pool pg_num/pgp_num increase impact

2014-07-08 Thread Kostis Fardelas
Hi, we maintain a cluster with 126 OSDs, replication 3 and approx. 148T raw used space. We store data objects basically in two pools, one of which is approx. 300x larger than the other in terms of both data stored and number of objects. Based on the formula provided here http://ceph.com/docs/master/rados/operations/p
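
(My arithmetic, not from the original mail: the rule of thumb on that page is roughly total PGs ≈ (OSDs × 100) / replicas, rounded to a power of two.)

    # (126 * 100) / 3 = 4200  ->  4096, or 8192 if rounding up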