Hi all,
We have been working closely with Kostis on this and we have some results we
thought we should share.
Increasing the PGs was mandatory for us since we had been noticing
fragmentation* issues on many OSDs. We had also been below the recommended
PG count for our main pool for quite some time.
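For reference, the "recommended" count we mean is the rule of thumb from the
placement groups documentation, roughly (number of OSDs x 100) / replicas.
The back-of-the-envelope version for a cluster of our size looks like:

    126 OSDs * 100 / 3 replicas = 4200 PGs (total, across all pools)

with the docs suggesting you then round to a nearby power of two. How to weight
that across pools when one pool holds almost all the data is a separate question.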
FWIW, I'm beginning to think that SSD journals are a requirement.
Even with minimal recovery/backfilling settings, it's very easy to kick off
an operation that will bring a cluster to its knees: increasing PG/PGP counts,
increasing replication, adding too many new OSDs at once, etc. These operations
can cause a huge amount of data movement, and the resulting recovery I/O competes
directly with client traffic.
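For what it's worth, by "minimal recovery/backfilling settings" I mean roughly
the following (the values are only an example, not a recommendation for any
particular hardware):

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'

and the same under [osd] in ceph.conf so it survives restarts:

    [osd]
        osd max backfills = 1
        osd recovery max active = 1
        osd recovery op priority = 1

Even throttled down like that, the sheer volume of data being reshuffled is
enough to hurt latency on spinners without SSD journals.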
You're physically moving (lots of) data around between most of your
disks. There's going to be an IO impact from that, although we are
always working on ways to make it more controllable and try to
minimize its impact. Your average latency increase sounds a little
high to me, but I don't have much context to go on.
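If you want to quantify it while the data is moving, the usual counters are
enough to see how much recovery traffic you are trading against client IO,
e.g. (nothing here is specific to splitting):

    ceph -s
    ceph osd pool stats          # per-pool client vs recovery IO rates
    ceph osd perf                # per-OSD commit/apply latencies

That at least tells you whether the latency you see tracks the backfill
activity or something else.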
Hi Greg,
thanks for your immediate feedback. My comments follow.
Initially we thought that the 248 PG (15%) increment we used was
really small, but it seems that we should increase PGs in even smaller
increments. I think that "multiples" is not the appropriate term here; I fear
someone would read it as multiplying pg_num (doubling it or more) in a single
step, when what actually seems to be needed is small absolute increments.
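To make "small increments" concrete, something along these lines is what I have
in mind for the next rounds (pool name, target and step size below are only
placeholders, not our real values):

    POOL=rbd          # example pool name
    TARGET=4096       # example final pg_num
    STEP=128          # example increment per round
    CUR=$(ceph osd pool get $POOL pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        CUR=$((CUR + STEP)); [ "$CUR" -gt "$TARGET" ] && CUR=$TARGET
        ceph osd pool set $POOL pg_num $CUR
        ceph osd pool set $POOL pgp_num $CUR
        sleep 30     # give the cluster a moment to start peering
        # wait for peering/backfilling from this step to finish before the next one
        until ceph health | grep -q HEALTH_OK; do sleep 60; done
    done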
On Tue, Jul 8, 2014 at 10:14 AM, Dan Van Der Ster wrote:
> Hi Greg,
> We're also due for a similar splitting exercise in the not too distant
> future, and will also need to minimize the impact on latency.
>
> In addition to increasing pg_num in small steps and using a minimal
> max_backfills/recoveries configuration, [...]
Hi Greg,
We're also due for a similar splitting exercise in the not too distant future,
and will also need to minimize the impact on latency.
In addition to increasing pg_num in small steps and using a minimal
max_backfills/recoveries configuration, I was planning to increase pgp_num very
slowly as well, so that only a small fraction of the data is on the move at any one time.
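Roughly what I have in mind, with illustrative numbers only (and assuming
pg_num has already been walked up to the target with max_backfills/recoveries
turned right down):

    POOL=volumes                          # example pool name
    for N in $(seq 2248 200 4096) 4096; do
        ceph osd pool set $POOL pgp_num $N
        sleep 30
        # let the misplaced objects from this step drain before taking the next one
        until ceph health | grep -q HEALTH_OK; do sleep 60; done
    done

The idea being that each pgp_num bump only marks a small slice of the data as
misplaced, so the backfilling never swamps the disks.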
The impact won't be 300 times bigger, but it will be bigger. There are two
things impacting your cluster here:
1) the initial "split" of the affected PGs into multiple child PGs. You can
mitigate this by stepping through pg_num at small multiples.
2) the movement of data to its new location (when you increase pgp_num).
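In terms of the actual knobs (example pool name and target only), the two
phases are driven by two separate settings:

    # 1) the split: PGs divide into child PGs, but objects stay on the same OSDs
    ceph osd pool set data pg_num 4096

    # 2) the movement: objects get re-placed according to the new PG count
    ceph osd pool set data pgp_num 4096

which is why you can take the first in small steps and then let the second one
proceed as gradually as you like.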
Hi,
we maintain a cluster with 126 OSDs, replication 3 and approx. 148T of raw
space used. We store our data objects in essentially two pools, one of them
being approx. 300x larger than the other in both data stored and number of
objects. Based on the formula provided here
http://ceph.com/docs/master/rados/operations/p