Re: [ceph-users] balancer mgr module

2018-02-16 Thread Caspar Smit
2018-02-16 10:16 GMT+01:00 Dan van der Ster:

> Hope that helps!

It sure does, Dan! Thank you very much for your detailed answer.

I will start testing the balancer module with our demo cluster.

Caspar





Re: [ceph-users] balancer mgr module

2018-02-16 Thread Dan van der Ster
Hi Caspar,

I've been trying the mgr balancer for a couple weeks now and can share
some experience.

Currently there are two modes implemented: upmap and crush-compat.
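
If the module isn't enabled on your cluster yet, roughly the following
gets you going (command names as of luminous -- double-check against
your release):

  ceph mgr module ls                # "balancer" should be in enabled_modules
  ceph mgr module enable balancer   # enable it if it isn't
  ceph balancer status              # reports the mode, active flag and stored plans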

Upmap requires all clients to be running luminous -- it uses this new
pg-upmap mechanism to precisely move PGs one by one to a more balanced
layout.
The upmap mode appears to balance only by the number of PGs, AFAICT, and
on at least one of our clusters it happens to be moving PGs in a pool
with no data -- useless. Looking at the implementation, it seems to pick
a random pool to upmap each iteration -- I have a tracker open for this:
http://tracker.ceph.com/issues/22431
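
If you want to try upmap anyway, the client requirement can be checked
and enforced roughly like this (the pg and osd ids in the last line are
placeholders -- it only illustrates the kind of remapping the balancer
issues on its own, you don't run it by hand):

  ceph features                                     # all clients should report luminous
  ceph osd set-require-min-compat-client luminous   # refuses if older clients are connected
  ceph balancer mode upmap
  ceph osd pg-upmap-items 1.7 3 10   # e.g. move pg 1.7's copy on osd.3 to osd.10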

Upmap is the future, but for now I'm trying to exercise the
crush-compat mode on some larger clusters. It's still early days, but
in general it seems to be working in the right direction.
crush-compat does two things: first, it creates a new "compat" crush
weight-set to give underutilized OSDs more crush weight; second, it
phases the osd reweight values back out to 1.0. So, if you have a cluster
that was previously balanced with ceph osd reweight-by-*, then
crush-compat will gently bring you to the new balancing strategy.
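
You can watch both effects while the balancer runs, for example
(weight-set subcommands as in the luminous docs):

  ceph osd df tree                 # the REWEIGHT column should drift back to 1.00000
  ceph osd crush weight-set ls     # lists the compat weight-set created by the balancer
  ceph osd crush weight-set dump   # per-OSD weights it is currently applying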

There have been a few issues spotted in 12.2.2... some of the balancer
config-key settings aren't cast properly to int/float so they can
break the balancer; and more importantly the mgr doesn't refresh
config-keys if they change. So if you do change the configuration, you
need to run ceph mgr fail <active-mgr> to force the next mgr to reload
the config.
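
In practice a config change then looks something like this (the grep is
just one way to find the active mgr; <active-mgr> is whatever name that
returns):

  ceph config-key set mgr/balancer/max_misplaced 0.01
  ceph mgr dump | grep active_name   # which mgr is currently active
  ceph mgr fail <active-mgr>         # fail over so the next mgr re-reads the keys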

My current config is:

ceph config-key dump
{
    "mgr/balancer/active": "1",
    "mgr/balancer/begin_time": "0830",
    "mgr/balancer/end_time": "1600",
    "mgr/balancer/max_misplaced": "0.01",
    "mgr/balancer/mode": "crush-compat"
}

Note that the begin_time/end_time seem to be in UTC, not the local time zone.
max_misplaced defaults to 0.05, and this is used to limit the fraction of
PGs/objects to be rebalanced in each iteration.
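
For reference, those keys are plain config-key strings, set along these
lines on 12.2.x (remember the times are UTC, so shift them by your local
offset; ceph balancer on/off flips the active flag):

  ceph config-key set mgr/balancer/mode crush-compat
  ceph config-key set mgr/balancer/begin_time 0830
  ceph config-key set mgr/balancer/end_time 1600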

I have it enabled (ceph balancer on), which means it tries to balance
every 60s. It will skip an iteration if the misplaced fraction is greater
than max_misplaced, or if any objects are degraded.
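
That is, in commands (status/off shown for completeness):

  ceph balancer on       # balance automatically, roughly every 60s
  ceph balancer status   # shows the mode, active flag and any stored plans
  ceph balancer off      # stop automatic balancing again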

When you're first trying the balancer, the following steps let you test
a one-off balancing run (rather than the always-on mode that I use):
  - set debug_mgr=4/5 # then you can tail -f ceph-mgr.*.log | grep
balancer to see what it's doing
  - ceph balancer mode crush-compat
  - ceph balancer eval # to check the current score
  - ceph balancer optimize myplan # create but do not execute a new plan
  - ceph balancer eval myplan # check what would be the new score
after myplan. Is it getting closer to the optimal value 0?
  - ceph balancer show myplan # study what it's trying to do
  - ceph balancer execute myplan # execute the plan. data movement starts here!
  - ceph balancer reset # we do this because balancer rm is broken,
and myplan isn't removed automatically after execution
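
Once a plan is executing you can follow progress with ordinary tools:

  watch -n 10 ceph -s   # the misplaced count should drain away
  ceph balancer eval    # re-check the score after the movement finishes

For the debug_mgr step above, one way (run on the active mgr's host, via
the admin socket) is:

  ceph daemon mgr.<name> config set debug_mgr 4/5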

v12.2.3 has quite a few balancer fixes, and also adds pool-specific
balancing (which should hopefully fix my upmap issue).

Hope that helps!

Dan



On Fri, Feb 16, 2018 at 9:22 AM, Caspar Smit wrote:
> Hi,
>
> In Sage's talk at LinuxConfAU about making distributed storage easy, he
> mentioned the Balancer Manager module. After enabling this module, PGs
> should get balanced automagically around the cluster.
>
> The module was added in Ceph Luminous v12.2.2
>
> Since I couldn't find much documentation about this module, I was wondering
> whether it is considered stable (production ready) or still experimental/WIP.
>
> Here's the original mailing list post describing the module:
>
> https://www.spinics.net/lists/ceph-devel/msg37730.html
>
> A few questions:
>
> What are the differences between the different optimization modes?
> Is the balancer run at certain intervals? If yes, what is the interval?
> Will this trigger continuous backfilling/recovery of PGs when a cluster
> is mostly under write load?
>
> Kind regards,
> Caspar
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com