Re: [ceph-users] How to add 100 new OSDs...

2019-09-11 Thread Stefan Kooman
Quoting Massimo Sgaravatto (massimo.sgarava...@gmail.com):
> Thank you
> 
> But are the algorithms used during backfilling and during rebalancing (to
> decide where data have to be placed) different?

Yes, the balancer takes more factors into consideration: it looks at all
of the pools at once and can make smarter decisions. We noticed far less
data movement when using the balancer than we expected.

> I.e. assuming that no new data are written and no data are deleted, if you
> rely on the standard way (i.e. backfilling), when the data movement process
> finishes (and therefore the status is HEALTH_OK), can the automatic
> balancer (in upmap mode) decide that some data have to be moved again?

Yes, for sure. Ceph's balancing is not perfect (because the number of PGs is
less than you need for ideal placement). You can look at "ceph osd df" and
check the standard deviation. If that is quite high it makes sense to use
the balancer to equalize utilization, either PG optimized or capacity
optimized (or a mix of both, the default balancer settings).
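
For reference, a minimal sketch of those checks with the standard Ceph CLI
(the mode shown and any thresholds are illustrative, not prescriptive):

  # the STDDEV value in the summary shows how uneven OSD utilization is
  ceph osd df

  # if the spread is high, let the balancer even it out
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status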

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] How to add 100 new OSDs...

2019-09-11 Thread Massimo Sgaravatto
Thank you

But the algorithms used during backfilling and during rebalancing (to
decide where data have to be placed) are different ?

I.e. assuming that no new data are written and no data are deleted, if you
rely on the standard way (i.e. backfilling), when the data movement process
finishes (and therefore the status is HEALTH_OK), can the automatic
balancer (in upmap mode) decide that  some data have to be re-moved ?

Thanks, Massimo

On Wed, Sep 11, 2019 at 12:30 PM Stefan Kooman  wrote:

> Quoting Massimo Sgaravatto (massimo.sgarava...@gmail.com):
> > Just for my education, why is letting the balancer move the PGs to the new
> > OSDs (the CERN approach) better than throttled backfilling?
>
> 1) Because you can pause the process at any given moment and obtain
> HEALTH_OK again. 2) The balancer moves the data more efficiently. 3) The
> balancer will avoid putting PGs on OSDs that are already full, so you
> might avoid "too full" PG situations.
>
> Gr. Stefan
>
>
> --
> | BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
>


Re: [ceph-users] How to add 100 new OSDs...

2019-09-11 Thread Stefan Kooman
Quoting Massimo Sgaravatto (massimo.sgarava...@gmail.com):
> Just for my education, why is letting the balancer move the PGs to the new
> OSDs (the CERN approach) better than throttled backfilling?

1) Because you can pause the process at any given moment and obtain
HEALTH_OK again. 2) The balancer moves the data more efficiently. 3) The
balancer will avoid putting PGs on OSDs that are already full, so you
might avoid "too full" PG situations.
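
To make 1) concrete, pausing and resuming looks roughly like this (standard
balancer commands; when to pause is up to you):

  ceph balancer on       # start optimizing placement
  ceph balancer status   # watch progress
  ceph balancer off      # pause; once in-flight backfills drain you are back at HEALTH_OK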

Gr. Stefan


-- 
| BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] How to add 100 new OSDs...

2019-09-11 Thread Massimo Sgaravatto
Just for my education, why is letting the balancer move the PGs to the new
OSDs (the CERN approach) better than throttled backfilling?


Thanks, Massimo


On Sat, Jul 27, 2019 at 12:31 AM Stefan Kooman  wrote:

> Quoting Peter Sabaini (pe...@sabaini.at):
> > What kind of commit/apply latency increases have you seen when adding a
> > large numbers of OSDs? I'm nervous how sensitive workloads might react
> > here, esp. with spinners.
>
> You mean when there is backfilling going on? Instead of doing "a big
> bang" you can also use Dan van der Ster's trick with upmap balancer:
>
> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>
> See
>
> https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer
>
> So you would still have norebalance / nobackfill / norecover set and the
> ceph balancer off. Then you run the script as many times as necessary to
> get "HEALTH_OK" again (on clusters other than Nautilus) and no more
> remapped PGs. Unset the flags and enable the ceph balancer ... now the
> balancer will slowly move PGs to the new OSDs.
>
> We've used this trick to increase the number of PGs on a pool, and will
> use this to expand the cluster in the near future.
>
> This only works if you can use the balancer in "upmap" mode. Note that
> using upmap requires that all clients be Luminous or newer. If you are
> using the CephFS kernel client it might report as not compatible (jewel),
> but recent Linux distributions work well (Ubuntu 18.04 / CentOS 7).
>
> Gr. Stefan
>
> --
> | BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] How to add 100 new OSDs...

2019-08-04 Thread Anthony D'Atri
>>> We have been using:
>>> 
>>> osd op queue = wpq
>>> osd op queue cut off = high
>>> 
>>> It virtually eliminates the impact of backfills on our clusters. Our
> 
> It does better because it is a fair-share queue and doesn't let recovery
> ops take priority over client ops at any point, for any length of time. It
> allows clients to have a much more predictable latency to the storage.


Why aren’t these default settings then?  Those who set these:  do you run with 
them all the time, or only while expanding?  Is peering still impactful?

— aad



Re: [ceph-users] How to add 100 new OSDs...

2019-08-03 Thread Robert LeBlanc
It does better because it is a fair-share queue and doesn't let recovery
ops take priority over client ops at any point, for any length of time. It
allows clients to have a much more predictable latency to the storage.

Sent from a mobile device, please excuse any typos.

On Sat, Aug 3, 2019, 1:10 PM Alex Gorbachev  wrote:

> On Fri, Aug 2, 2019 at 6:57 PM Robert LeBlanc 
> wrote:
> >
> > On Fri, Jul 26, 2019 at 1:02 PM Peter Sabaini  wrote:
> >>
> >> On 26.07.19 15:03, Stefan Kooman wrote:
> >> > Quoting Peter Sabaini (pe...@sabaini.at):
> >> >> What kind of commit/apply latency increases have you seen when
> adding a
> >> >> large numbers of OSDs? I'm nervous how sensitive workloads might
> react
> >> >> here, esp. with spinners.
> >> >
> >> > You mean when there is backfilling going on? Instead of doing "a big
> >>
> >> Yes exactly. I usually tune down max rebalance and max recovery active
> >> knobs to lessen impact but still I found the additional write load can
> >> substantially increase i/o latencies. Not all workloads like this.
> >
> >
> > We have been using:
> >
> > osd op queue = wpq
> > osd op queue cut off = high
> >
> > It virtually eliminates the impact of backfills on our clusters. Our
> backfill and recovery times have increased when the cluster has lots of
> client I/O, but the clients haven't noticed that huge backfills have been
> going on.
> >
> > 
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
> Would this be superior to setting:
>
> osd_recovery_sleep = 0.5 (or some high value)
>
>
> --
> Alex Gorbachev
> Intelligent Systems Services Inc.
>


Re: [ceph-users] How to add 100 new OSDs...

2019-08-03 Thread Alex Gorbachev
On Fri, Aug 2, 2019 at 6:57 PM Robert LeBlanc  wrote:
>
> On Fri, Jul 26, 2019 at 1:02 PM Peter Sabaini  wrote:
>>
>> On 26.07.19 15:03, Stefan Kooman wrote:
>> > Quoting Peter Sabaini (pe...@sabaini.at):
>> >> What kind of commit/apply latency increases have you seen when adding a
>> >> large numbers of OSDs? I'm nervous how sensitive workloads might react
>> >> here, esp. with spinners.
>> >
>> > You mean when there is backfilling going on? Instead of doing "a big
>>
>> Yes exactly. I usually tune down max rebalance and max recovery active
>> knobs to lessen impact but still I found the additional write load can
>> substantially increase i/o latencies. Not all workloads like this.
>
>
> We have been using:
>
> osd op queue = wpq
> osd op queue cut off = high
>
> It virtually eliminates the impact of backfills on our clusters. Our backfill 
> and recovery times have increased when the cluster has lots of client I/O, 
> but the clients haven't noticed that huge backfills have been going on.
>
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

Would this be superior to setting:

osd_recovery_sleep = 0.5 (or some high value)
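
(If anyone wants to experiment with that at runtime, a sketch using the
generic injectargs mechanism, with the value purely as an example:

  ceph tell osd.* injectargs '--osd_recovery_sleep 0.5'
)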


--
Alex Gorbachev
Intelligent Systems Services Inc.


Re: [ceph-users] How to add 100 new OSDs...

2019-08-02 Thread Robert LeBlanc
On Fri, Jul 26, 2019 at 1:02 PM Peter Sabaini  wrote:

> On 26.07.19 15:03, Stefan Kooman wrote:
> > Quoting Peter Sabaini (pe...@sabaini.at):
> >> What kind of commit/apply latency increases have you seen when adding a
> >> large numbers of OSDs? I'm nervous how sensitive workloads might react
> >> here, esp. with spinners.
> >
> > You mean when there is backfilling going on? Instead of doing "a big
>
> Yes exactly. I usually tune down max rebalance and max recovery active
> knobs to lessen impact but still I found the additional write load can
> substantially increase i/o latencies. Not all workloads like this.
>

We have been using:

osd op queue = wpq
osd op queue cut off = high

It virtually eliminates the impact of backfills on our clusters. Our
backfill and recovery times have increased when the cluster has lots of
client I/O, but the clients haven't noticed that huge backfills have been
going on.
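
(Both are [osd] options in ceph.conf and take effect on OSD restart. A quick
sketch of verifying what a running OSD actually uses, via its admin socket,
with osd.0 just an example ID:

  ceph daemon osd.0 config get osd_op_queue
  ceph daemon osd.0 config get osd_op_queue_cut_off
)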


Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


Re: [ceph-users] How to add 100 new OSDs...

2019-07-28 Thread Paul Mezzanini
I'll throw my $.02 in from when I was growing our cluster.

My method ended up being to script the LVM creation so the LVM names reflect 
the OSD/journal serial numbers for easy location later, then "ceph-volume 
prepare" the whole node to get it ready for insertion, followed by 
"ceph-volume activate". I typically see more of a performance impact from 
peering than from rebalancing.
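
(For reference, a bare-bones sketch of that per-node flow using ceph-volume's
batch mode; the device names are placeholders and this is one way to do it,
not the exact script:

  # prepare every data device on the node without starting the OSDs
  ceph-volume lvm batch --prepare /dev/sdb /dev/sdc /dev/sdd

  # later, bring them all up at once
  ceph-volume lvm activate --all
)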

If I'm doing a whole node, I make sure the node's weight is set to 0 and slowly 
walk it up in chunks.  If it's anything less I just let it fly as-is.  

My workloads didn't seem to mind the increased latency during a huge 
rebalance, but another admin hosts some latency-sensitive VMs, and by moving 
the weight up slowly I could easily wait for things to settle if he saw the 
numbers get too high. It's a simple knob twist that keeps another admin happy 
when doing storage changes, so I do it.


--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu




From: ceph-users  on behalf of Anthony 
D'Atri 
Sent: Sunday, July 28, 2019 4:09 AM
To: ceph-users
Subject: Re: [ceph-users] How to add 100 new OSDs...

Paul Emmerich wrote:

> +1 on adding them all at the same time.
>
> All these methods that gradually increase the weight aren't really
> necessary in newer releases of Ceph.

Because the default backfill/recovery values are lower than they were in, say, 
Dumpling?

Doubling (or more) the size of a cluster in one swoop still means a lot of 
peering and a lot of recovery I/O; I’ve seen a cluster’s data rate go to or 
near 0 for a brief but nonzero length of time.  If something goes wrong with 
the network (cough cough subtle jumbo frame lossage cough), or if one has 
fat-fingered something along the way, going in increments means that a ^C 
lets the cluster stabilize before very long.  Then you get to troubleshoot with 
HEALTH_OK instead of HEALTH_WARN or HEALTH_ERR.

Having experienced a cluster be DoS’d for hours when its size was tripled in 
one go, I’m once bitten, twice shy.  Yes, that was Dumpling, but even with SSDs 
on Jewel and Luminous I’ve seen significant client performance impact from 
en-masse topology changes.

— aad



Re: [ceph-users] How to add 100 new OSDs...

2019-07-28 Thread Anthony D'Atri
Paul Emmerich wrote:

> +1 on adding them all at the same time.
> 
> All these methods that gradually increase the weight aren't really
> necessary in newer releases of Ceph.

Because the default backfill/recovery values are lower than they were in, say, 
Dumpling?

Doubling (or more) the size of a cluster in one swoop still means a lot of 
peering and a lot of recovery I/O; I’ve seen a cluster’s data rate go to or 
near 0 for a brief but nonzero length of time.  If something goes wrong with 
the network (cough cough subtle jumbo frame lossage cough), or if one has 
fat-fingered something along the way, going in increments means that a ^C 
lets the cluster stabilize before very long.  Then you get to troubleshoot with 
HEALTH_OK instead of HEALTH_WARN or HEALTH_ERR.

Having experienced a cluster be DoS’d for hours when its size was tripled in 
one go, I’m once bitten, twice shy.  Yes, that was Dumpling, but even with SSDs 
on Jewel and Luminous I’ve seen significant client performance impact from 
en-masse topology changes.

— aad



Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
On 26.07.19 15:03, Stefan Kooman wrote:
> Quoting Peter Sabaini (pe...@sabaini.at):
>> What kind of commit/apply latency increases have you seen when adding a
>> large numbers of OSDs? I'm nervous how sensitive workloads might react
>> here, esp. with spinners.
> 
> You mean when there is backfilling going on? Instead of doing "a big

Yes, exactly. I usually tune down the max rebalance and max recovery active
knobs to lessen the impact, but I still found the additional write load can
substantially increase I/O latencies. Not all workloads like this.

> bang" you can also use Dan van der Ster's trick with upmap balancer:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> 
> See
> https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer

Thanks, thats interesting -- though I wish it weren't necessary.


cheers,
peter.


> So you would still have norebalance / nobackfill / norecover set and the
> ceph balancer off. Then you run the script as many times as necessary to
> get "HEALTH_OK" again (on clusters other than Nautilus) and no more
> remapped PGs. Unset the flags and enable the ceph balancer ... now the
> balancer will slowly move PGs to the new OSDs.
> 
> We've used this trick to increase the number of PGs on a pool, and will
> use this to expand the cluster in the near future.
> 
> This only works if you can use the balancer in "upmap" mode. Note that
> using upmap requires that all clients be Luminous or newer. If you are
> using the CephFS kernel client it might report as not compatible (jewel),
> but recent Linux distributions work well (Ubuntu 18.04 / CentOS 7).
> 
> Gr. Stefan
> 



Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Stefan Kooman
Quoting Peter Sabaini (pe...@sabaini.at):
> What kind of commit/apply latency increases have you seen when adding a
> large numbers of OSDs? I'm nervous how sensitive workloads might react
> here, esp. with spinners.

You mean when there is backfilling going on? Instead of doing "a big
bang" you can also use Dan van der Ster's trick with the upmap balancer:
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py

See
https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer

So you would still have norebalance / nobackfill / norecover set and the
ceph balancer off. Then you run the script as many times as necessary to
get "HEALTH_OK" again (on clusters other than Nautilus) and no more
remapped PGs. Unset the flags and enable the ceph balancer ... now the
balancer will slowly move PGs to the new OSDs.
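
Spelled out as commands, the flow is roughly this (a sketch; the script
prints ceph CLI commands, so review its output before acting on it):

  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set norecover
  ceph balancer off

  # add the new OSDs, then run until no PGs stay remapped:
  ./upmap-remapped.py | sh

  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph balancer mode upmap
  ceph balancer on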

We've used this trick to increase the number of PGs on a pool, and will
use this to expand the cluster in the near future.

This only works if you can use the balancer in "upmap" mode. Note that
using upmap requires that all clients be Luminous or newer. If you are
using the CephFS kernel client it might report as not compatible (jewel),
but recent Linux distributions work well (Ubuntu 18.04 / CentOS 7).
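
A sketch of checking that before switching modes:

  ceph features                                     # what your clients report
  ceph osd set-require-min-compat-client luminous   # required before upmap mode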

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
What kind of commit/apply latency increases have you seen when adding a
large numbers of OSDs? I'm nervous how sensitive workloads might react
here, esp. with spinners.

cheers,
peter.

On 24.07.19 20:58, Reed Dier wrote:
> Just chiming in to say that this too has been my preferred method for
> adding [large numbers of] OSDs.
> 
> Set the norebalance nobackfill flags.
> Create all the OSDs, and verify everything looks good.
> Make sure my max_backfills, recovery_max_active are as expected.
> Make sure everything has peered.
> Unset flags and let it run.
> 
> One crush map change, one data movement.
> 
> Reed
> 
>>
>> That works, but with newer releases I've been doing this:
>>
>> - Make sure cluster is HEALTH_OK
>> - Set the 'norebalance' flag (and usually nobackfill)
>> - Add all the OSDs
>> - Wait for the PGs to peer. I usually wait a few minutes
>> - Remove the norebalance and nobackfill flag
>> - Wait for HEALTH_OK
>>
>> Wido
>>
> 



Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Matthew Vernon

On 24/07/2019 20:06, Paul Emmerich wrote:

+1 on adding them all at the same time.

All these methods that gradually increase the weight aren't really 
necessary in newer releases of Ceph.


FWIW, we added a rack-full (9x60 = 540 OSDs) in one go to our production 
cluster (then running Jewel) taking it from 2520 to 3060 OSDs and it 
wasn't a big issue.


Regards,

Matthew



--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread zhanrzh...@teamsun.com.cn
Hi,Janne
 Thank you for correcting my mistake.
Maybe the first advice description is unclear,I want to say that add osds into 
one failuer domain at a time ,
so that only one PG  among up set to remap at a time.


--
zhanrzh...@teamsun.com.cn
>On Thu, 25 Jul 2019 at 10:47, 展荣臻(信泰) wrote:
>
>>
>> 1. Adding OSDs in the same failure domain ensures that only one PG in each
>> up set (as "ceph pg dump" shows) has to remap.
>> 2. Setting "osd_pool_default_min_size=1" ensures objects can be read and
>> written without interruption while PGs remap.
>> Is this wrong?
>>
>
>How did you read the first email, where he described how 3 copies was not
>enough and that he perhaps wants to go to 4 copies
>to make sure he is not putting data at risk?
>
>The effect you describe is technically correct, it will allow writes to
>pass, but it would also go 100% against what ceph tries to do here, retain
>the data even while doing planned maintenance, even while getting
>unexpected downtime.
>
>Setting min_size=1 means you don't care at all for your data, and that you
>will be placing it under extreme risks.
>
>Not only will that single copy be a danger, but you can easily get into a
>situation where your single-copy write gets accepted and then that drive
>gets destroyed, and the cluster will know the latest writes ended up on it,
>and even getting the two older copies back will not help, since it has
>already registered that somewhere there is a newer version. For a single
>object, reverting to older (if possible) isn't all that bad, but for a
>section in the middle of a VM drive, that could mean total disaster.
>
>There are lots of people losing data with 1 copy, lots of posts on how
>repl_size=2, min_size=1 lost data for people using ceph, so I think posting
>advice to that effect goes against what ceph is good for.
>
>Not that I think the original poster would fall into that trap, but others
>might find this post later and think that it would be a good solution to
>maximize risk while adding/rebuilding 100s of OSDs. I don't agree.
>
>
>> On Thu, 25 Jul 2019 at 04:36, zhanrzh...@teamsun.com.cn <
>> zhanrzh...@teamsun.com.cn> wrote:
>>
>>> I think you should set "osd_pool_default_min_size=1" before you add OSDs,
>>> and the OSDs that you add at one time should be in the same failure
>>> domain.
>>>
>>
>> That sounds like weird or even bad advice?
>> What is the motivation behind it?
>>
>>
>--
>May the most significant bit of your life be positive.


Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Janne Johansson
On Thu, 25 Jul 2019 at 10:47, 展荣臻(信泰) wrote:

>
> 1. Adding OSDs in the same failure domain ensures that only one PG in each
> up set (as "ceph pg dump" shows) has to remap.
> 2. Setting "osd_pool_default_min_size=1" ensures objects can be read and
> written without interruption while PGs remap.
> Is this wrong?
>

How did you read the first email, where he described how 3 copies was not
enough and that he perhaps wants to go to 4 copies
to make sure he is not putting data at risk?

The effect you describe is technically correct, it will allow writes to
pass, but it would also go 100% against what ceph tries to do here, retain
the data even while doing planned maintenance, even while getting
unexpected downtime.

Setting min_size=1 means you don't care at all for your data, and that you
will be placing it under extreme risks.

Not only will that single copy be a danger, but you can easily get into a
situation where your single-copy write gets accepted and then that drive
gets destroyed, and the cluster will know the latest writes ended up on it,
and even getting the two older copies back will not help, since it has
already registered that somewhere there is a newer version. For a single
object, reverting to older (if possible) isn't all that bad, but for a
section in the middle of a VM drive, that could mean total disaster.

There are lots of people losing data with 1 copy, lots of posts on how
repl_size=2, min_size=1 lost data for people using ceph, so I think posting
advice to that effect goes against what ceph is good for.

Not that I think the original poster would fall into that trap, but others
might find this post later and think that it would be a good solution to
maximize risk while adding/rebuilding 100s of OSDs. I don't agree.


> On Thu, 25 Jul 2019 at 04:36, zhanrzh...@teamsun.com.cn <
> zhanrzh...@teamsun.com.cn> wrote:
>
>> I think you should set "osd_pool_default_min_size=1" before you add OSDs,
>> and the OSDs that you add at one time should be in the same failure
>> domain.
>>
>
> That sounds like weird or even bad advice?
> What is the motivation behind it?
>
>
-- 
May the most significant bit of your life be positive.


Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread 展荣臻(信泰)

1. Adding OSDs in the same failure domain ensures that only one PG in each up
set (as "ceph pg dump" shows) has to remap.
2. Setting "osd_pool_default_min_size=1" ensures objects can be read and
written without interruption while PGs remap.
Is this wrong?


-Original Message-
From: "Janne Johansson" 
Sent: 2019-07-25 15:01:37 (Thursday)
To: "zhanrzh...@teamsun.com.cn" 
Cc: "xavier.trilla" , ceph-users 

Subject: Re: [ceph-users] How to add 100 new OSDs...






On Thu, 25 Jul 2019 at 04:36, zhanrzh...@teamsun.com.cn
 wrote:

I think you should set "osd_pool_default_min_size=1" before you add OSDs,
and the OSDs that you add at one time should be in the same failure domain.


That sounds like weird or even bad advice?
What is the motivation behind it?


--

May the most significant bit of your life be positive.


Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Thomas Byrne - UKRI STFC
As a counterpoint, adding large amounts of new hardware gradually (or, more 
specifically, in a few steps) has a few benefits IMO.

- Being able to pause the operation and confirm the new hardware (and cluster) 
is operating as expected. You can identify hardware problems with OSDs at 10% 
weight that would be much harder to notice during backfilling, and that could 
cause performance issues for the cluster if the OSDs ended up with their full 
complement of PGs.

- Breaking up long backfills. For a full cluster with large OSDs, backfills can 
take weeks. I find that letting the mon stores compact, and getting the cluster 
back to health OK is good for my sanity and gives a good stopping point to work 
on other cluster issues. This obviously depends on the cluster fullness and OSD 
size.

I still aim for the smallest number of steps and least work, but an initial 
crush weighting of 10-25% of the final weight is a good sanity check of the 
new hardware, and gives a good indication of how to approach the rest of the 
backfill.
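
As a sketch, assuming a final crush weight of 10.9 (roughly a 12 TB spinner)
and a 25% first step, that initial weighting per OSD is just:

  ceph osd crush reweight osd.540 2.7   # OSD ID and weights are illustrative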

Cheers,
Tom

From: ceph-users  On Behalf Of Paul Emmerich
Sent: 24 July 2019 20:06
To: Reed Dier 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to add 100 new OSDs...

+1 on adding them all at the same time.

All these methods that gradually increase the weight aren't really necessary in 
newer releases of Ceph.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jul 24, 2019 at 8:59 PM Reed Dier  wrote:
Just chiming in to say that this too has been my preferred method for adding 
[large numbers of] OSDs.

Set the norebalance nobackfill flags.
Create all the OSDs, and verify everything looks good.
Make sure my max_backfills, recovery_max_active are as expected.
Make sure everything has peered.
Unset flags and let it run.

One crush map change, one data movement.

Reed



That works, but with newer releases I've been doing this:

- Make sure cluster is HEALTH_OK
- Set the 'norebalance' flag (and usually nobackfill)
- Add all the OSDs
- Wait for the PGs to peer. I usually wait a few minutes
- Remove the norebalance and nobackfill flag
- Wait for HEALTH_OK

Wido



Re: [ceph-users] How to add 100 new OSDs...

2019-07-25 Thread Janne Johansson
Den tors 25 juli 2019 kl 04:36 skrev zhanrzh...@teamsun.com.cn <
zhanrzh...@teamsun.com.cn>:

> I think you should set "osd_pool_default_min_size=1" before you add OSDs,
> and the OSDs that you add at one time should be in the same failure domain.
>

That sounds like weird or even bad advice?
What is the motivation behind it?

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kaspar Bosma

+1 on that. We are going to add 384 OSDs next week to a 2K+ cluster. The
proposed solution really works well!

Kaspar

On 24 July 2019 at 21:06, Paul Emmerich  wrote:

> +1 on adding them all at the same time.
>
> All these methods that gradually increase the weight aren't really
> necessary in newer releases of Ceph.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Wed, Jul 24, 2019 at 8:59 PM Reed Dier  wrote:
>
>> Just chiming in to say that this too has been my preferred method for
>> adding [large numbers of] OSDs.
>>
>> Set the norebalance nobackfill flags.
>> Create all the OSDs, and verify everything looks good.
>> Make sure my max_backfills, recovery_max_active are as expected.
>> Make sure everything has peered.
>> Unset flags and let it run.
>>
>> One crush map change, one data movement.
>>
>> Reed
>>
>>> That works, but with newer releases I've been doing this:
>>>
>>> - Make sure cluster is HEALTH_OK
>>> - Set the 'norebalance' flag (and usually nobackfill)
>>> - Add all the OSDs
>>> - Wait for the PGs to peer. I usually wait a few minutes
>>> - Remove the norebalance and nobackfill flag
>>> - Wait for HEALTH_OK
>>>
>>> Wido


Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread zhanrzh...@teamsun.com.cn
I think you should set "osd_pool_default_min_size=1" before you add OSDs,
and the OSDs that you add at one time should be in the same failure domain.



Hi,
What would be the proper way to add 100 new OSDs to a cluster?
I have to add 100 new OSDs to our current >300 OSD cluster, and I would like 
to know how you do it.
Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and it 
can handle plenty of load, but for the sake of safety -it hosts thousands of 
VMs via RBD- we usually add them one by one, waiting for a long time between 
adding each OSD.
Obviously this leads to PLENTY of data movement, as each time the cluster 
geometry changes, data is migrated among all the OSDs. But with the kind of 
load we have, if we add several OSDs at the same time, some PGs can get stuck 
for a while, while they peer to the new OSDs.
Now that I have to add > 100 new OSDs I was wondering if somebody has some 
suggestions.
Thanks!
Xavier.
 



zhanrzh...@teamsun.com.cn


Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Paul Emmerich
+1 on adding them all at the same time.

All these methods that gradually increase the weight aren't really
necessary in newer releases of Ceph.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jul 24, 2019 at 8:59 PM Reed Dier  wrote:

> Just chiming in to say that this too has been my preferred method for
> adding [large numbers of] OSDs.
>
> Set the norebalance nobackfill flags.
> Create all the OSDs, and verify everything looks good.
> Make sure my max_backfills, recovery_max_active are as expected.
> Make sure everything has peered.
> Unset flags and let it run.
>
> One crush map change, one data movement.
>
> Reed
>
>
> That works, but with newer releases I've been doing this:
>
> - Make sure cluster is HEALTH_OK
> - Set the 'norebalance' flag (and usually nobackfill)
> - Add all the OSDs
> - Wait for the PGs to peer. I usually wait a few minutes
> - Remove the norebalance and nobackfill flag
> - Wait for HEALTH_OK
>
> Wido
>


Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Reed Dier
Just chiming in to say that this too has been my preferred method for adding 
[large numbers of] OSDs.

Set the norebalance nobackfill flags.
Create all the OSDs, and verify everything looks good.
Make sure my max_backfills, recovery_max_active are as expected.
Make sure everything has peered.
Unset flags and let it run.

One crush map change, one data movement.
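
(A sketch of the "make sure they're as expected" step, with illustrative
values:

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
)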

Reed

> 
> That works, but with newer releases I've been doing this:
> 
> - Make sure cluster is HEALTH_OK
> - Set the 'norebalance' flag (and usually nobackfill)
> - Add all the OSDs
> - Wait for the PGs to peer. I usually wait a few minutes
> - Remove the norebalance and nobackfill flag
> - Wait for HEALTH_OK
> 
> Wido
> 




Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Wido den Hollander



On 7/24/19 7:15 PM, Kevin Hrpcek wrote:
> I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what
> I do, you can obviously change the weight increase steps to what you are
> comfortable with. This has worked well for me and my workloads. I've
> sometimes seen peering take longer if I do steps too quickly but I don't
> run any mission critical has to be up 100% stuff and I usually don't
> notice if a pg takes a while to peer.
> 
> Add all OSDs with an initial weight of 0. (nothing gets remapped)
> Ensure cluster is healthy.
> Use a for loop to increase weight on all new OSDs to 0.5 with a
> generous sleep between each for peering.
> Let the cluster balance and get healthy or close to healthy.
> Then repeat the previous 2 steps increasing weight by +0.5 or +1.0 until
> I am at the desired weight.

That works, but with newer releases I've been doing this:

- Make sure cluster is HEALTH_OK
- Set the 'norebalance' flag (and usually nobackfill)
- Add all the OSDs
- Wait for the PGs to peer. I usually wait a few minutes
- Remove the norebalance and nobackfill flag
- Wait for HEALTH_OK
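
The same steps as commands, for reference (nothing beyond the standard flags):

  ceph osd set norebalance
  ceph osd set nobackfill
  # ... create all the OSDs, wait for peering ...
  ceph osd unset norebalance
  ceph osd unset nobackfill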

Wido

> 
> Kevin
> 
> On 7/24/19 11:44 AM, Xavier Trilla wrote:
>>
>> Hi,
>>
>>  
>>
>> What would be the proper way to add 100 new OSDs to a cluster?
>>
>>  
>>
>> I have to add 100 new OSDs to our current >300 OSD cluster, and I
>> would like to know how you do it.
>>
>>  
>>
>> Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one,
>> and it can handle plenty of load, but for the sake of safety -it hosts
>> thousands of VMs via RBD- we usually add them one by one, waiting for
>> a long time between adding each OSD.
>>
>>  
>>
>> Obviously this leads to PLENTY of data movement, as each time the
>> cluster geometry changes, data is migrated among all the OSDs. But
>> with the kind of load we have, if we add several OSDs at the same
>> time, some PGs can get stuck for a while, while they peer to the new OSDs.
>>
>>  
>>
>> Now that I have to add > 100 new OSDs I was wondering if somebody has
>> some suggestions.
>>
>>  
>>
>> Thanks!
>>
>> Xavier.
>>
>>


Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Ch Wan
I usually add 20 OSDs each time.
To control the impact of backfilling, I set the primary-affinity of those
new OSDs to 0 and adjust the backfill
configuration:
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
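
For example (a sketch; the OSD ID range is illustrative):

  for i in {300..319}; do ceph osd primary-affinity osd.$i 0; done

and set it back to 1 once backfilling has settled.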


On Thu, Jul 25, 2019 at 2:02 AM, Kevin Hrpcek  wrote:

> I change the crush weights. My 4 second sleep doesn't let peering finish
> for each one before continuing. I'd test with some small steps to get an
> idea of how much data remaps when increasing the weight by $x. I've found my
> cluster is comfortable with +1 increases... also it takes a while to get to a
> weight of 11 if I did anything smaller.
>
> for i in {264..311}; do ceph osd crush reweight osd.${i} 11.0;sleep 4;done
>
> Kevin
>
> On 7/24/19 12:33 PM, Xavier Trilla wrote:
>
> Hi Kevin,
>
> Yeah, that makes a lot of sense, and looks even safer than adding OSDs one
> by one. What do you change, the crush weight? Or the reweight? (I guess you
> change the crush weight, I am right?)
>
> Thanks!
>
>
>
On 24 Jul 2019, at 19:17, Kevin Hrpcek  wrote:
>
> I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I
> do, you can obviously change the weight increase steps to what you are
> comfortable with. This has worked well for me and my workloads. I've
> sometimes seen peering take longer if I do steps too quickly but I don't
> run any mission critical has to be up 100% stuff and I usually don't notice
> if a pg takes a while to peer.
>
> Add all OSDs with an initial weight of 0. (nothing gets remapped)
> Ensure cluster is healthy.
> Use a for loop to increase weight on all new OSDs to 0.5 with a generous
> sleep between each for peering.
> Let the cluster balance and get healthy or close to healthy.
> Then repeat the previous 2 steps increasing weight by +0.5 or +1.0 until I
> am at the desired weight.
>
> Kevin
>
> On 7/24/19 11:44 AM, Xavier Trilla wrote:
>
> Hi,
>
>
>
> What would be the proper way to add 100 new OSDs to a cluster?
>
>
>
> I have to add 100 new OSDs to our current >300 OSD cluster, and I would
> like to know how you do it.
>
>
>
> Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and
> it can handle plenty of load, but for the sake of safety -it hosts
> thousands of VMs via RBD- we usually add them one by one, waiting for a
> long time between adding each OSD.
>
>
>
> Obviously this leads to PLENTY of data movement, as each time the cluster
> geometry changes, data is migrated among all the OSDs. But with the kind of
> load we have, if we add several OSDs at the same time, some PGs can get
> stuck for a while, while they peer to the new OSDs.
>
>
>
> Now that I have to add > 100 new OSDs I was wondering if somebody has some
> suggestions.
>
>
>
> Thanks!
>
> Xavier.
>


Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kevin Hrpcek
I change the crush weights. My 4 second sleep doesn't let peering finish for 
each one before continuing. I'd test with some small steps to get an idea of 
how much data remaps when increasing the weight by $x. I've found my cluster is 
comfortable with +1 increases... also it takes a while to get to a weight of 11 
if I did anything smaller.

for i in {264..311}; do ceph osd crush reweight osd.${i} 11.0;sleep 4;done

Kevin

On 7/24/19 12:33 PM, Xavier Trilla wrote:
Hi Kevin,

Yeah, that makes a lot of sense, and looks even safer than adding OSDs one by 
one. What do you change, the crush weight? Or the reweight? (I guess you change 
the crush weight, I am right?)

Thanks!



On 24 Jul 2019, at 19:17, Kevin Hrpcek
<kevin.hrp...@ssec.wisc.edu> wrote:

I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I do, 
you can obviously change the weight increase steps to what you are comfortable 
with. This has worked well for me and my workloads. I've sometimes seen peering 
take longer if I do steps too quickly but I don't run any mission critical has 
to be up 100% stuff and I usually don't notice if a pg takes a while to peer.

Add all OSDs with an initial weight of 0. (nothing gets remapped)
Ensure cluster is healthy.
Use a for loop to increase weight on all new OSDs to 0.5 with a generous sleep 
between each for peering.
Let the cluster balance and get healthy or close to healthy.
Then repeat the previous 2 steps increasing weight by +0.5 or +1.0 until I am 
at the desired weight.

Kevin

On 7/24/19 11:44 AM, Xavier Trilla wrote:
Hi,

What would be the proper way to add 100 new OSDs to a cluster?

I have to add 100 new OSDs to our current >300 OSD cluster, and I would like 
to know how you do it.

Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and it 
can handle plenty of load, but for the sake of safety -it hosts thousands of 
VMs via RBD- we usually add them one by one, waiting for a long time between 
adding each OSD.

Obviously this leads to PLENTY of data movement, as each time the cluster 
geometry changes, data is migrated among all the OSDs. But with the kind of 
load we have, if we add several OSDs at the same time, some PGs can get stuck 
for a while, while they peer to the new OSDs.

Now that I have to add > 100 new OSDs I was wondering if somebody has some 
suggestions.

Thanks!
Xavier.





Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Xavier Trilla
Hi Kevin,

Yeah, that makes a lot of sense, and looks even safer than adding OSDs one by 
one. What do you change, the crush weight? Or the reweight? (I guess you change 
the crush weight, am I right?)

Thanks!



On 24 Jul 2019, at 19:17, Kevin Hrpcek
<kevin.hrp...@ssec.wisc.edu> wrote:

I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I do, 
you can obviously change the weight increase steps to what you are comfortable 
with. This has worked well for me and my workloads. I've sometimes seen peering 
take longer if I do steps too quickly but I don't run any mission critical has 
to be up 100% stuff and I usually don't notice if a pg takes a while to peer.

Add all OSDs with an initial weight of 0. (nothing gets remapped)
Ensure cluster is healthy.
Use a for loop to increase weight on all new OSDs to 0.5 with a generous sleep 
between each for peering.
Let the cluster balance and get healthy or close to healthy.
Then repeat the previous 2 steps increasing weight by +0.5 or +1.0 until I am 
at the desired weight.

Kevin

On 7/24/19 11:44 AM, Xavier Trilla wrote:
Hi,

What would be the proper way to add 100 new OSDs to a cluster?

I have to add 100 new OSDs to our current >300 OSD cluster, and I would like 
to know how you do it.

Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and it 
can handle plenty of load, but for the sake of safety -it hosts thousands of 
VMs via RBD- we usually add them one by one, waiting for a long time between 
adding each OSD.

Obviously this leads to PLENTY of data movement, as each time the cluster 
geometry changes, data is migrated among all the OSDs. But with the kind of 
load we have, if we add several OSDs at the same time, some PGs can get stuck 
for a while, while they peer to the new OSDs.

Now that I have to add > 100 new OSDs I was wondering if somebody has some 
suggestions.

Thanks!
Xavier.





Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Kevin Hrpcek
I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I do; 
you can obviously change the weight increase steps to what you are comfortable 
with. This has worked well for me and my workloads. I've sometimes seen peering 
take longer if I do steps too quickly, but I don't run anything mission-critical 
that has to be up 100%, and I usually don't notice if a pg takes a while to peer.

Add all OSDs with an initial weight of 0. (Nothing gets remapped.)
Ensure the cluster is healthy.
Use a for loop to increase the weight on all new OSDs to 0.5, with a generous 
sleep between each for peering.
Let the cluster balance and get healthy or close to healthy.
Then repeat the previous two steps, increasing the weight by +0.5 or +1.0 until 
I am at the desired weight.
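
As a concrete sketch of those steps (OSD IDs, step size and sleep are
illustrative, per the caveats above):

  # the new OSDs were created with crush weight 0, so nothing has remapped yet
  for i in {264..311}; do ceph osd crush reweight osd.${i} 0.5; sleep 60; done
  # let the cluster settle, then repeat with 1.0, 1.5, ... up to the target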

Kevin

On 7/24/19 11:44 AM, Xavier Trilla wrote:
Hi,

What would be the proper way to add 100 new OSDs to a cluster?

I have to add 100 new OSDs to our current >300 OSD cluster, and I would like 
to know how you do it.

Usually, we add them quite slowly. Our cluster is a pure SSD/NVMe one, and it 
can handle plenty of load, but for the sake of safety -it hosts thousands of 
VMs via RBD- we usually add them one by one, waiting for a long time between 
adding each OSD.

Obviously this leads to PLENTY of data movement, as each time the cluster 
geometry changes, data is migrated among all the OSDs. But with the kind of 
load we have, if we add several OSDs at the same time, some PGs can get stuck 
for a while, while they peer to the new OSDs.

Now that I have to add > 100 new OSDs I was wondering if somebody has some 
suggestions.

Thanks!
Xavier.


