Re: [ceph-users] add multiple OSDs to cluster

2017-03-22 Thread mj

Hi Jonathan, Anthony and Steve,

Thanks very much for your valuable advice and suggestions!

MJ

On 03/21/2017 08:53 PM, Jonathan Proulx wrote:




If it took 7hr for one drive you have probably already done this (or
the defaults are set for low-impact recovery), but before doing anything you
want to be sure your OSD settings (max backfills, max recovery active,
recovery sleep, perhaps others?) are set such that recovery and
backfilling don't overwhelm production use.

Look through the recovery section of
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/

This is important because, if you do have a failure and thus unplanned
recovery, you want to have this tuned to your preferred balance of
quick performance versus quick return to full redundancy.

That said, my theory is to add things in as balanced a way as possible to
minimize moves.

What that means depends on your crush map.

For me I have 3 "racks" and all (most) of my pools are 3x replication
so each object should have one copy in each rack.

I've only expanded once, but what I did was to add three servers, one
to each 'rack'.  I set them all 'in' at the same time, which should
have minimized movement between racks and moved objects from other
servers' OSDs in the same rack onto the OSDs in the new server.  This
seemed to work well for me.

In your case this would mean adding drives to all servers at once in a
balanced way.  That would prevent copying across servers, since the
balance among servers wouldn't change.

You could do one disk on each server or load them all up and trust
recovery settings to keep the thundering herd in check.

As I said, I've only gone through one expansion round, and while this
theory seemed to work out for me, hopefully someone with deeper
knowledge can confirm or deny its general applicability.

-Jon

On Tue, Mar 21, 2017 at 07:56:57PM +0100, mj wrote:
:Hi,
:
:Just a quick question about adding OSDs, since most of the docs I can find
:talk about adding ONE OSD, and I'd like to add four per server on my
:three-node cluster.
:
:This morning I tried the careful approach, and added one OSD to server1. It
:all went fine, everything rebuilt and I have a HEALTH_OK again now. It took
:around 7 hours.
:
:But now I started thinking... (and that's when things go wrong, therefore
:hoping for feedback here)
:
:The question: was I being stupid to add only ONE osd to the server1? Is it
:not smarter to add all four OSDs at the same time?
:
:I mean: things will rebuild anyway...and I have the feeling that rebuilding
:from 4 -> 8 OSDs is not going to be much heavier than rebuilding from 4 -> 5
:OSDs. Right?
:
:So better add all new OSDs together on a specific server?
:
:Or not? :-)
:
:MJ


Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Anthony D'Atri
Deploying or removing OSDs in parallel can certainly save elapsed time and 
avoid moving data more than once.  There are certain pitfalls, though, and the 
strategy needs careful planning.

- Deploying a new OSD at full weight means a lot of write operations.  Running 
multiple whole-OSD backfills to a single host can, depending on your situation, 
saturate the HBA, resulting in slow requests.
- Judicious setting of norebalance/norecover can help somewhat, to give the 
affected OSDs/PGs time to peer and become ready before shoving data at them.
- Deploying at 0 CRUSH weight and incrementally ratcheting up the weight as 
PGs peer can spread that out.
- I've recently seen the idea of temporarily setting primary-affinity to 0 on 
the affected OSDs to deflect some competing traffic as well.
- One workaround is that if you have OSDs to deploy on more than one server, 
you could deploy them in batches of, say, 1-2 on each server, striping them if 
you will.  That diffuses the impact and results in faster elapsed recovery.  
(A rough command sketch for the flag / weight / primary-affinity points follows 
this list.)
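
The mechanics of those points are roughly the following.  This is only a 
sketch: osd.12 is a placeholder id, the weights depend on your drive sizes, 
and older releases may require "mon osd allow primary affinity = true" on the 
mons before primary-affinity can be changed.

    # quiet the cluster down while the new OSDs are created and peer
    ceph osd set norebalance
    ceph osd set norecover

    # optionally keep client reads off a new OSD while it backfills
    ceph osd primary-affinity osd.12 0

    # deploy at 0 CRUSH weight, then ratchet the weight up in steps,
    # letting peering/backfill settle between steps
    ceph osd crush reweight osd.12 0.5
    ceph osd crush reweight osd.12 1.0

    # once everything has peered, let data movement proceed
    ceph osd unset norebalance
    ceph osd unset norecover

    # and restore primary-affinity when backfill is done
    ceph osd primary-affinity osd.12 1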

As for how many are safe to do in parallel, there are multiple variables: 
HDD vs. SSD, client workload, and especially how many other OSDs are in the 
same logical rack/host.  On a cluster of 450 OSDs, with 150 in each logical 
rack, each OSD is less than 1% of a rack, so deploying 4 of them at once would 
not be a massive change.  However, in a smaller cluster with, say, 45 OSDs, 15 in 
each rack, that would tickle a much larger fraction of the cluster and be more 
disruptive.

If the numbers below are totals, i.e. you would be expanding your cluster from a 
total of 4 OSDs to a total of 8, that is something I wouldn't do, having 
experienced under Dumpling what it was like to triple the size of a certain 
cluster in one swoop.
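
(As a very rough rule of thumb, and ignoring failure-domain details, CRUSH 
moves on the order of added-weight / new-total-weight of the data onto the new 
OSDs.  With equal-sized drives, going 4 -> 5 remaps roughly 1/5 of the data, 
while 4 -> 8 remaps roughly 1/2.  The per-OSD backfill rate may be similar 
either way, but the amount of data in flight and the length of the recovery 
window are very different.)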

So one approach is trial and error: see how many you can get away with before 
you get slow requests, then back off.  In production, of course, this is 
playing with fire.  Depending on which release you're running, cranking down a 
common set of backfill/recovery tunables can help mitigate the thundering herd 
effect as well.

— aad

> This morning I tried the careful approach, and added one OSD to server1. 
> It all went fine, everything rebuilt and I have a HEALTH_OK again now. 
> It took around 7 hours.
> 
> But now I started thinking... (and that's when things go wrong, 
> therefore hoping for feedback here)
> 
> The question: was I being stupid to add only ONE osd to the server1? Is 
> it not smarter to add all four OSDs at the same time?
> 
> I mean: things will rebuild anyway...and I have the feeling that 
> rebuilding from 4 -> 8 OSDs is not going to be much heavier than 
> rebuilding from 4 -> 5 OSDs. Right?
> 
> So better add all new OSDs together on a specific server?


Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Jonathan Proulx



If it took 7hr for one drive you have probably already done this (or
the defaults are set for low-impact recovery), but before doing anything you
want to be sure your OSD settings (max backfills, max recovery active,
recovery sleep, perhaps others?) are set such that recovery and
backfilling don't overwhelm production use.
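
Something along these lines is what I mean (a sketch only; the option names
are the ones from the doc below, the values are just conservative examples,
and injectargs changes don't survive an OSD restart):

    # throttle recovery/backfill on all OSDs at runtime
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1 --osd-recovery-sleep 0.1'

    # or make it permanent in ceph.conf under [osd]:
    #   osd max backfills        = 1
    #   osd recovery max active  = 1
    #   osd recovery op priority = 1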

Look through the recovery section of
http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/ 

This is important because, if you do have a failure and thus unplanned
recovery, you want to have this tuned to your preferred balance of
quick performance versus quick return to full redundancy.

That said, my theory is to add things in as balanced a way as possible to
minimize moves.

What that means depends on your crush map.
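
A quick way to see what you actually have, if you're not sure:

    ceph osd tree              # root / rack / host hierarchy and CRUSH weights
    ceph osd df tree           # same layout with per-OSD utilisation (Hammer or later, I think)
    ceph osd crush rule dump   # which rules (and failure domains) your pools use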

For me I have 3 "racks" and all (most) of my pools are 3x replication
so each object should have one copy in each rack.

I've only expanded once, but what I did was to add three servers, one
to each 'rack'.  I set them all 'in' at the same time, which should
have minimized movement between racks and moved objects from other
servers' OSDs in the same rack onto the OSDs in the new server.  This
seemed to work well for me.
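
If you want to reproduce the "all in at once" part, one way (a sketch, not
exactly what I ran; the ids are placeholders) is to set noin so the new OSDs
come up marked 'out', then mark them in together once they're all running:

    ceph osd set noin
    # ...create and start the new OSDs, one per host...
    ceph osd in 12 13 14
    ceph osd unset noin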

In your case this would mean adding drives to all servers at once in a
balanced way.  That would prevent copying across servers, since the
balance among servers wouldn't change.

You could do one disk on each server or load them all up and trust
recovery settings to keep the thundering herd in check.

As I said, I've only gone through one expansion round, and while this
theory seemed to work out for me, hopefully someone with deeper
knowledge can confirm or deny its general applicability.

-Jon

On Tue, Mar 21, 2017 at 07:56:57PM +0100, mj wrote:
:Hi,
:
:Just a quick question about adding OSDs, since most of the docs I can find
:talk about adding ONE OSD, and I'd like to add four per server on my
:three-node cluster.
:
:This morning I tried the careful approach, and added one OSD to server1. It
:all went fine, everything rebuilt and I have a HEALTH_OK again now. It took
:around 7 hours.
:
:But now I started thinking... (and that's when things go wrong, therefore
:hoping for feedback here)
:
:The question: was I being stupid to add only ONE osd to the server1? Is it
:not smarter to add all four OSDs at the same time?
:
:I mean: things will rebuild anyway...and I have the feeling that rebuilding
:from 4 -> 8 OSDs is not going to be much heavier than rebuilding from 4 -> 5
:OSDs. Right?
:
:So better add all new OSDs together on a specific server?
:
:Or not? :-)
:
:MJ


Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Steve Taylor
Generally speaking, you are correct. Adding more OSDs at once is more
efficient than adding fewer at a time.

That being said, do so carefully. We typically add OSDs to our clusters
either 32 or 64 at once, and we have had issues on occasion with bad
drives. It's common for us to have a drive or two go bad within 24
hours or so of adding them to Ceph, and if multiple drives fail in
multiple failure domains within a short amount of time, bad things can
happen. The efficient, safe approach is to add as many drives as
possible within a single failure domain, wait for recovery, and repeat.
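
The "wait for recovery, and repeat" part is easy to script between batches,
something like the following (a sketch; adjust the interval and the health
states you're willing to proceed on):

    # after adding a batch in one failure domain, wait for the cluster to settle
    until ceph health | grep -q HEALTH_OK; do
        sleep 60
    done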

On Tue, 2017-03-21 at 19:56 +0100, mj wrote:
> Hi,
>
> Just a quick question about adding OSDs, since most of the docs I can
> find talk about adding ONE OSD, and I'd like to add four per server on
> my three-node cluster.
>
> This morning I tried the careful approach, and added one OSD to server1.
> It all went fine, everything rebuilt and I have HEALTH_OK again now.
> It took around 7 hours.
>
> But now I started thinking... (and that's when things go wrong,
> therefore hoping for feedback here)
>
> The question: was I being stupid to add only ONE OSD to server1? Is
> it not smarter to add all four OSDs at the same time?
>
> I mean: things will rebuild anyway... and I have the feeling that
> rebuilding from 4 -> 8 OSDs is not going to be much heavier than
> rebuilding from 4 -> 5 OSDs. Right?
>
> So better to add all new OSDs together on a specific server?
>
> Or not? :-)
>
> MJ
>





[ceph-users] add multiple OSDs to cluster

2017-03-21 Thread mj

Hi,

Just a quick question about adding OSDs, since most of the docs I can 
find talk about adding ONE OSD, and I'd like to add four per server on 
my three-node cluster.

This morning I tried the careful approach, and added one OSD to server1. 
It all went fine, everything rebuilt and I have HEALTH_OK again now. 
It took around 7 hours.

But now I started thinking... (and that's when things go wrong, 
therefore hoping for feedback here)

The question: was I being stupid to add only ONE OSD to server1? Is 
it not smarter to add all four OSDs at the same time?

I mean: things will rebuild anyway... and I have the feeling that 
rebuilding from 4 -> 8 OSDs is not going to be much heavier than 
rebuilding from 4 -> 5 OSDs. Right?

So better to add all new OSDs together on a specific server?

Or not? :-)

MJ