Re: [ceph-users] Help/advice with crush rules

2018-05-21 Thread Gregory Farnum
On Mon, May 21, 2018 at 11:19 AM Andras Pataki <
apat...@flatironinstitute.org> wrote:

> Hi Greg,
>
> Thanks for the detailed explanation - the examples make a lot of sense.
>
> One followup question regarding a two level crush rule like:
>
>
> step take default
> step choose 3 type=rack
> step chooseleaf 3 type=host
> step emit
>
> If the erasure code has 9 chunks, this lines up exactly without any
> problems.  What if the erasure code isn't an exact product of racks and
> hosts per rack, for example 6+2 with the rule above?  Will it just take
> 3 chunks in the first two racks and 2 from the last without any issues?
>

Yes, assuming your ceph install is new enough. (At one point it crashed if
you did that :o)



> The other direction I presume can't work, i.e. with the rule above I
> can't use any erasure code with more than 9 chunks.
>

Right
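
For reference, a sketch of how such a rule might look in the decompiled
crush map text format; the rule name, id and set_*_tries values are made
up, and the syntax is from memory of the Luminous-era format (which writes
"type rack" rather than "type=rack"), so double-check it against the
output of "crushtool -d" on your own map:

rule ec_rack_host {
    id 2
    type erasure
    min_size 3
    max_size 9
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    # pick 3 racks, then 3 OSDs on distinct hosts inside each rack
    step choose indep 3 type rack
    step chooseleaf indep 3 type host
    step emit
}

With a 6+2 profile (8 chunks) the rule still walks all 9 slots but only
the first 8 receive chunks, so the placement should come out as 3/3/2
across the three racks, as discussed above; anything larger than 9 chunks
has nowhere to go.  Before injecting an edited map you can sanity-check it
with something like "crushtool -c map.txt -o map.bin" followed by
"crushtool -i map.bin --test --rule 2 --num-rep 8 --show-mappings".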


>
> Andras
>


Re: [ceph-users] Help/advice with crush rules

2018-05-21 Thread Andras Pataki

Hi Greg,

Thanks for the detailed explanation - the examples make a lot of sense.

One followup question regarding a two level crush rule like:

step take default
step choose 3 type=rack
step chooseleaf 3 type=host
step emit

If the erasure code has 9 chunks, this lines up exactly without any
problems.  What if the erasure code isn't an exact product of racks and
hosts per rack, for example 6+2 with the rule above?  Will it just take
3 chunks in the first two racks and 2 from the last without any issues?
The other direction I presume can't work, i.e. with the rule above I
can't use any erasure code with more than 9 chunks.


Andras




Re: [ceph-users] Help/advice with crush rules

2018-05-18 Thread Gregory Farnum
On Thu, May 17, 2018 at 9:05 AM Andras Pataki 
wrote:

> I've been trying to wrap my head around crush rules, and I need some
> help/advice.  I'm thinking of using erasure coding instead of
> replication, and trying to understand the possibilities for planning for
> failure cases.
>
> For a simplified example, consider a 2 level topology, OSDs live on
> hosts, and hosts live in racks.  I'd like to set up a rule for a 6+3
> erasure code that would put at most 1 of the 9 chunks on a host, and no
> more than 3 chunks in a rack (so in case the rack is lost, we still have
> a way to recover).  Some racks may not have 3 hosts in them, so they
> could potentially accept only 1 or 2 chunks then.  How can something
> like this be implemented as a crush rule?  Or, if not exactly this,
> something in this spirit?  I don't want to say that all chunks need to
> live in a separate rack because that is too restrictive (some racks may
> be much bigger than others, or there might not even be 9 racks).
>

Unfortunately what you describe here is a little too detailed in ways CRUSH
can't easily specify. You should think of a CRUSH rule as a sequence of
steps that start out at a root (the "take" step), and incrementally specify
more detail about which piece of the CRUSH hierarchy they run on, but run
the *same* rule on every piece they select.

So the simplest thing that comes close to what you suggest is:
(forgive me if my syntax is slightly off, I'm doing this from memory)
step take default
step chooseleaf n type=rack
step emit

That would start at the default root, select "n" racks (9, in your case)
and then for each rack find an OSD within it. (chooseleaf is special and
more flexible than most of the CRUSH language; it's nice because if it
can't find an OSD in one of the selected racks, it will pick another rack).
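
In an actual decompiled crush map the rule above would look roughly like
the sketch below; the name, id and min/max_size values are placeholders,
and worth verifying against "crushtool -d" output.  Two details differ
from the shorthand: erasure-coded rules normally use "indep" rather than
"firstn" as the choose mode (so surviving chunks keep their positions
when an OSD fails), and a count of 0 means "as many as the pool asks
for", so the 9 doesn't need to be hard-coded:

rule ec_by_rack {
    id 1
    type erasure
    min_size 3
    max_size 10
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    # one chunk per rack, as many racks as the pool has chunks
    step chooseleaf indep 0 type rack
    step emit
}
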
But a rule that's more illustrative of how things work is:
step take default
step choose 3 type=rack
step chooseleaf 3 type=host
step emit

That one selects three racks, then selects three OSDs within different
hosts *in each rack*. (You'll note that it doesn't necessarily work out so
well if you don't want 9 OSDs!) If one of the racks it selected doesn't
have 3 separate hosts...well, tough, it tried to do what you told it. :/

If you were dedicated, you could split up your racks into
equivalently-sized units — let's say rows. Then you could do
step take default
step choose 3 type=row
step chooseleaf 3 type=host
step emit

Assuming you have 3+ rows of good size, that'll get you 9 OSDs which are
all on different hosts.
-Greg
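
For completeness, rows are just another bucket level in the crush map
("row" is already one of the default bucket types), so the layout above
could be declared roughly as below.  Every name, id and weight here is
invented for illustration, and rack1..rack6 stand for whatever rack
buckets already exist:

row row1 {
    id -101
    alg straw2
    hash 0
    item rack1 weight 20.000
    item rack2 weight 20.000
}
row row2 {
    id -102
    alg straw2
    hash 0
    item rack3 weight 20.000
    item rack4 weight 20.000
}
row row3 {
    id -103
    alg straw2
    hash 0
    item rack5 weight 20.000
    item rack6 weight 20.000
}
root default {
    id -1
    alg straw2
    hash 0
    item row1 weight 40.000
    item row2 weight 40.000
    item row3 weight 40.000
}

rule ec_row_host {
    id 3
    type erasure
    min_size 3
    max_size 9
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    # 3 rows, 3 distinct hosts in each: 9 OSDs, all on different hosts
    step choose indep 3 type row
    step chooseleaf indep 3 type host
    step emit
}

In practice you would probably build the hierarchy with "ceph osd crush
add-bucket" and "ceph osd crush move" rather than by hand-editing the
map, but the resulting shape is the same.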




[ceph-users] Help/advice with crush rules

2018-05-17 Thread Andras Pataki
I've been trying to wrap my head around crush rules, and I need some 
help/advice.  I'm thinking of using erasure coding instead of 
replication, and trying to understand the possibilities for planning for 
failure cases.


For a simplified example, consider a 2 level topology, OSDs live on 
hosts, and hosts live in racks.  I'd like to set up a rule for a 6+3 
erasure code that would put at most 1 of the 9 chunks on a host, and no 
more than 3 chunks in a rack (so in case the rack is lost, we still have 
a way to recover).  Some racks may not have 3 hosts in them, so they 
could potentially accept only 1 or 2 chunks then.  How can something 
like this be implemented as a crush rule?  Or, if not exactly this, 
something in this spirit?  I don't want to say that all chunks need to 
live in a separate rack because that is too restrictive (some racks may 
be much bigger than others, or there might not even be 9 racks).


Thanks,

Andras

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com