Re: [ceph-users] Upper limit of MONs and MDSs in a Cluster

2017-05-25 Thread Gregory Farnum
You absolutely cannot do this with your monitors -- as David says, every
node would have to participate in every monitor decision; the long tails
would be horrifying and I expect it would collapse in ignominious defeat
very quickly.
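
To put rough numbers on that: quorum is a strict majority, so with 101
mons every map update has to be durably acknowledged by at least 51 of
them before it commits. Commit latency is then governed by roughly the
51st-fastest mon store on every single update, and if those mons sit on
busy OSD boxes, that node is exactly where the long tail comes from.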

Your MDSes should be fine since they are indeed just a bunch of standby
daemons at that point. You'd want to consider how that fits with your RAM
requirements, though; it's probably not a good deployment decision even
though it would work at the daemon level.
-Greg



Re: [ceph-users] Upper limit of MONs and MDSs in a Cluster

2017-05-25 Thread David Turner
For the MDS, the primary doesn't hold state data that needs to be replayed
to a standby; the information exists in the cluster. Your setup would be
1 active and 100 standbys. If the active went down, one of the standbys
would be promoted and would read the information from the cluster.
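
If you additionally want one of those standbys tailing the active's
journal (standby-replay) so failover is faster, a minimal per-daemon
sketch is below; the option names assume a Jewel/Kraken-era ceph.conf and
the daemon names are made up for illustration:

    [mds.a]
    mds standby replay = true    # follow the active MDS's journal
    mds standby for rank = 0     # shadow rank 0, the single active rank

    [mds.b]
    # plain standby: nothing to configure, it just registers with the mons

The other 99 standbys need nothing special either; they sit idle until
the mons promote one of them.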

With mons, it's interesting because of the quorum mechanics. 4 mons is
actually worse than 3: quorum always requires a strict majority, so 4 mons
still only tolerate a single failure, and an even 2-2 split leaves neither
half with a majority and no tie-breaking vote, so the cluster stalls. Odd
numbers are always best, and it sounds like your proposal would regularly
land on an even number of mons. I haven't heard of a deployment with more
than 5 mons. I would imagine there are some with 7 out there, but it's not
worth the hardware expense in 99.999% of cases.
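
To make the arithmetic concrete (quorum is a strict majority, i.e.
floor(n/2) + 1):

    mons    quorum needed    failures tolerated
     3           2                  1
     4           3                  1
     5           3                  2
     6           4                  2
     7           4                  3

Each even step buys no extra failure tolerance; it just adds one more
voter that every election and every map commit has to hear from.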

I'm assuming your question comes from a place of wanting one
configuration to rule them all rather than multiple node types in your
Ceph deployment scripts. Just put in the time and do it right: have MDS
servers, have mons, have OSD nodes, etc. Once you reach scale, your mons
are going to need their resources, your OSDs are going to need theirs,
your RGWs will be using more bandwidth, ad infinitum. And that's before
counting all of the RAM the services will need during any recovery
(assume roughly 3x the normal memory requirements for most Ceph services
while recovering).
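
Purely for illustration, a ceph-ansible-style inventory that keeps the
roles separate might look like the sketch below (the group names follow
ceph-ansible's conventions; the hostnames and counts are hypothetical):

    [mons]
    mon[1:3].example.com

    [mgrs]
    mon[1:3].example.com

    [mdss]
    mds[1:2].example.com

    [osds]
    osd[1:60].example.com

    [rgws]
    rgw[1:4].example.com

Scaling out then means adding hosts to the [osds] group rather than
reshaping every node.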

Hyper-converged clusters are not recommended for production deployments.
Several people run them, but generally for smaller clusters. By the time
you reach dozens or hundreds of servers, you will only cause yourself
headaches by becoming the special snowflake in the community: every time
you have a problem, the first place to look will be resource contention
between your Ceph daemons.


Back to some of your direct questions. I haven't tested this, so these
are educated guesses... A likely complication of having hundreds of mons
is that they all have to agree on every new map, causing a LOT more
communication between your mons, which could easily become a bottleneck
for map updates (snapshot creation/deletion, OSDs going up/down, scrubs
happening, anything that changes data in a map). When an MDS fails, I
don't know how quickly the mons would settle on a new active MDS out of
100 standbys; it could be very fast or take quite a bit longer depending
on the logic behind the choice. And hundreds of RGW servers behind a load
balancer (I'm assuming) would negate any caching happening on the RGWs,
since repeated accesses to the same object are unlikely to reach the same
RGW.
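
On that last point, one way to claw some cache locality back is to have
the load balancer hash on the request URI so a given bucket/object keeps
landing on the same RGW. A minimal backend stanza assuming HAProxy in
front of civetweb RGWs (backend name and addresses are hypothetical):

    backend rgw
        balance uri               # hash the request path
        hash-type consistent      # minimize remapping if an RGW drops out
        server rgw1 10.0.0.11:7480 check
        server rgw2 10.0.0.12:7480 check
        server rgw3 10.0.0.13:7480 check

That only helps read caching, of course; it does nothing for the mon and
MDS concerns above.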



[ceph-users] Upper limit of MONs and MDSs in a Cluster

2017-05-25 Thread Wes Dillingham
How much testing has there been / what are the implications of having a
large number of Monitor and Metadata daemons running in a cluster?

Thus far I have deployed all of our Ceph clusters with a single service
type per physical machine, but I am interested in a use case where we
deploy dozens (hundreds?) of boxes, each of which would be a mon, mds,
mgr, osd, and rgw all in one, and all in a single cluster. I do realize
it is somewhat trivial (with config mgmt and all) to dedicate a couple of
lean boxes as MDSes and MONs and only expand at the OSD level, but I'm
still curious.

My use case in mind is backup targets where pools span the entire
cluster, and I am looking to streamline the process for possible
rack-and-stack situations where boxes can just be added in place, booted
up, and auto-join the cluster as a mon/mds/mgr/osd/rgw.

So does anyone run clusters with dozens of MONs and/or MDSes, or is
anyone aware of any testing with very high numbers of each? At the MDS
level I would just be looking for 1 active, 1 standby-replay, and X
standby until multiple active MDSes are production ready. Thanks!

-- 
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102