This is a great discussion thread. A few points from my side.

I agree that service discovery and load balancing for performance or HA are
two different topics. The former can help with the latter though. Trying to
solve service discovery with proxies is ok at times but a) does not
eliminate the naming issue (you still need a naming scheme) and b) can get
quite difficult to manage after a certain scale, especially if multiple
cluster managers or other tools are involved.

*About DNS and interoperability.* A great advantage of using DNS is that
there are some mechanisms for interoperability to begin with. If needed,
different tools can be responsible for different zones and zone delegation
will do the right thing. We still need a sane scheme for naming tasks and
services, though. But that cannot be enforced by the tool developer anyway.
The user/customer needs to select what the names are (e.g.,
search.mycompany.com, search.local, search.QA.eastcoast.mycompany.com, or
search.node12.rack9.row4.datacenter1.mycompany.com) and make sure that the
scheme is reasonable. Keep this in mind: no matter what DNS tool(s) are
involved, the user must select and remember a naming scheme.
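To make the delegation point concrete, a parent zone can hand a subdomain
off to whatever tool serves it. A minimal sketch in BIND zone-file syntax
(all names and addresses here are made up for illustration):

```
; In the mycompany.com zone: delegate mesos.mycompany.com to the
; nameserver run by the cluster tooling (e.g., Mesos-DNS).
mesos.mycompany.com.     IN NS  ns-mesos.mycompany.com.
ns-mesos.mycompany.com.  IN A   10.0.0.53   ; glue record
```

With this in place, different tools can own different zones and ordinary
DNS resolution does the stitching, with no extra coordination.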

*About Mesos-DNS.* The whole motivation for Mesos-DNS was interoperability
within Mesos clusters. A single cluster may run Marathon, Singularity,
Aurora, K8S, Cassandra, Spark, Myriad, Jenkins, ... and a ton of other
frameworks. We cannot rely on a single one of them for service discovery
and we need to allow tasks launched by different frameworks to find each
other. The Mesos master is the one place that we have the state of all
tasks in a Mesos cluster, so it is reasonable to draw that state and turn
it into DNS records. Right now, Mesos-DNS can be the primary DNS server in
the cluster, or you can delegate a zone to it. Nevertheless, we could also
make Mesos-DNS just generate records to dynamically update another server
(e.g., BIND). We have not done this so far because clusters have very
different setups in terms of their current DNS servers, and dynamic updates
are not always possible due to security or organizational constraints. I
honestly cannot see how "there can be only one" in this topic.
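The "draw the master state and turn it into DNS records" step can be
sketched in a few lines. This is a toy illustration with hard-coded state:
the field names and the task.framework.domain naming are simplifications,
not the master's actual schema or Mesos-DNS's exact rules.

```python
DOMAIN = "mesos"

# Hard-coded fragment shaped loosely like the master's task state.
state = {
    "frameworks": [
        {
            "name": "marathon",
            "tasks": [
                {"name": "search", "host": "10.0.4.12", "ports": [31500]},
                {"name": "search", "host": "10.0.4.13", "ports": [31501]},
            ],
        }
    ]
}

def records_from_state(state, domain=DOMAIN):
    """Derive A-style and SRV-style records from the task state."""
    a_records = {}    # name -> set of IPs
    srv_records = {}  # name -> set of (target, port)
    for fw in state["frameworks"]:
        for task in fw["tasks"]:
            # A record: task.framework.domain -> host IP
            name = f'{task["name"]}.{fw["name"]}.{domain}.'
            a_records.setdefault(name, set()).add(task["host"])
            # SRV record: _task._tcp.framework.domain -> (target, port)
            srv = f'_{task["name"]}._tcp.{fw["name"]}.{domain}.'
            for port in task["ports"]:
                srv_records.setdefault(srv, set()).add((name, port))
    return a_records, srv_records

a_recs, srv_recs = records_from_state(state)
print(sorted(a_recs))   # ['search.marathon.mesos.']
```

A real implementation would poll (or, per the roadmap below, be pushed)
the master state and serve these records over the DNS protocol.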

A few things on the roadmap for Mesos-DNS, some of them explained in
http://goo.gl/okMKAr
- an HTTP interface for SRV & port information (see the http branch for a
simple start)
- a much more flexible naming scheme based on the service discovery info in
Mesos 0.22 (see the link above)
- a switch from a pull model to a push model for the Master state (a
scalability issue)
- allow for plugins (see James's K8s proposal for Mesos-DNS).
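As a rough sketch of the first roadmap item, an HTTP lookup could return
the same data a SRV query carries, as JSON. The endpoint path, port, and
field names below are assumptions based on the early branch, not a stable
API:

```
$ curl http://mesos-dns.internal:8123/v1/services/_search._tcp.marathon.mesos.
[
  {
    "service": "_search._tcp.marathon.mesos.",
    "host": "search-0.marathon.mesos.",
    "ip": "10.0.4.12",
    "port": "31500"
  }
]
```

This gives non-DNS clients (scripts, dashboards, load balancer config
generators) access to port information without speaking the DNS protocol.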

Finally, how should Mesos-DNS interoperate with non-Mesos tasks, i.e.,
tasks launched through other tools, potentially on other machines? As I
mentioned, DNS zones are already a good way of doing that without too much
coordination between different cluster managers, container managers,
different clusters, etc. I like the idea of plugins in order to describe
and discover higher-level concepts like services (collections of tasks)
and help name load balancers. This is what James's plugin tries to
achieve, by the way. On the other hand, using N or N^2 plugins to get all
possible tools to talk to the approved DNS server in each cluster through
a distributed database seems like a pain. Also note that many people
trying to build such a "one size fits all" setup merge health tracking and
load balancing into the service discovery system. This is not necessarily
a great idea (see Yaron's message).

Regards



On Sun, Apr 12, 2015 at 2:37 AM, Tom Arnfeld <[email protected]> wrote:

> Hi Yaron,
>
> (Also apologies for the long reply)
>
> >> First, I think that we need to clearly distinguish that service
> discovery, and request routing / load balancing are related but separate
> concerns. In my mind, at least, discovery is about finding a way to
> communicate with a service (much like a DNS query), whereas routing and
> load balancing concerns effectively and efficiently directing (handling)
> network traffic.
>
> It's very refreshing to hear you talk about service discovery this way. I
> think this is a very important point that often gets lost in discussions,
> and implementations don't always truly take this to heart so the result
> doesn't end up as system agnostic as intended. We've spent the last ~year
> deploying our own service discovery system because we felt nothing in the
> community truly fit the sentence you described above...
>
> The result for us was something very similar to what you've come up with,
> a DNS (DNS-SD rfc6763) system that runs multi-datacenter, backed by a
> distributed etcd database for the names and records. We built our own DNS
> server to do this as consul/weave didn't exist back then ->
> https://github.com/duedil-ltd/discodns. We're firm believers that DNS
> itself can provide us with the framework to achieve 99% of all use cases,
> assuming we ultimately build in support for things like dynamic updates via
> DNS (rfc2136) and possibly even the "push" based long-polling features that
> Apple uses in Bonjour/mDNSResponder. Not to mention that using, say,
> Marathon, for service discovery is incredibly restrictive. We'd like to
> use the same system for every service we run, ranging from things deployed
> in one cloud environment on Mesos to others in another environment deployed
> using Chef. There's no reason why this can't be the case, imo.
>
> The key idea is very much the same: we want different systems to be able to
> register services in the way they need to (either via http to etcd, or via
> dns update) and using their own semantics. A good example is that some
> systems might want to have TTLs on records (in etcd, so the record
> automatically disappears) to remove unhealthy instances of services,
> however other systems might not want to relate their existence in service
> discovery with their health (think long running distributed databases).
> Currently we have some command line tools and chef cookbooks for service
> registering and a WIP branch for *dnsupdate* (
> https://github.com/duedil-ltd/discodns/pull/31).
>
> (I'd be very very interested to hear more about your experience with Weave
> for this purpose, perhaps a blog post? :-))
>
> > Regarding DNS: again I don’t think it makes sense to have a ‘mesos-dns’
> and ‘weave-dns’ and ‘kubernetes-dns’ - it makes much more sense to have a
> single DNS that easily integrates with multiple data sources.
>
> There's actually a ticket on mesos-dns to support plugins
> https://github.com/mesosphere/mesos-dns/issues/62 and I had an idea to
> write a discodns plugin that'd write the records into our etcd database,
> which might be an interesting way to achieve integration with these tools.
> Though I wonder whether this approach results in scalability problems
> because the state becomes too large for a single system to re-process on a
> regular basis, maybe it's best for the things running on mesos to register
> themselves, or even a mesos module for the slaves.
>
> > https://registry.hub.docker.com/u/yaronr/discovery
>
> We'd played around with the Lua support in NGINX to create some sort of
> dns-sd based service discovery proxy, though don't have anything worth
> sharing yet as far as I know!
>
> Thanks for sharing!
>
> On 12 April 2015 at 10:03, Yaron Rosenbaum <[email protected]>
> wrote:
>
>> Ok, this is a bit long, I apologize in advance
>>
>> I’ve been researching and experimenting with various challenges around
>> managing microservices at-scale. I’ve been working extensively with Docker,
>> CoreOS and recently Mesos.
>>
>> First, I think that we need to clearly distinguish that service
>> discovery, and request routing / load balancing are related but separate
>> concerns. In my mind, at least, discovery is about finding a way to
>> communicate with a service (much like a DNS query), whereas routing and
>> load balancing concerns effectively and efficiently directing (handling)
>> network traffic.
>>
>> There are multiple solutions and approaches out there, but I don’t know
>> if any single technology could address all ‘microservices at-scale’ needs
>> on its own effectively and efficiently. In other words - mixing multiple
>> approaches, tools and technologies is probably the right way to go.
>> I’m saying this because many of the existing tools come with a single
>> technology in mind. Tools that come from the Mesos camp obviously have
>> Mesos in mind, tools that come from Kube have Kube in mind, tools coming
>> from CoreOS have CoreOS in mind, etc.
>>
>> I think it’s time to start mixing things together to really benefit from
>> all the goodness in all the various camps.
>>
>> I’ll give an example:
>> First, with respect to network traffic routing / balancing, I’ve created
>> an HAProxy based microservice, which supports multiple data sources as
>> inputs for auto-configuration. (BTW, I’m also experimenting with Velocity
>> scripts for the configuration template, and I’m finding it’s a big
>> improvement over Go scripting. Appreciate your feedback!)
>> https://registry.hub.docker.com/u/yaronr/discovery
>> For now just etcd and Marathon data sources are implemented. Some
>> services will rely on CoreOS/etcd for publishing their availability, and
>> others on Marathon. In my case, I run my Kernel cluster - Mesos-master,
>> Marathon, and a clustered REST DHCP-like service on CoreOS, and allow my
>> customers to schedule their own microservices on Mesos/Marathon. All
>> require discovery / routing / load balancing, the only difference is where
>> the data comes from.
>>
>> Regarding DNS: again I don’t think it makes sense to have a ‘mesos-dns’
>> and ‘weave-dns’ and ‘kubernetes-dns’ - it makes much more sense to have a
>> single DNS that easily integrates with multiple data sources. Network is a
>> common, low level resource, that’s used by the upper level abstractions /
>> frameworks such as Kube, Mesos, and possibly Fleet. Therefore, if a single
>> technology ’takes ownership’ of (parts of) this resource, that could create
>> technology lock-in down the line.
>> I personally use Weave to abstract the network altogether, and Weave DNS to
>> provide service discovery (not network routing).
>>
>> (Y)
>>
>>
>


-- 
Christos
