This discussion reminds me of a few excellent blog posts on solving similar problems.
Smart clients vs dumb load balancers: http://blog.lusis.org/blog/2013/05/13/smart-clients/

Then there is the "local haproxy" idea, which I think is a lot less applicable to a mesos environment, but still worth thinking about: http://agiletesting.blogspot.com/2013/12/ops-design-pattern-local-haproxy.html

On Sunday, April 12, 2015, Christos Kozyrakis <kozyr...@gmail.com> wrote:

> This is a great discussion thread. A few points from my side.
>
> I agree that service discovery and load balancing for performance or HA
> are two different topics. The former can help with the latter, though.
> Trying to solve service discovery with proxies is ok at times, but a) it
> does not eliminate the naming issue (you still need a naming scheme) and
> b) it can get quite difficult to manage past a certain scale, especially
> if multiple cluster managers or other tools are involved.
>
> *About DNS and interoperability.* A great advantage of using DNS is that
> there are some mechanisms for interoperability to begin with. If needed,
> different tools can be responsible for different zones, and zone
> delegation will do the right thing. We still need a sane scheme for
> naming tasks and services, though. But that cannot be enforced by the
> tool developer anyway. The user/customer needs to select what the names
> are (e.g., search.mycompany.com, search.local,
> search.QA.eastcoast.mycompany.com, or
> search.node12.rack9.row4.datacenter1.mycompany.com) and make sure that
> the scheme is reasonable. Keep this in mind: no matter which DNS tool(s)
> are involved, the user must select and remember a naming scheme.
>
> *About Mesos-DNS.* The whole motivation for Mesos-DNS was
> interoperability within Mesos clusters. A single cluster may run
> Marathon, Singularity, Aurora, K8S, Cassandra, Spark, Myriad, Jenkins,
> ... and a ton of other frameworks.
> We cannot rely on a single one of them for service discovery, and we
> need to allow tasks launched by different frameworks to find each other.
> The Mesos master is the one place where we have the state of all tasks
> in a Mesos cluster, so it is reasonable to draw on that state and turn
> it into DNS records. Right now, Mesos-DNS can be the primary DNS server
> in the cluster, or you can delegate a zone to it. Nevertheless, we could
> also make Mesos-DNS just generate records to dynamically update another
> server (e.g., BIND). We have not done this so far because clusters have
> very different setups in terms of their current DNS servers, and dynamic
> updates are not always possible due to their security/organizational
> setup. I honestly cannot see how "there can be only one" applies to this
> topic.
>
> A few things on the roadmap for Mesos-DNS, some of them explained in
> http://goo.gl/okMKAr
> - an HTTP interface for SRV & port information (see the http branch for
> a simple start)
> - a much more flexible naming scheme based on the service discovery info
> in Mesos 0.22 (see the link above)
> - a switch from a pull model to a push model for the master state (a
> scalability issue)
> - allow for plugins (see James's proposal for a K8S plugin for
> Mesos-DNS)
>
> Finally, how should Mesos-DNS interoperate with non-Mesos tasks, tasks
> launched through other tools, potentially on other machines? As I
> mentioned, DNS zones are already a good way of doing that without too
> much coordination between different cluster managers, container
> managers, different clusters, etc. I like the idea of plugins in order
> to describe and discover higher-level concepts like services
> (collections of tasks) and to help name load balancers. This is what
> James's plugin tries to achieve, by the way. On the other hand, using N
> or N^2 plugins to get all possible tools to talk to the approved DNS
> server in each cluster through a distributed database seems like a pain.
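The zone delegation described above amounts to a longest-suffix match over a zone-to-server table. A minimal sketch, with entirely made-up zone names and server addresses:

```python
# Route a DNS query to the tool responsible for the longest matching zone,
# mimicking how delegation can split authority between e.g. Mesos-DNS and a
# corporate DNS server. All zone names and addresses here are hypothetical.
ZONES = {
    "mesos": "mesos-dns:53",          # zone delegated to Mesos-DNS
    "mycompany.com": "bind:53",       # corporate DNS server
    "qa.mycompany.com": "qa-dns:53",  # QA tooling owns a sub-zone
}

def resolver_for(name: str) -> str:
    """Pick the server for the most specific (longest) matching zone."""
    labels = name.rstrip(".").split(".")
    # Try progressively shorter suffixes: a.b.c -> a.b.c, then b.c, then c
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in ZONES:
            return ZONES[zone]
    raise LookupError(f"no zone delegated for {name}")

print(resolver_for("search.marathon.mesos"))    # -> mesos-dns:53
print(resolver_for("search.qa.mycompany.com"))  # -> qa-dns:53
print(resolver_for("www.mycompany.com"))        # -> bind:53
```

The point is that each tool only needs authority over its own suffix; no coordination beyond agreeing on the zone boundaries is required.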
> Also note that many people trying to build such a "one size fits all"
> setup merge health tracking and load balancing into the service
> discovery system. This is not necessarily a great idea (see Yaron's
> message).
>
> Regards
>
>
> On Sun, Apr 12, 2015 at 2:37 AM, Tom Arnfeld <t...@duedil.com> wrote:
>
>> Hi Yaron,
>>
>> (Also, apologies for the long reply.)
>>
>> First, I think that we need to clearly distinguish that service
>> discovery and request routing / load balancing are related but separate
>> concerns. In my mind, at least, discovery is about finding a way to
>> communicate with a service (much like a DNS query), whereas routing and
>> load balancing concern effectively and efficiently directing (handling)
>> network traffic.
>>
>> It's very refreshing to hear you talk about service discovery this way.
>> I think this is a very important point that often gets lost in
>> discussions, and implementations don't always truly take this to heart,
>> so the result doesn't end up as system-agnostic as intended. We've
>> spent the last ~year deploying our own service discovery system because
>> we felt nothing in the community truly fit into the sentence you
>> described above...
>>
>> The result for us was something very similar to what you've come up
>> with: a DNS (DNS-SD, rfc6763) system that runs multi-datacenter, backed
>> by a distributed etcd database for the names and records. We built our
>> own DNS server to do this as consul/weave didn't exist back then ->
>> https://github.com/duedil-ltd/discodns. We're firm believers that DNS
>> itself can provide us with the framework to achieve 99% of all use
>> cases, assuming we ultimately build in support for things like dynamic
>> updates via DNS (rfc2136) and possibly even the "push"-based
>> long-polling features that Apple uses in Bonjour/mDNSResponder.
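On the client side of DNS-SD, the SRV records such a server hands back carry a priority and a weight, and RFC 2782 says to pick from the lowest-priority group, weighted-randomly within it. A rough sketch with invented record tuples (not actual discodns output):

```python
import random

# SRV records as (priority, weight, port, target); lower priority wins,
# weight spreads load proportionally within a priority group.
# Hostnames and ports below are made up for illustration.
records = [
    (10, 60, 31000, "slave1.example"),
    (10, 40, 31002, "slave2.example"),
    (20, 0,  31000, "backup.example"),  # used only if priority-10 hosts fail
]

def pick_srv(records):
    """Choose one target: lowest-priority group, weighted random inside it."""
    best = min(r[0] for r in records)
    group = [r for r in records if r[0] == best]
    total = sum(r[1] for r in group)
    if total == 0:
        return random.choice(group)
    roll = random.uniform(0, total)
    for rec in group:
        roll -= rec[1]
        if roll <= 0:
            return rec
    return group[-1]

priority, weight, port, target = pick_srv(records)
print(f"{target}:{port}")
```

This is the "smart client" half of the picture: the DNS server just publishes records, and selection policy lives in the consumer.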
>> Not to mention that using, say, Marathon for service discovery is
>> incredibly restricting. We'd like to use the same system for every
>> service we run, ranging from things deployed in one cloud environment
>> on Mesos to others in another environment deployed using Chef. There's
>> no reason why this can't be the case, imo.
>>
>> The key idea is very much the same: we want different systems to be
>> able to register services in the way they need to (either via http to
>> etcd, or via dns update) and using their own semantics. A good example
>> is that some systems might want to have TTLs on records (in etcd, so
>> the record automatically disappears) to remove unhealthy instances of
>> services; however, other systems might not want to tie their existence
>> in service discovery to their health (think long-running distributed
>> databases). Currently we have some command line tools and Chef
>> cookbooks for service registration and a WIP branch for *dnsupdate* (
>> https://github.com/duedil-ltd/discodns/pull/31).
>>
>> (I'd be very, very interested to hear more about your experience with
>> Weave for this purpose; perhaps a blog post? :-))
>>
>> > Regarding DNS: again I don't think it makes sense to have a
>> > 'mesos-dns' and 'weave-dns' and 'kubernetes-dns' - it makes much more
>> > sense to have a single DNS that easily integrates with multiple data
>> > sources.
>>
>> There's actually a ticket on mesos-dns to support plugins:
>> https://github.com/mesosphere/mesos-dns/issues/62, and I had an idea to
>> write a discodns plugin that'd write the records into our etcd
>> database, which might be an interesting way to achieve integration with
>> these tools. Though I wonder whether this approach results in
>> scalability problems because the state becomes too large for a single
>> system to re-process on a regular basis; maybe it's best for the things
>> running on mesos to register themselves, or even to use a mesos module
>> on the slaves.
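The TTL distinction Tom draws (records that vanish unless the service keeps refreshing them, versus health-agnostic permanent records) can be illustrated with a toy in-memory registry. etcd does this expiry server-side; the names and addresses below are made up:

```python
import time

class TTLRegistry:
    """Toy service registry where entries expire unless refreshed,
    mimicking etcd keys registered with a TTL."""

    def __init__(self):
        self._entries = {}  # name -> (address, expires_at)

    def register(self, name, address, ttl=None, now=None):
        now = time.monotonic() if now is None else now
        expires = None if ttl is None else now + ttl  # no TTL = permanent
        self._entries[name] = (address, expires)

    def lookup(self, name, now=None):
        now = time.monotonic() if now is None else now
        address, expires = self._entries.get(name, (None, None))
        if address is None or (expires is not None and now > expires):
            return None  # unknown, or expired because nobody refreshed it
        return address

reg = TTLRegistry()
reg.register("search", "10.0.0.5:31000", ttl=30, now=0)  # must heartbeat
reg.register("cassandra", "10.0.0.9:9042", now=0)        # health-agnostic
print(reg.lookup("search", now=10))     # 10.0.0.5:31000
print(reg.lookup("search", now=40))     # None: TTL lapsed without a refresh
print(reg.lookup("cassandra", now=40))  # 10.0.0.9:9042
```

A long-running database would register without a TTL, exactly so that a missed heartbeat never drops it from discovery.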
>>
>> > https://registry.hub.docker.com/u/yaronr/discovery
>>
>> We'd played around with the Lua support in NGINX to create some sort of
>> dns-sd based service discovery proxy, though we don't have anything
>> worth sharing yet as far as I know!
>>
>> Thanks for sharing!
>>
>> On 12 April 2015 at 10:03, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:
>>
>>> Ok, this is a bit long; I apologize in advance.
>>>
>>> I've been researching and experimenting with various challenges around
>>> managing microservices at scale. I've been working extensively with
>>> Docker, CoreOS, and recently Mesos.
>>>
>>> First, I think that we need to clearly distinguish that service
>>> discovery and request routing / load balancing are related but
>>> separate concerns. In my mind, at least, discovery is about finding a
>>> way to communicate with a service (much like a DNS query), whereas
>>> routing and load balancing concern effectively and efficiently
>>> directing (handling) network traffic.
>>>
>>> There are multiple solutions and approaches out there, but I don't
>>> know if any single technology could address all 'microservices
>>> at-scale' needs on its own effectively and efficiently. In other
>>> words, mixing multiple approaches, tools, and technologies is probably
>>> the right way to go. I'm saying this because many of the existing
>>> tools come with a single technology in mind. Tools that come from the
>>> Mesos camp obviously have Mesos in mind, tools that come from Kube
>>> have Kube in mind, tools coming from CoreOS have CoreOS in mind, etc.
>>>
>>> I think it's time to start mixing things together to really benefit
>>> from all the goodness in all the various camps.
>>>
>>> I'll give an example:
>>> First, with respect to network traffic routing / balancing, I've
>>> created an HAProxy-based microservice, which supports multiple data
>>> sources as inputs for auto-configuration.
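Auto-configuration of this kind boils down to rendering a discovered backend list into proxy config. A minimal, hypothetical sketch (not the actual yaronr/discovery templates; service name, hosts, and ports are invented):

```python
# Render a backend list, as a discovery data source such as etcd or the
# Marathon API might report it, into an HAProxy-style config fragment.
def render_backend(service, instances):
    lines = [f"backend {service}", "    balance roundrobin"]
    for i, (host, port) in enumerate(instances):
        # "check" enables HAProxy's own health checking on each server
        lines.append(f"    server {service}-{i} {host}:{port} check")
    return "\n".join(lines)

instances = [("10.0.0.5", 31000), ("10.0.0.6", 31007)]
print(render_backend("search", instances))
```

A real implementation would watch the data source for changes, re-render, and reload HAProxy; only the rendering step is shown here.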
>>> (BTW, I'm also experimenting with Velocity scripts for the
>>> configuration template, and I'm finding it a big improvement over Go
>>> templates. I'd appreciate your feedback!)
>>> https://registry.hub.docker.com/u/yaronr/discovery
>>> For now, just the etcd and Marathon data sources are implemented. Some
>>> services will rely on CoreOS/etcd for publishing their availability,
>>> and others on Marathon. In my case, I run my Kernel cluster (Mesos
>>> master, Marathon, and a clustered REST DHCP-like service) on CoreOS,
>>> and allow my customers to schedule their own microservices on
>>> Mesos/Marathon. All require discovery / routing / load balancing; the
>>> only difference is where the data comes from.
>>>
>>> Regarding DNS: again, I don't think it makes sense to have a
>>> 'mesos-dns' and 'weave-dns' and 'kubernetes-dns' - it makes much more
>>> sense to have a single DNS that easily integrates with multiple data
>>> sources. The network is a common, low-level resource that's used by
>>> the upper-level abstractions / frameworks such as Kube, Mesos, and
>>> possibly Fleet. Therefore, if a single technology 'takes ownership' of
>>> (parts of) this resource, that could create technology lock-in down
>>> the line.
>>> I personally use Weave to abstract the network altogether, and Weave
>>> DNS to provide service discovery (not network routing).
>>>
>>> (Y)
>>>
>>
>
>
> --
> Christos

-- 
Text by Jeff, typos by iPhone