Re: Current State of Service Discovery

Steven Borrelli Sun, 12 Apr 2015 16:24:58 -0700

We’ve been looking at this as a core part of microservices-infrastructure [1], 
so I have been thinking of this problem a lot.


We’re using Consul [2] as our service discovery engine. We like the fact that 
it already has an HTTP API, WAN DNS discovery, and health checks at the agent 
level. 
One of my colleagues is developing a mesos->consul bridge; we should have the 
code available this week.
 
As for service discovery, we are currently moving to a model of one global 
cluster port per application, plus haproxy. We are doing this because right 
now we can’t expect developers to add support for DNS SRV to their client 
applications.
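
For illustration only, here is a rough sketch of the client-side logic that 
DNS SRV support would require: given the records from an SRV answer, prefer 
the lowest priority and then pick by weight. This is a simplified take on the 
selection described in RFC 2782, with no actual DNS lookup. The SrvRecord 
type, the pick_srv helper and the example.internal names are all made up for 
this sketch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is preferred
    weight: int    # relative share among records of equal priority
    port: int
    target: str


def pick_srv(records, rng=random):
    """Pick one record: lowest priority first, then weighted random."""
    if not records:
        raise ValueError("no SRV records to choose from")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical answer for _search._tcp.example.internal
records = [
    SrvRecord(10, 60, 31001, "node1.example.internal"),
    SrvRecord(10, 40, 31007, "node2.example.internal"),
    SrvRecord(20, 100, 31002, "backup.example.internal"),
]
chosen = pick_srv(records)
print(f"connect to {chosen.target}:{chosen.port}")
```

Every client library would need something like this (plus the lookup itself 
and retry handling), which is the work a fixed port behind haproxy avoids.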

It is our feeling that the best path forward is to have the option to give each 
task an IP address instead of a port and to get rid of the proxy layer. Each 
task would have ServiceDiscovery information encoded to allow DNS systems to 
group tasks under a common name.
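
As a rough sketch of the grouping step, assuming each task carries a service 
name and its own IP: the dictionary keys "service" and "ip" and the 
example.internal domain are hypothetical and are not the actual Mesos 
discovery schema.

```python
from collections import defaultdict


def group_by_service(tasks, domain="example.internal"):
    """Map each service name to the addresses of all tasks that carry it."""
    records = defaultdict(list)
    for task in tasks:
        records[f"{task['service']}.{domain}"].append(task["ip"])
    # Sort for a stable result; a DNS server would return these as A records.
    return {name: sorted(ips) for name, ips in records.items()}


# Hypothetical task list; field names are assumptions for this sketch.
tasks = [
    {"id": "search.1", "service": "search", "ip": "10.20.0.5"},
    {"id": "search.2", "service": "search", "ip": "10.20.0.6"},
    {"id": "billing.1", "service": "billing", "ip": "10.20.0.9"},
]
for name, ips in group_by_service(tasks).items():
    print(name, ips)
```

With one IP per task, clients can use plain A records on a well-known port 
and never need to learn a dynamically assigned port.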

I’m not sure what the support in Mesos is for allocating IPs and managing IP 
pools based on CIDRs, but I would love to get the discussion going. 
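
To make the question concrete, here is a minimal sketch of what a CIDR-based 
pool could look like. It uses only Python's standard ipaddress module and 
does not reflect anything Mesos provides; the IpPool class and the task IDs 
are invented for illustration.

```python
import ipaddress


class IpPool:
    """Hands out host addresses from one CIDR block and takes them back."""

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        self._free = list(self.network.hosts())
        self._by_task = {}

    def allocate(self, task_id):
        # Asking twice for the same task returns the same address.
        if task_id in self._by_task:
            return self._by_task[task_id]
        if not self._free:
            raise RuntimeError(f"pool {self.network} is exhausted")
        ip = self._free.pop(0)
        self._by_task[task_id] = ip
        return ip

    def release(self, task_id):
        # Return the address to the pool when the task ends.
        ip = self._by_task.pop(task_id, None)
        if ip is not None:
            self._free.append(ip)


pool = IpPool("10.20.0.0/30")  # a /30 has two usable host addresses
print(pool.allocate("task-a"), pool.allocate("task-b"))
```

A real implementation would also have to persist allocations and coordinate 
across masters, which this sketch ignores.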

[1] https://github.com/CiscoCloud/microservices-infrastructure
[2] http://consul.io


> On Apr 12, 2015, at 12:13 PM, Jeff Schroeder <[email protected]> 
> wrote:
> 
> This discussion reminds me of a few excellent blog posts on solving similar 
> problems.
> 
> Smart clients vs dumb load balancers:
> http://blog.lusis.org/blog/2013/05/13/smart-clients/
> 
> Then there is the "local haproxy" idea, which I think is a lot less 
> applicable to a mesos environment, but still worth thinking about:
> http://agiletesting.blogspot.com/2013/12/ops-design-pattern-local-haproxy.html
> 
> On Sunday, April 12, 2015, Christos Kozyrakis <[email protected]> wrote:
> This is a great discussion thread. A few points from my side. 
> 
> I agree that service discovery and load balancing for performance or HA are 
> two different topics. The former can help with the latter though. Trying to 
> solve service discovery with proxies is ok at times but a) does not eliminate 
> the naming issue (you still need a naming scheme) and b) can get quite 
> difficult to manage after a certain scale, especially if multiple cluster 
> managers or other tools are involved.
> 
> About DNS and interoperability. A great advantage of using DNS is that there 
> are some mechanisms for interoperability to begin with. If needed, different 
> tools can be responsible for different zones and zone delegation will do the 
> right thing. We still need a sane scheme of naming tasks and services though. 
> But that cannot be enforced by the tool developer anyway. The user/customer 
> needs to select what the names are (e.g., search.mycompany.com, 
> search.local, search.QA.eastcoast.mycompany.com, or 
> search.node12.rack9.row4.datacenter1.mycompany.com) and make sure that the 
> scheme is reasonable. Keep in mind that, no matter which DNS tool(s) are 
> involved, the user must select and remember a naming scheme. 
> 
> About Mesos-DNS. The whole motivation for Mesos-DNS was interoperability 
> within Mesos clusters. A single cluster may run Marathon, Singularity, 
> Aurora, K8S, Cassandra, Spark, Myriad, Jenkins, ... and a ton of other 
> frameworks. We cannot rely on a single one of them for service discovery and 
> we need to allow tasks launched by different frameworks to find each other. 
> The Mesos master is the one place that we have the state of all tasks in a 
> Mesos cluster, so it is reasonable to draw that state and turn it into DNS 
> records. Right now, Mesos-DNS can be the primary DNS server in the cluster or 
> you can delegate a zone to it. Nevertheless, we could also make Mesos-DNS 
> just generate records to dynamically update another server (e.g. Bind). We 
> did not do this so far because clusters have very different setups in terms 
> of their current DNS servers, and dynamic updates are not always possible due 
> to their security/organizational setup. I honestly cannot see how "there can be 
> only one" in this topic. 
> 
> A few things on the roadmap for Mesos-DNS, some of them explained in 
> http://goo.gl/okMKAr
> - an HTTP interface for SRV & port information (see the http branch for a 
> simple start)
> - a much more flexible naming scheme based on the service discovery info in 
> Mesos 0.22 (see the link above)
> - switch from a pull mode to a push mode for the Master state (a scalability 
> issue)
> - allow for plugins (see James's proposal for a K8S plugin for Mesos-DNS). 
> 
> Finally, how should Mesos-DNS interoperate with non-Mesos tasks, i.e. tasks 
> launched through other tools, potentially on other machines? As I mentioned, 
> DNS zones are already a good way of doing that without too much coordination 
> between different cluster managers, container managers, different clusters, 
> etc. I like the idea of plugins in order to describe and discover higher 
> level concepts like services (collections of tasks) and help name load 
> balancers. This is what James's plugin tries to achieve, by the way. On the 
> other hand, using N or N^2 plugins to get all possible tools to talk to the 
> approved DNS server in each cluster through a distributed database seems like 
> a pain. Also note that many people trying to build such a "one size fits all" 
> setup merge health tracking and load balancing into the service discovery 
> system. This is not necessarily a great idea (see Yaron's message).
> 
> Regards
> 
>  
> 
> On Sun, Apr 12, 2015 at 2:37 AM, Tom Arnfeld <[email protected]> wrote:
> Hi Yaron,
> 
> (Also apologies for the long reply)
> 
> >> First, I think that we need to clearly distinguish that service discovery, 
> >> and request routing / load balancing are related but separate concerns. In 
> >> my mind, at least, discovery is about finding a way to communicate with a 
> >> service (much like a DNS query), whereas routing and load balancing 
> >> concerns effectively and efficiently directing (handling) network traffic.
> 
> It's very refreshing to hear you talk about service discovery this way. I 
> think this is a very important point that often gets lost in discussions, and 
> implementations don't always truly take this to heart so the result doesn't 
> end up as system agnostic as intended. We've spent the last ~year deploying 
> our own service discovery system because we felt nothing in the community 
> truly fit the description you gave above...
> 
> The result for us was something very similar to what you've come up with: a 
> DNS (DNS-SD, rfc6763) system that runs multi-datacenter, backed by a 
> distributed etcd database for the names and records. We built our own DNS 
> server to do this as consul/weave didn't exist back then -> 
> <https://github.com/duedil-ltd/discodns>. We're firm believers that DNS 
> itself can provide us with the framework to achieve 99% of all use cases, 
> assuming we ultimately build in support for things like dynamic updates via 
> DNS (rfc2136) and possibly even the "push" based long-polling features that 
> Apple use in Bonjour/mDNSResponder. Not to mention that using, say, Marathon 
> for service discovery is incredibly restricting. We'd like to use the same 
> system for every service we run, ranging from things deployed in one cloud 
> environment on Mesos to others in another environment deployed using Chef. 
> There's no reason why this can't be the case, imo.
> 
> The key idea is very much the same: we want different systems to be able to 
> register services in the way they need to (either via http to etcd, or via 
> dns update) and using their own semantics. A good example is that some 
> systems might want to have TTLs on records (in etcd, so the record 
> automatically disappears) to remove unhealthy instances of services, whereas 
> other systems might not want to tie their existence in service discovery 
> to their health (think long running distributed databases). Currently we 
> have some command line tools and chef cookbooks for service registration and 
> a WIP branch for dnsupdate (https://github.com/duedil-ltd/discodns/pull/31).
> 
> (I'd be very very interested to hear more about your experience with Weave 
> for this purpose, perhaps a blog post? :-))
> 
> > Regarding DNS: again I don’t think it makes sense to have a ‘mesos-dns’ and 
> > ‘weave-dns’ and ‘kubernetes-dns’ - it makes much more sense to have a single 
> > DNS that easily integrates with multiple data sources.
> 
> There's actually a ticket on mesos-dns to support plugins 
> <https://github.com/mesosphere/mesos-dns/issues/62>, and I had an idea to 
> write a discodns plugin that'd write the records into our etcd database, 
> which might be an interesting way to achieve integration with these tools. 
> Though I wonder whether this approach results in scalability problems because 
> the state becomes too large for a single system to re-process on a regular 
> basis. Maybe it's best for the things running on mesos to register 
> themselves, or even to use a mesos module for the slaves.
> 
> > https://registry.hub.docker.com/u/yaronr/discovery
> 
> We'd played around with the Lua support in NGINX to create some sort of 
> dns-sd based service discovery proxy, though don't have anything worth 
> sharing yet as far as I know!
> 
> Thanks for sharing!
> 
> On 12 April 2015 at 10:03, Yaron Rosenbaum <[email protected]> wrote:
> Ok, this is a bit long, I apologize in advance
> 
> I’ve been researching and experimenting with various challenges around 
> managing microservices at-scale. I’ve been working extensively with Docker, 
> CoreOS and recently Mesos.
> 
> First, I think that we need to clearly distinguish that service discovery, 
> and request routing / load balancing are related but separate concerns. In my 
> mind, at least, discovery is about finding a way to communicate with a 
> service (much like a DNS query), whereas routing and load balancing concerns 
> effectively and efficiently directing (handling) network traffic.
> 
> There are multiple solutions and approaches out there, but I don’t know if 
> any single technology could address all ‘microservices at-scale’ needs on its 
> own effectively and efficiently. In other words - mixing multiple approaches, 
> tools and technologies is probably the right way to go.
> I’m saying this because many of the existing tools come with a single 
> technology in mind. Tools that come from the Mesos camp obviously have Mesos 
> in mind, tools that come from Kube have Kube in mind, tools coming from 
> CoreOS have CoreOS in mind, etc.
> 
> I think it’s time to start mixing things together to really benefit from all 
> the goodness in all the various camps.
> 
> I’ll give an example:
> First, with respect to network traffic routing / balancing, I’ve created an 
> HAProxy based microservice, which supports multiple data sources as inputs 
> for auto-configuration. (BTW, I’m also experimenting with Velocity scripts 
> for the configuration template, and I’m finding it’s a big improvement over 
> Go scripting. Appreciate your feedback!) 
> https://registry.hub.docker.com/u/yaronr/discovery
> For now just etcd and Marathon data sources are implemented. Some services 
> will rely on CoreOS/etcd for publishing their availability, and others on 
> Marathon. In my case, I run my Kernel cluster - Mesos-master, Marathon and a 
> clustered REST DHCP-like service on CoreOS, and allow my customers to 
> schedule their own microservices on Mesos/Marathon. All require discovery / 
> routing / load balancing; the only difference is where the data comes from.
> 
> Regarding DNS: again I don’t think it makes sense to have a ‘mesos-dns’ and 
> ‘weave-dns’ and ‘kubernetes-dns’ - it makes much more sense to have a single 
> DNS that easily integrates with multiple data sources. Network is a common, 
> low level resource, that’s used by the upper level abstractions / frameworks 
> such as Kube, Mesos, and possibly Fleet. Therefore, if a single technology 
> ‘takes ownership’ of (parts of) this resource, that could create technology 
> lock-in down the line.
> I personally use Weave to abstract the network altogether, and Weave DNS to 
> provide service discovery (not network routing).
> 
> (Y)
> 
> 
> 
> 
> 
> -- 
> Christos
> 
> 
> -- 
> Text by Jeff, typos by iPhone
