I would say that service discovery is only needed for services that don't have a built-in discovery mechanism. When I run Elasticsearch, I specify the port range it can start in and let it run. If a port is taken, it tries a different one (I am using the Elasticsearch for YARN package running on Apache Myriad). Since I know which nodes and which port ranges are in use, I just add those to my Elasticsearch config, so HAProxy is not intercepting that traffic. If I have a front end running in Flask that connects to the ES back end, then I would use Mesos-DNS with HAProxy to solve that problem. In addition, Spark as a framework does its own service discovery, so HAProxy wouldn't be getting in between Spark nodes; the same goes for Kafka (I haven't played with Cassandra yet).
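To make the "known nodes plus known port range" idea concrete, here is a rough sketch of how the list of candidate endpoints could be generated before pasting it into a config. This is only an illustration: the function name `candidate_endpoints`, the node names, and the port range are all made up, and the exact config setting that takes such a list depends on your Elasticsearch version.

```python
from itertools import product


def candidate_endpoints(nodes, first_port, last_port):
    """Return every "host:port" pair for the given nodes and inclusive port range."""
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    ports = range(first_port, last_port + 1)
    # Hosts vary slowest, so all ports for one node are listed together.
    return [f"{host}:{port}" for host, port in product(nodes, ports)]


# Hypothetical node names and port range, for illustration only.
hosts = candidate_endpoints(["node1", "node2"], 9300, 9302)
print(",".join(hosts))
```

Because the clients get the real node addresses this way, nothing sits between them and the data nodes, which is the point of skipping the proxy for these frameworks.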
There is some work being done on IP per container which will help with this as well, but all in all, I've found that as long as I am somewhat smart about my frameworks, I can manage them (my cluster isn't huge either). As things grow, I am hoping to grow into IP per container.

John

On Wed, Dec 30, 2015 at 11:56 AM, vincent gromakowski <[email protected]> wrote:

> I am currently using Mesos as a big data backend for Spark, Cassandra,
> Kafka and Elasticsearch, but I cannot find a good overall design regarding
> service discovery. Let me explain:
> Generally, service discovery is managed by an HAProxy instance on each
> node which redirects traffic from service ports to the real assigned network
> ports. Currently I am not using it because the cluster is quite small and I
> don't need to deploy lots of services, but I am thinking about a future
> design that will allow me to scale.
> The problem with HAProxy dealing with all network traffic is that I am
> afraid it will break the data locality which is so important in the big
> data world for performance.
> For example, when Spark tries to connect to Elasticsearch, it will discover
> the Elasticsearch topology and try to launch tasks next to Elasticsearch
> shards. If HAProxy intercepts network flows, what would be the result?
> Will HAProxy masquerade the Elasticsearch IP/ports? Same thing for Kafka
> and Cassandra?
>
> I assume it depends on each connector, but it's very hard to find any
> information. Thanks for your help if you have any experience with it.
> Regards

