I am currently using Mesos as a big data backend for Spark, Cassandra, Kafka and Elasticsearch, but I cannot find a good overall design for service discovery. Let me explain.

Generally, service discovery is handled by an HAProxy instance on each node, which redirects traffic from well-known service ports to the network ports actually assigned to the tasks. I am not using it at the moment because the cluster is quite small and I don't need to deploy many services, but I am thinking about a future design that will allow me to scale.

The problem with HAProxy handling all network traffic is that I am afraid it will break data locality, which is so important for performance in the big data world. For example, when Spark connects to Elasticsearch, it discovers the Elasticsearch topology and tries to launch tasks next to the Elasticsearch shards. If HAProxy intercepts the network flows, what would be the result? Will HAProxy masquerade the Elasticsearch IPs/ports? Is it the same for Kafka and Cassandra?
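To make the concern concrete, here is a toy Python sketch of what I am worried about. It is not real Spark or connector code: `preferred_host`, the host names and the ports are all made up for illustration, and it assumes a connector that chooses where to run a task by matching the host a shard advertises against the hosts where executors run.

```python
# Toy model only: every name, host and port here is hypothetical.

def preferred_host(shard_address, executor_hosts):
    """Return the executor host co-located with the shard, or None if none matches."""
    host = shard_address.split(":")[0]
    return host if host in executor_hosts else None


executor_hosts = {"node1", "node2", "node3"}

# The connector sees the real shard address: the task can be placed on node2.
direct = preferred_host("node2:31642", executor_hosts)

# The connector only sees the local proxy endpoint: no host matches,
# so the locality hint is lost.
proxied = preferred_host("localhost:9200", executor_hosts)

print(direct, proxied)
```

If the connectors behave roughly like this, anything that hides the real host behind a proxy address would remove the locality hint, which is exactly what I would like to confirm or rule out.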
I assume it depends on each connector, but it is very hard to find any information about this. Thanks for your help if you have any experience with it. Regards

