I would say that service discovery is only needed for services that don't
have a built-in method for discovery. When I run Elasticsearch, I specify
the port range it may start in and let it run. If a port is taken, it
tries a different one (I am using the Elasticsearch for YARN package
running on Apache Myriad). Since I know which nodes and which port ranges
are in use, I just add them to my Elasticsearch config, so HAProxy is not
intercepting that traffic. If I had a front end running in Flask that
connects to the ES back end, I would use Mesos-DNS with HAProxy for that
connection. In addition, Spark as a framework does its own service
discovery, so HAProxy wouldn't be getting in between Spark nodes. The same
goes for Kafka (I haven't played with Cassandra yet).
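
To illustrate the "if the port is taken, it tries a different one"
behaviour, here is a minimal Python sketch. It is not Elasticsearch's
actual code; the function name bind_first_free and the 9300-9400 range are
only illustrative assumptions.

```python
import socket


def bind_first_free(host, first_port, last_port):
    """Bind a TCP socket to the first free port in [first_port, last_port].

    Hypothetical helper: it only mimics the idea of walking a configured
    port range until a bind succeeds.
    """
    for port in range(first_port, last_port + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is already in use (or not permitted); try the next one.
            sock.close()
            continue
        sock.listen(1)
        return sock, port
    raise RuntimeError(
        "no free port between %d and %d" % (first_port, last_port)
    )


if __name__ == "__main__":
    # Assumed range, chosen only for the example.
    sock, port = bind_first_free("127.0.0.1", 9300, 9400)
    print("listening on port", port)
    sock.close()
```

Because the nodes and the range are known up front, clients can be pointed
at them directly and nothing has to sit in the data path.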

There is some work being done on IP per container, which will help with
this as well. All in all, I've found that as long as I am somewhat smart
about my frameworks, I can manage them (my cluster isn't huge either). As
things grow, I am hoping to grow into IP per container.

John


On Wed, Dec 30, 2015 at 11:56 AM, vincent gromakowski <
[email protected]> wrote:

> I am currently using Mesos as a big data backend for Spark, Cassandra,
> Kafka and Elasticsearch, but I cannot find a good overall design for
> service discovery. Let me explain:
> Generally, service discovery is managed by an HAProxy instance on each
> node, which redirects traffic from service ports to the actual assigned
> network ports. Currently I am not using it because the cluster is quite
> small and I don't need to deploy many services, but I am thinking about a
> future design that will allow me to scale.
> The problem with HAProxy handling all network traffic is that I am
> afraid it will break data locality, which is so important for performance
> in the big data world.
> For example, when Spark connects to Elasticsearch, it discovers the
> Elasticsearch topology and tries to launch tasks next to the
> Elasticsearch shards. If HAProxy intercepts the network flows, what would
> be the result? Will HAProxy masquerade the Elasticsearch IPs/ports? Is it
> the same for Kafka and Cassandra?
>
> I assume it depends on each connector, but it's very hard to find any
> information. Thanks for your help if you have any experience with it.
> Regards
>
>
>
