Can you confirm my understanding? Spark will connect to Elasticsearch through the service port (that is, through HAProxy) and will then get the direct IPs/ports of the topology?
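
To make the question concrete, here is a minimal sketch of the connector settings I have in mind. It assumes the elasticsearch-hadoop (es-hadoop) connector and my reading of its documented options `es.nodes`, `es.port`, `es.nodes.discovery` and `es.nodes.wan.only`. The helper `es_spark_conf` and the host name `elasticsearch.marathon.mesos` are hypothetical, made up for illustration only.

```python
def es_spark_conf(service_host, service_port, discover_nodes=True):
    """Build es-hadoop settings for a Spark job (hypothetical helper).

    discover_nodes=True : contact the service address first, then let the
                          connector discover the real data nodes, so tasks
                          can talk to the shards directly.
    discover_nodes=False: send all traffic through the declared address
                          only (e.g. a proxy), giving up direct access.
    """
    return {
        "es.nodes": service_host,        # entry point, e.g. the service address
        "es.port": str(service_port),    # entry port, e.g. the service port
        "es.nodes.discovery": "true" if discover_nodes else "false",
        "es.nodes.wan.only": "false" if discover_nodes else "true",
    }


# Hypothetical service name; replace with whatever your setup exposes.
conf = es_spark_conf("elasticsearch.marathon.mesos", 9200)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

If my understanding is right, the first variant only uses the service port for the initial contact, while the second would keep every request going through HAProxy.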

2015-12-30 19:06 GMT+01:00 John Omernik <[email protected]>:

> I would say that service discovery is only for those services that don't
> have a built-in method for discovery. When I run Elasticsearch, I specify
> the port range I can start Elasticsearch in, and let it run. If the port
> is taken, it tries a different one (I am using the Elasticsearch for YARN
> package running on Apache Myriad). Since I know which nodes and what port
> ranges to use, I just add that to my Elasticsearch config, and thus
> HAProxy is not intercepting that traffic. If I have a front end running in
> Flask that connects to the ES back end, then I would use Mesos-DNS with
> HAProxy to solve that problem. In addition, Spark as a framework does the
> service discovery, so HAProxy wouldn't be getting in between Spark nodes;
> same with Kafka (I haven't played with Cassandra yet).
>
> There is some work being done on IP per container which will help with this
> as well, but all in all, I've found that as long as I am somewhat smart
> about my frameworks, I can manage them (my cluster isn't huge either). As
> things grow, I am hoping to grow into IP per container.
>
> John
>
> On Wed, Dec 30, 2015 at 11:56 AM, vincent gromakowski <
> [email protected]> wrote:
>
>> I am currently using Mesos as a big data backend for Spark, Cassandra,
>> Kafka and Elasticsearch, but I cannot find a good overall design regarding
>> service discovery. Let me explain:
>> Generally, service discovery is managed by an HAProxy instance on each
>> node, which redirects traffic from service ports to the real assigned
>> network ports. Currently I am not using it because the cluster is quite
>> small and I don't need to deploy lots of services, but I am thinking about
>> a future design that will allow me to scale.
>> The problem with HAProxy dealing with all network traffic is that I am
>> afraid it will break data locality, which is so important for performance
>> in the big data world.
>> For example, when Spark tries to connect to Elasticsearch, it will
>> discover the Elasticsearch topology and try to launch tasks next to the
>> Elasticsearch shards. If HAProxy intercepts network flows, what would be
>> the result? Will HAProxy masquerade the Elasticsearch IPs/ports? Same
>> thing for Kafka and Cassandra?
>>
>> I assume it depends on each connector, but it's very hard to find any
>> information. Thanks for your help if you have any experience with it.
>> Regards

