Can you confirm my understanding? Spark will connect to Elasticsearch
through the service port (i.e. through HAProxy) and will then get the direct
IPs/ports for the topology?
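
To make the question concrete, here is a minimal sketch of the two behaviours I have in mind, assuming the elasticsearch-hadoop connector and its `es.nodes`, `es.port`, `es.nodes.discovery` and `es.nodes.wan.only` settings. The helper name, the host and the port are hypothetical and only for illustration; I have not verified how this behaves behind HAProxy on Mesos.

```python
def es_connector_options(seed_host, seed_port, through_proxy_only=False):
    """Build connector settings for one of two connection modes.

    Hypothetical helper: only the setting keys come from the
    elasticsearch-hadoop configuration, everything else is illustrative.
    """
    options = {
        "es.nodes": seed_host,       # first contact point (e.g. the service port)
        "es.port": str(seed_port),
    }
    if through_proxy_only:
        # Every request goes through the seed address: no topology
        # discovery, so no direct connections to the data nodes.
        options["es.nodes.wan.only"] = "true"
        options["es.nodes.discovery"] = "false"
    else:
        # The seed address is only used for the first contact; the
        # connector then asks the cluster for the real node addresses.
        options["es.nodes.discovery"] = "true"
        options["es.nodes.wan.only"] = "false"
    return options


if __name__ == "__main__":
    # Assumed values: HAProxy listening locally on a service port 9200.
    for key, value in sorted(es_connector_options("localhost", 9200).items()):
        print(key, "=", value)
```

If the first mode is what actually happens, the proxy would only be involved in the initial contact and data locality should be preserved; the second mode would send all traffic through the proxy.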

2015-12-30 19:06 GMT+01:00 John Omernik <[email protected]>:

> I would say that service discovery is only for those services that don't
> have a built-in method for discovery. When I run Elasticsearch, I specify
> the port range I can start Elasticsearch in, and let it run. If the port
> is taken, it tries a different one (I am using the Elasticsearch for YARN
> package running on Apache Myriad).  Since I know which nodes and what port
> ranges to use, I just add that to my Elasticsearch config, and thus HAProxy
> is not intercepting that traffic.  If I have a front end running in
> Flask that connects to the ES back end, then I would use Mesos-DNS with
> HAProxy to solve that problem.  In addition, Spark as a framework does its
> own service discovery, so HAProxy wouldn't be getting in between Spark nodes;
> same with Kafka (I haven't played with Cassandra yet).
>
> There is some work being done on IP per container which will help with this
> as well, but all in all, I've found that as long as I am somewhat smart about
> my frameworks, I can manage them (my cluster isn't huge either).  As things
> grow, I am hoping to grow into IP per container.
>
> John
>
>
> On Wed, Dec 30, 2015 at 11:56 AM, vincent gromakowski <
> [email protected]> wrote:
>
>> I am currently using Mesos as a big data backend for Spark, Cassandra,
>> Kafka and Elasticsearch, but I cannot find a good overall design regarding
>> service discovery. Let me explain:
>> Generally, service discovery is managed by an HAProxy instance on each
>> node which redirects traffic from service ports to the real assigned network
>> ports. Currently I am not using it because the cluster is quite small and I
>> don't need to deploy lots of services, but I am thinking about a future design
>> that will allow me to scale.
>> The problem with HAProxy dealing with all network traffic is that I am
>> afraid it will break the data locality which is so important for performance
>> in the big data world.
>> For example, when Spark tries to connect to Elasticsearch, it will
>> discover the Elasticsearch topology and try to launch tasks next to
>> Elasticsearch shards. If HAProxy intercepts network flows, what would be the
>> result?  Will HAProxy masquerade the Elasticsearch IPs/ports? Same thing
>> for Kafka and Cassandra?
>>
>> I assume it depends on each connector, but it's very hard to find any
>> information. Thanks for your help if you have any experience with it.
>> Regards
>>
>>
>>
>
