Good idea for getting data locality with non-distributed apps, but the Spark
driver will distribute the info to the workers, so it may result in all workers
connecting to the instance on the same node as the driver.
I will do some tests...
On Dec 31, 2015 1:26 AM, "Shuai Lin" <[email protected]> wrote:

> What about specifying all non-local instances as "backup" in haproxy.cfg?
> This way haproxy would only direct traffic to the local instance as long as
> the local instance is alive.
>
> For example, if you plan to use the haproxy-marathon-bridge script, you
> can modify this line to achieve that:
> https://github.com/mesosphere/marathon/blob/8b3ce8844dcc53055345914ef11019789dd843cf/bin/haproxy-marathon-bridge#L162
> .
>
>
> On Thu, Dec 31, 2015 at 1:56 AM, vincent gromakowski <
> [email protected]> wrote:
>
>> I am currently using Mesos as a big data backend for Spark, Cassandra,
>> Kafka and Elasticsearch, but I cannot find a good overall design for
>> service discovery. Let me explain:
>> Generally, service discovery is handled by an HAProxy instance on each
>> node, which redirects traffic from service ports to the actual assigned
>> network ports. Currently I am not using it because the cluster is quite
>> small and I don't need to deploy many services, but I am thinking about a
>> future design that will allow me to scale.
>> The problem with HAProxy handling all network traffic is that I am
>> afraid it will break data locality, which is so important for performance
>> in the big data world.
>> For example, when Spark connects to Elasticsearch, it discovers the
>> Elasticsearch topology and tries to launch tasks next to the
>> Elasticsearch shards. If HAProxy intercepts the network flows, what would
>> the result be? Will HAProxy masquerade the Elasticsearch IPs/ports? Same
>> question for Kafka and Cassandra?
>>
>> I assume it depends on each connector, but it's very hard to find any
>> information. Thanks for your help if you have any experience with this.
>> Regards
>>
>>
>>
>
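
A minimal sketch of the "backup" idea from the quoted reply, assuming a small Python generator rather than the actual haproxy-marathon-bridge script. The function name, the app name, and the host/port values are hypothetical; only the `server ... check backup` syntax is standard HAProxy configuration. Every instance that is not on the local node is marked `backup`, so HAProxy should only send traffic to it when the local instance fails its health check.

```python
def render_listen_block(app_name, service_port, instances, local_host):
    """Render an HAProxy 'listen' block that prefers the local instance.

    instances  -- list of (host, port) tuples for the running tasks
    local_host -- hostname of the node this haproxy.cfg is generated for
    (All names here are illustrative, not taken from the bridge script.)
    """
    lines = [
        f"listen {app_name}",
        f"  bind 127.0.0.1:{service_port}",
        "  mode tcp",
    ]
    for index, (host, port) in enumerate(instances):
        # Non-local instances get the 'backup' keyword, so they are used
        # only when no non-backup (local) server is available.
        suffix = "" if host == local_host else " backup"
        lines.append(f"  server {app_name}-{index} {host}:{port} check{suffix}")
    return "\n".join(lines)


# Hypothetical example: two Elasticsearch tasks, config generated on node1.
config = render_listen_block(
    "es", 9200, [("node1", 31001), ("node2", 31002)], local_host="node1"
)
print(config)
```

Note that if a node has no local instance, every server ends up as a backup, and by default HAProxy then uses only the first available backup unless `option allbackups` is set.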
