> 0. How do I go about the issue of HA at the scheduler level?
One alternative to doing your own leader election is to use a
meta-framework like Marathon or Aurora to automatically restart your
scheduler. There will be a short downtime during the failover, but as soon
as the new scheduler comes back up it can recover its state, re-register
with the master, and reconcile its tasks. You then only ever need one
running instance, which is always the leader.
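
To make "recover state, re-register" a bit more concrete, here is a rough
Python sketch of the idea, not Mesos API code. It assumes you persist the
framework ID somewhere durable and pass it back in FrameworkInfo on the
next start, with a non-zero failover_timeout so the master keeps your tasks
alive while the scheduler is down. `FrameworkStateStore`,
`framework_info_for_startup` and `on_registered` are made-up names, the
local JSON file is only a stand-in for whatever durable store you choose,
and the dict stands in for the real FrameworkInfo message.

```python
import json
import os


class FrameworkStateStore:
    """Hypothetical durable store; a local JSON file is used only for illustration."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # A fresh deployment has no state yet.
        if not os.path.exists(self.path):
            return {"framework_id": None, "tasks": {}}
        with open(self.path) as f:
            return json.load(f)

    def save(self, state):
        # Write to a temp file and rename so a crash never leaves a half-written file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)


def framework_info_for_startup(store, name="flink", failover_timeout_secs=3600.0):
    """Build the registration info (a plain dict standing in for FrameworkInfo)."""
    state = store.load()
    info = {"name": name, "failover_timeout": failover_timeout_secs}
    if state["framework_id"] is not None:
        # Reusing the previous ID makes this a re-registration rather than a new framework.
        info["id"] = state["framework_id"]
    return info


def on_registered(store, framework_id):
    """Call from the scheduler's registered callback to persist the assigned ID."""
    state = store.load()
    state["framework_id"] = framework_id
    store.save(state)
```

The first start registers without an ID and saves the one it is given;
every restart after that picks the saved ID up again.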

> 1. How do I deal with restarts and reconciling the tasks?
I strongly recommend you read
http://mesos.apache.org/documentation/latest/reconciliation/
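
As a rough illustration of the explicit reconciliation loop that page
describes (ask about the tasks you know of, wait with a truncated
exponential backoff, then ask again about any task that has not answered),
here is a small Python sketch. It is a simulation, not Mesos API code:
`reconcile_tasks`, `send_reconcile`, `poll_updates` and `FakeMaster` are
names I made up. In the Java client the request would go through
`SchedulerDriver.reconcileTasks` and the answers would arrive in your
`statusUpdate` callback.

```python
import time


def reconcile_tasks(known_task_ids, send_reconcile, poll_updates,
                    max_rounds=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Ask about known tasks until each one has reported a state or rounds run out.

    send_reconcile(task_ids) and poll_updates() are hypothetical hooks into
    the driver; poll_updates() returns (task_id, state) pairs received so far.
    """
    remaining = set(known_task_ids)
    latest = {}
    for attempt in range(max_rounds):
        if not remaining:
            break
        send_reconcile(sorted(remaining))
        # Truncated exponential backoff between rounds.
        sleep(min(base_delay * (2 ** attempt), max_delay))
        for task_id, state in poll_updates():
            latest[task_id] = state
            remaining.discard(task_id)
    return latest, remaining


class FakeMaster:
    """Test stand-in: answers with the known state, or TASK_LOST for unknown tasks."""

    def __init__(self, states, drop_first=()):
        self.states = states
        self.drop = set(drop_first)  # tasks whose first answer gets lost
        self.queue = []
        self.requests = []

    def reconcile(self, task_ids):
        self.requests.append(list(task_ids))
        for task_id in task_ids:
            if task_id in self.drop:
                self.drop.discard(task_id)
                continue
            self.queue.append((task_id, self.states.get(task_id, "TASK_LOST")))

    def poll(self):
        updates, self.queue = self.queue, []
        return updates
```

Anything still in `remaining` after the last round never answered, so treat
it as unknown rather than assuming it is running or dead.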

> 3. How does one go about testing frameworks? Any suggestions / pointers.
- Unit tests within your framework code, mocking necessary Mesos
Master/Slave components.
- Health checks on all your tasks, and a `/health` endpoint on your
scheduler, to ease integration testing.
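
For the unit-test bullet, a minimal sketch of what mocking the driver can
look like, in Python with `unittest.mock`. Everything here is illustrative:
`TaskManagerScheduler`, the dict-shaped offers, and the snake_case
`launch_tasks` / `decline_offer` driver methods are simplified stand-ins
for your own scheduler and the real driver interface.

```python
from unittest import mock


class TaskManagerScheduler:
    """Hypothetical scheduler: launches task managers until `wanted` are running."""

    def __init__(self, wanted, cpus_per_task=1.0):
        self.wanted = wanted
        self.cpus_per_task = cpus_per_task
        self.launched = 0

    def resource_offers(self, driver, offers):
        for offer in offers:
            big_enough = offer["cpus"] >= self.cpus_per_task
            if self.launched < self.wanted and big_enough:
                self.launched += 1
                task = {"task_id": "taskmanager-%d" % self.launched,
                        "cpus": self.cpus_per_task}
                driver.launch_tasks(offer["id"], [task])
            else:
                # Hand back offers we cannot or need not use.
                driver.decline_offer(offer["id"])


def test_launches_until_quota_then_declines():
    driver = mock.Mock()  # records calls, no Mesos master needed
    scheduler = TaskManagerScheduler(wanted=1)
    scheduler.resource_offers(driver, [
        {"id": "o1", "cpus": 0.5},   # too small
        {"id": "o2", "cpus": 2.0},   # used for the one wanted task
        {"id": "o3", "cpus": 2.0},   # quota already met
    ])
    driver.launch_tasks.assert_called_once_with(
        "o2", [{"task_id": "taskmanager-1", "cpus": 1.0}])
    declined = [call[0][0] for call in driver.decline_offer.call_args_list]
    assert declined == ["o1", "o3"]
```

The same pattern works in Java with a mocking library and the
SchedulerDriver interface; the point is that the offer-handling logic can
be tested without a running cluster.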

On Sat, Jul 25, 2015 at 12:30 PM, Ankur Chauhan <[email protected]> wrote:

> Hi all,
>
>
> I am working on creating an integration between Apache Flink (
> http://flink.apache.org) and Mesos which would be similar to the way the
> current hadoop-mesos integration works using the Java Mesos client.
> My current idea is that the scheduler will also run a JobManager process
> (similar to the JobTracker) which will start off a bunch of TaskManager
> (similar to the TaskTracker) tasks using a custom executor.
>
> I want to get some feedback and information on the following questions I
> have:
>
> 0. How do I go about the issue of HA at the scheduler level?
>     I was thinking of using ZooKeeper-based leader election by directly
> maintaining a ZooKeeper connection myself. Is there a better way to do this
> (something which does not require me to use a self-managed ZooKeeper
> connection)?
>
> 1. How do I deal with restarts and reconciling the tasks?
>     In case the scheduler restarts (it currently maintains an in-memory map
> of currently running tasks), how do I go about rediscovering tasks and
> reconciling state?
>     I was thinking of using DiscoveryInfo but I can't find any reference to
> figure out how to "query" Mesos for tasks matching the service discovery
> information. Any suggestions on how to do this?
>
> 3. How does one go about testing frameworks? Any suggestions / pointers.
>
> My work in progress version is at
> https://github.com/ankurcha/flink/tree/flink-mesos/flink-mesos
>
> Any help would be much appreciated.
>
>
> Thanks!
> Ankur
>
