Specifically, with regard to the state of the framework under different
callback orderings: we write our framework in a functional style, so that
every callback atomically transforms the previous state into a new state.
By doing this, we serialize all callbacks. At that point, you can use
generative testing to create events and run them through your system. This,
at least, makes #3 possible.
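
To make that concrete, here is a minimal sketch (State and handle_event
are hypothetical stand-ins, not our actual framework) of a pure callback
plus a generative test over random event orderings, written in Python
using the hypothesis library:

    from dataclasses import dataclass, replace
    from hypothesis import given, strategies as st

    @dataclass(frozen=True)
    class State:
        # hypothetical framework state: the tasks we believe are running
        running: frozenset = frozenset()

    def handle_event(state, event):
        """Pure callback: atomically maps the previous state to a new one."""
        kind, task_id = event
        if kind == "launched":
            return replace(state, running=state.running | {task_id})
        if kind == "finished":
            return replace(state, running=state.running - {task_id})
        return state  # unknown events leave the state untouched

    events = st.tuples(st.sampled_from(["launched", "finished"]),
                       st.integers(min_value=0, max_value=5))

    @given(st.lists(events))
    def test_invariant_holds_under_any_ordering(event_seq):
        state = State()
        for e in event_seq:
            state = handle_event(state, e)
        # an invariant that must hold however callbacks were ordered:
        # only tasks that were actually launched can be marked running
        launched = {t for kind, t in event_seq if kind == "launched"}
        assert state.running <= launched

Because every callback is a pure (state, event) -> state function, the
test never needs a running Mesos cluster; the generator explores the
orderings for you.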

For #4, we are pretty careful to choose idempotent writes into the DB, and a
DB that supports snapshot reads. This way, you can just use at-least-once
semantics for easy-to-implement retries. If a write still fails after
retrying, you just crash, since that means your DB is completely down. Then
we test by thinking through and discussing whether each operation has this
idempotency property, and we test the simple retry logic independently.
This starts to get at a way to manage #4 and avoid learning in production.
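
As a sketch of what we mean (db.put here is a hypothetical key-value
client, not any specific driver API), an idempotent write keyed on a
deterministic id can be retried blindly under at-least-once semantics,
and we crash if the DB stays down:

    import sys
    import time

    MAX_RETRIES = 5

    def record_task_status(db, task_id, status):
        """Idempotent write: keyed on task_id, so replaying it is harmless."""
        for attempt in range(MAX_RETRIES):
            try:
                # Writing the same (key, value) twice leaves the DB
                # unchanged, which is what makes at-least-once retries safe.
                db.put(key=("task", task_id), value=status)
                return
            except IOError:
                time.sleep(2 ** attempt)  # simple backoff between retries
        # Still failing: the DB is down hard. Crash rather than continue
        # with unrecorded state.
        sys.exit("database unavailable; crashing so the framework restarts")

The idempotency of each operation and this small retry loop can then be
reviewed and tested independently, as described above.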

On Sun, Oct 12, 2014 at 11:44 AM, Dharmesh Kakadia <dhkaka...@gmail.com>
wrote:

> Thanks David.
>
> Taking the state of the framework as input is an interesting design. I am
> assuming the scheduler is maintaining the state and then handing tasks to
> slaves. If that's the case, we can safely test the executor (stateless:
> receiving an event and returning the appropriate status to the scheduler).
> You construct scheduler tests similarly by passing different states and
> events and observing the next state. This way you can be sure that your
> callbacks work fine in *isolation*. I would be concerned about the state
> of the framework in case of callback ordering (or re-execution) in *all
> possible scenarios*. Mocking is exactly what might uncover such bugs, but
> as you pointed out, I also think it would not be trivial for many
> frameworks.
>
> At a high level, it would be important for framework developers to know
> that:
> 1. executors are working fine in isolation on a fresh start, implementing
> the feature.
> 2. executors are working fine when rescheduled/restarted/in the presence
> of other executors.
> 3. the scheduler is working fine in isolation.
> 4. the scheduler is fine in the wild (in the presence of
> others/failures/checkpointing/...).
>
> 1 is easy to do traditionally. 2 is possible if your executors do not have
> side effects, or by using Docker etc.
> 3 and 4 are not easy to do. I think having a support library for testing
> schedulers is something all framework writers would benefit from. Not
> having to think about communication between executors and the scheduler is
> already a big plus; can we also make it easier for developers to test
> their scheduler behaviour?
>
> Thoughts?
>
> I would love to hear thoughts from others.
>
> Thanks,
> Dharmesh
>
> On Sun, Oct 12, 2014 at 8:03 PM, David Greenberg <dsg123456...@gmail.com>
> wrote:
>
>> For our frameworks, we don't tend to do much automated testing of the
>> Mesos interface--instead, we construct the framework state, then "send it a
>> message", since our callbacks take the state of the framework + the event
>> as the argument. This way, we don't need to have Mesos running, and we can
>> trim away large amounts of code necessary to connect to Mesos but
>> unnecessary for the actual feature under test. We've also been
>> experimenting with simulation testing by mocking out the Mesos APIs. These
>> techniques are mostly effective when you can pretend that the executors
>> you're using don't communicate much, or when they're trivial to mock.
>>
>> On Sun, Oct 12, 2014 at 9:42 AM, Dharmesh Kakadia <dhkaka...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am working on a tiny experimental framework for Mesos. I was wondering
>>> what the recommended way of writing test cases for framework testing is. I
>>> looked at several existing frameworks, but it's still not clear to me. I
>>> understand that I might be able to test executor functionality in isolation
>>> through normal test cases, but testing the framework as a whole is what I
>>> am unclear about.
>>>
>>> Suggestions? Is that a non-goal? How do other framework developers go
>>> about it?
>>>
>>> Also, on a related note, is there a way to debug frameworks in a better
>>> way than sifting through logs?
>>>
>>> Thanks,
>>> Dharmesh
>>>
>>>
>>>
>>
>
