>
> @Sharma #3 looks impressive and I hear the pain. A few questions:
> * Since you already have the state machine modeling, can't the scheduler
> actions also be modeled as state machine transitions?
>
I suppose that is possible in theory. I am thinking that the scheduler
state would have to be a function of all the tasks' and slaves' states,
which could be more tedious to verify on every task assignment than
validating individual assignment decisions. Maybe there is a different way
to look at this.
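To illustrate what I mean by validating individual assignment decisions
(this is only a rough sketch with made-up names, not my actual code): each
decision can be checked locally against the offer it consumed, whereas a
scheduler-wide state machine would have to fold in every task's and slave's
state.

    // Rough sketch only (hypothetical names, not actual framework code):
    // validate a single assignment decision in isolation instead of
    // verifying a global scheduler state derived from all tasks and slaves.
    final class AssignmentCheck {
        // A single decision is acceptable if the task's demands fit the offer.
        static boolean fits(double offerCpus, double offerMemMb,
                            double taskCpus, double taskMemMb) {
            return taskCpus <= offerCpus && taskMemMb <= offerMemMb;
        }

        public static void main(String[] args) {
            // e.g. a 1-CPU / 512 MB task against a 2-CPU / 1024 MB offer
            System.out.println(fits(2.0, 1024, 1.0, 512)); // true
        }
    }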
> * Having a spec (in the form of a state machine or otherwise) for the
> scheduler looks like an important (and hard) goal. Mocking looks like a
> good thing. Is mocking general enough to become a library available to
> all, to enable *verifiably* correct scheduler behavior?
A general library for mocking parts of the scheduler may be useful, I
agree. Here's what I have right now. I mock the incoming offers with an
OfferProvider that has these methods:
    getOffer(numCpus, memory, portRanges, attributesMap)  // plus overloaded variants
    getConsumedOffer(assignments)
The first is used to set up a new offer for a slave. When that slave gets
used for some task assignments, the second method returns a new offer whose
resources are the original offer's resources minus those used by the
assignments.
This works for the task assignment part of the scheduler (#3 in my previous
email). Also, I don't build the actual Protos.Offer object, since the task
assigner object I have deals with a wrapper object around the offer; that
wrapper is what I mock, strictly speaking.
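In case it helps make that concrete, here is roughly the shape of that
interface (a sketch only; OfferWrapper and Assignment are hypothetical
placeholders for the wrapper classes I actually use, and the real
signatures differ):

    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the mock described above; real signatures differ.
    // OfferWrapper and Assignment stand in for the wrapper types the task
    // assigner consumes -- it is the wrapper, not Protos.Offer, that gets mocked.
    interface OfferWrapper { double cpus(); double memMb(); }
    interface Assignment { double cpus(); double memMb(); }

    interface OfferProvider {
        // Sets up a fresh offer for a slave with the given resources
        // (overloaded variants omitted).
        OfferWrapper getOffer(double numCpus, double memoryMb,
                              List<int[]> portRanges,
                              Map<String, String> attributesMap);

        // Returns a new offer whose resources are the original offer's
        // resources minus those consumed by the given assignments.
        OfferWrapper getConsumedOffer(List<Assignment> assignments);
    }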
Sharma
On Tue, Oct 14, 2014 at 9:36 AM, Dharmesh Kakadia <[email protected]>
wrote:
> Thanks to both of you.
>
> @David Idempotence (and a functional style) will both help mitigate the
> testing issue.
>
> @Sharma #3 looks impressive and I hear the pain. A few questions:
> * Since you already have the state machine modeling, can't the scheduler
> actions also be modeled as state machine transitions?
> * Having a spec (in the form of a state machine or otherwise) for the
> scheduler looks like an important (and hard) goal. Mocking looks like a
> good thing. Is mocking general enough to become a library available to
> all, to enable *verifiably* correct scheduler behavior?
>
> Again thanks for sharing your thoughts.
>
> Thanks,
> Dharmesh
>
> On Mon, Oct 13, 2014 at 7:29 AM, David Greenberg <[email protected]>
> wrote:
>
>> Specifically with regards to the state of the framework due to callback
>> ordering, we ensure that our framework is written in a functional style, so
>> that all callbacks atomically transform the previous state to a new state.
>> By doing this, we serialize all callbacks. At this point, you can do
>> generative testing to create events and run them through your system. This,
>> at least, makes #3 possible.
>>
>> For #4, we are pretty careful to choose idempotent writes into the DB and
>> a DB that supports snapshot reads. This way, you can just use at-least-once
>> semantics for easy-to-implement retries. If a write fails, you just crash,
>> since that means your DB's completely down. Then we test by thinking
>> through and discussing whether operations have this idempotency property
>> and the simple retry logic independently. This starts to get at a way to
>> manage #4 to avoid learning in production.
>>
>> On Sun, Oct 12, 2014 at 11:44 AM, Dharmesh Kakadia <[email protected]>
>> wrote:
>>
>>> Thanks David.
>>>
>>> Taking the state of the framework as an argument is an interesting
>>> design. I am assuming the scheduler is maintaining the state and then
>>> handing tasks to slaves. If that's the case, we can safely test the
>>> executor (stateless: it receives an event and returns the appropriate
>>> status to the scheduler). You can construct scheduler tests similarly by
>>> passing different states and events and observing the next state. This
>>> way you will be sure that your callbacks work fine in *isolation*. I
>>> would be concerned about the state of the framework under callback
>>> reordering (or re-execution) in *all possible scenarios*. Mocking is
>>> exactly what might uncover such bugs, but as you pointed out, I also
>>> think it would not be trivial for many frameworks.
>>>
>>> At a high level, it would be important for framework developers to know
>>> that:
>>> 1. executors are working fine in isolation on a fresh start, implementing
>>> the feature.
>>> 2. executors are working fine when rescheduled/restarted/in the presence
>>> of other executors.
>>> 3. the scheduler is working fine in isolation.
>>> 4. the scheduler is fine in the wild (in the presence of
>>> others/failures/checkpointing/...).
>>>
>>> 1 is easy to do traditionally. 2 is possible if your executors do not
>>> have side effects or if you are using Docker etc.
>>> 3 and 4 are not easy to do. I think having a support library for testing
>>> schedulers is something all framework writers would benefit from. Not
>>> having to think about communication between executors and the scheduler
>>> is already a big plus; can we also make it easier for developers to test
>>> their scheduler behaviour?
>>>
>>> Thoughts?
>>>
>>> I would love to hear thoughts from others.
>>>
>>> Thanks,
>>> Dharmesh
>>>
>>> On Sun, Oct 12, 2014 at 8:03 PM, David Greenberg <[email protected]
>>> > wrote:
>>>
>>>> For our frameworks, we don't tend to do much automated testing of the
>>>> Mesos interface--instead, we construct the framework state, then "send it a
>>>> message", since our callbacks take the state of the framework + the event
>>>> as the argument. This way, we don't need to have mesos running, and we can
>>>> trim away large amounts of code necessary to connect to mesos but
>>>> unnecessary for the actual feature under test. We've also been
>>>> experimenting with simulation testing by mocking out the mesos APIs. These
>>>> techniques are mostly effective when you can pretend that the executors
>>>> you're using don't communicate much, or when they're trivial to mock.
>>>>
>>>> On Sun, Oct 12, 2014 at 9:42 AM, Dharmesh Kakadia <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am working on a tiny experimental framework for Mesos. I was
>>>>> wondering what the recommended way of writing test cases for framework
>>>>> testing is. I looked at several existing frameworks, but it's still not
>>>>> clear to me. I understand that I might be able to test executor
>>>>> functionality in isolation through normal test cases, but testing the
>>>>> framework as a whole is what I am unclear about.
>>>>>
>>>>> Suggestions? Is that a non-goal? How do other framework developers go
>>>>> about it?
>>>>>
>>>>> Also, on a related note, is there a better way to debug frameworks
>>>>> than sifting through logs?
>>>>>
>>>>> Thanks,
>>>>> Dharmesh
>>>>>
>>>>
>>>
>>
>