As you know, MesosCon Europe is fast approaching. I'll be giving a talk
there on Cook, our advanced, preempting, multi-tenant Spark-on-Mesos
scheduler. Most excitingly, this framework will be fully open source
by then! So you might be able to switch to Mesos even sooner.

If you're interested in giving it a spin before then (in the next few days),
email me directly. We could use a new user's eyes on our documentation to
make sure we didn't leave anything out.

On Fri, Sep 18, 2015 at 3:53 AM Marco Massenzio <[email protected]> wrote:

> Thanks, Stephen - feedback much appreciated!
>
> *Marco Massenzio*
>
> *Distributed Systems Engineer*
> http://codetrips.com
>
> On Thu, Sep 17, 2015 at 5:03 PM, Stephen Boesch <[email protected]> wrote:
>
>> Compared to YARN, Mesos is just faster. Mesos has a shorter startup time
>> and a shorter delay between tasks. Run times for a 100 GB terasort
>> tended towards a median of 110 sec on Mesos vs. about double that on YARN.
>>
>> Unfortunately we require mature multi-tenancy/isolation/queues support,
>> which is still in the initial stages of WIP for Mesos. So we will need to
>> use YARN for the near and likely medium term.
>>
>>
>>
>> 2015-09-17 15:52 GMT-07:00 Marco Massenzio <[email protected]>:
>>
>>> Hey Stephen,
>>>
>>>> Spark on Mesos is twice as fast as on YARN on our 20-node cluster. In
>>>> addition, Mesos is handling data sizes that YARN simply dies on. But
>>>> Mesos still takes linearly increasing time compared to smaller
>>>> data sizes.
>>>
>>>
>>> Obviously delighted to hear that, BUT me not much like "but" :)
>>> I've added Tim who is one of the main contributors to our Mesos/Spark
>>> bindings, and it would be great to hear your use case/experience and find
>>> out whether we can improve on that front too!
>>>
>>> As the case may be, we could also jump on a hangout if it makes
>>> conversation easier/faster.
>>>
>>> Cheers,
>>>
>>> *Marco Massenzio*
>>>
>>> *Distributed Systems Engineer*
>>> http://codetrips.com
>>>
>>> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <[email protected]>
>>> wrote:
>>>
>>>> Thanks Vinod. I went back to look at the logs and found nothing
>>>> interesting. However, in the process I found that my Spark port was
>>>> pointing to 7077 instead of 5050. After re-running, Spark on Mesos worked!
>>>>
>>>> Spark on Mesos is twice as fast as on YARN on our 20-node cluster. In
>>>> addition, Mesos is handling data sizes that YARN simply dies on. But
>>>> Mesos still takes linearly increasing time compared to smaller
>>>> data sizes.
>>>>
>>>> We have significant additional work to do to incorporate Mesos into
>>>> operations and support, but given the strong performance and stability
>>>> characteristics we are initially seeing here, that effort is likely to
>>>> get underway.
>>>>
>>>>
>>>>
>>>> 2015-09-09 12:54 GMT-07:00 Vinod Kone <[email protected]>:
>>>>
>>>>> Sounds like it. Can you see what the slave/agent and executor logs say?
>>>>>
>>>>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I am in the process of learning how to run a Mesos cluster with the
>>>>>> intent for it to be the resource manager for Spark. As a small step in
>>>>>> that direction, I ran a basic test of Mesos, as suggested by the
>>>>>> Mesos Getting Started page.
>>>>>>
>>>>>> In the following output we see tasks launched and resources offered
>>>>>> on a 20 node cluster:
>>>>>>
>>>>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>>>>>> $(hostname -s):5050
>>>>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>>>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>>>>>> [email protected]:5050
>>>>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>>>>>> Attempting to register without authentication
>>>>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>>>>>> 20150908-182014-2093760522-5050-15313-0000
>>>>>> Registered! ID = 20150908-182014-2093760522-5050-15313-0000
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus:
>>>>>> 16.0 and mem: 119855.0
>>>>>> Status update: task 0 is in state TASK_LOST
>>>>>> Aborting because task 0 is in unexpected state TASK_LOST with reason
>>>>>> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
>>>>>> 'Executor terminated'
>>>>>> I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
>>>>>> I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework
>>>>>> '20150908-182014-2093760522-5050-15313-0000'
>>>>>> I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
>>>>>> I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework
>>>>>> '20150908-182014-2093760522-5050-15313-0000'
>>>>>>
>>>>>>
>>>>>> Why did the task transition to TASK_LOST? Is there a
>>>>>> misconfiguration on the cluster?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
