Hey Stephen,

> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
> addition, Mesos is handling data sizes that YARN simply dies on. But Mesos
> is still only taking linearly increased time compared to the smaller data
> sizes.


Obviously delighted to hear that, BUT me not much like "but" :)
I've added Tim who is one of the main contributors to our Mesos/Spark
bindings, and it would be great to hear your use case/experience and find
out whether we can improve on that front too!

If need be, we could also jump on a hangout if that makes the
conversation easier/faster.

Cheers,

*Marco Massenzio*

*Distributed Systems Engineer*
http://codetrips.com

On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <[email protected]> wrote:

> Thanks Vinod. I went back to check the logs and found nothing interesting.
> However, in the process I found that my Spark master port was pointing to
> 7077 instead of 5050. After re-running, Spark on Mesos worked!
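For anyone who hits the same mix-up: 7077 is the default port of Spark's standalone master, while a Mesos master listens on 5050 by default, so the master URL needs the mesos:// scheme and the Mesos port. A minimal sketch, assuming the master runs on yarnmaster-8245 as in the logs below:

```
# spark-defaults.conf
spark.master   mesos://yarnmaster-8245:5050

# or equivalently on the command line:
#   spark-submit --master mesos://yarnmaster-8245:5050 <your-app>
```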
>
> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
> addition, Mesos is handling data sizes that YARN simply dies on. But Mesos
> is still only taking linearly increased time compared to the smaller data
> sizes.
>
> We have significant additional work to incorporate Mesos into operations
> and support, but given the strong performance and stability characteristics
> we are seeing initially, that effort is likely to get underway.
>
>
>
> 2015-09-09 12:54 GMT-07:00 Vinod Kone <[email protected]>:
>
>> Sounds like it. Can you see what the slave/agent and executor logs say?
>>
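A sketch of where those logs usually live; the actual paths are assumptions and depend on the `--log_dir` and `--work_dir` flags the agent was started with:

```shell
# Agent (slave) log, assuming --log_dir=/var/log/mesos:
grep -i 'TASK_LOST\|Executor terminated' /var/log/mesos/mesos-slave.INFO

# Executor stdout/stderr live in the task sandbox under the agent work_dir
# (assumed here to be /var/lib/mesos):
cat /var/lib/mesos/slaves/*/frameworks/*/executors/*/runs/latest/stderr
```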
>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <[email protected]>
>> wrote:
>>
>>>
>>> I am in the process of learning how to run a mesos cluster with the
>>> intent for it to be the resource manager for Spark.  As a small step in
>>> that direction a basic test of mesos was performed, as suggested by the
>>> Mesos Getting Started page.
>>>
>>> In the following output we see tasks launched and resources offered on a
>>> 20 node cluster:
>>>
>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>>> $(hostname -s):5050
>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>>> [email protected]:5050
>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>>> Attempting to register without authentication
>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>>> 20150908-182014-2093760522-5050-15313-0000
>>> Registered! ID = 20150908-182014-2093760522-5050-15313-0000
>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0
>>> and mem: 119855.0
>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0
>>> and mem: 119855.0
>>> Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0
>>> and mem: 119855.0
>>> Status update: task 0 is in state TASK_LOST
>>> Aborting because task 0 is in unexpected state TASK_LOST with reason
>>> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
>>> 'Executor terminated'
>>> I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
>>> I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework
>>> '20150908-182014-2093760522-5050-15313-0000'
>>> I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
>>> I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework
>>> '20150908-182014-2093760522-5050-15313-0000'
>>>
>>>
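As a sanity check on the offers above: each one advertises 16.0 CPUs and 119855 MB (~117 GB), so aggregate capacity can be tallied straight from the log text. A quick sketch; the two sample lines are abridged copies of the format printed by the test framework:

```python
import re

# Two sample "Received offer" lines in the format printed above (abridged).
log = """\
Received offer ...-O0 with cpus: 16.0 and mem: 119855.0
Received offer ...-O1 with cpus: 16.0 and mem: 119855.0
"""

# Extract the cpus/mem pair from each offer line.
offers = re.findall(r"cpus: ([0-9.]+) and mem: ([0-9.]+)", log)
total_cpus = sum(float(c) for c, _ in offers)
total_mem_gb = sum(float(m) for _, m in offers) / 1024  # Mesos reports mem in MB
print(total_cpus, round(total_mem_gb, 1))
```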
>>> Why did the task transition to TASK_LOST? Is there a misconfiguration
>>> on the cluster?
>>>
>>
>>
>
