Thanks Vinod. I went back to look at the logs and found nothing interesting.
However, in the process I found that my Spark port was pointing to 7077
instead of 5050. After re-running, Spark on Mesos worked!
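
For anyone hitting the same thing, here is a minimal sketch of the master
URL I mean. The host name is a hypothetical placeholder, and it assumes the
default ports (5050 for the Mesos master, 7077 for a Spark standalone
master); the commented-out spark-shell line also assumes Spark and the
Mesos native library are already installed on the submitting machine.

```shell
# Hypothetical host name; substitute your own Mesos master.
MESOS_MASTER_HOST="mesos-master.example.com"

# 5050 is the Mesos master default port.
# 7077 is the Spark standalone master default, which is what I had by mistake.
MESOS_MASTER_PORT=5050

# Spark selects the Mesos scheduler backend from the mesos:// scheme.
SPARK_MASTER_URL="mesos://${MESOS_MASTER_HOST}:${MESOS_MASTER_PORT}"
echo "${SPARK_MASTER_URL}"

# Then, on a machine with Spark installed:
#   spark-shell --master "${SPARK_MASTER_URL}"
```

The same URL can go into spark.master in spark-defaults.conf instead of
being passed on the command line.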

Spark on Mesos is twice as fast as Spark on YARN on our 20-node cluster. In
addition, Mesos handles data sizes on which YARN simply dies. Even at those
sizes, run time on Mesos still increases only linearly compared with the
smaller data sizes.

We have significant additional work to do to incorporate Mesos into
operations and support, but given the strong performance and stability
characteristics we are initially seeing here, that effort is likely to get
underway.



2015-09-09 12:54 GMT-07:00 Vinod Kone <[email protected]>:

> sounds like it. can you see what the slave/agent and executor logs say?
>
> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <[email protected]> wrote:
>
>>
>> I am in the process of learning how to run a mesos cluster with the
>> intent for it to be the resource manager for Spark.  As a small step in
>> that direction a basic test of mesos was performed, as suggested by the
>> Mesos Getting Started page.
>>
>> In the following output we see tasks launched and resources offered on a
>> 20 node cluster:
>>
>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework
>> $(hostname -s):5050
>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at
>> [email protected]:5050
>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided.
>> Attempting to register without authentication
>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with
>> 20150908-182014-2093760522-5050-15313-0000
>> Registered! ID = 20150908-182014-2093760522-5050-15313-0000
>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0
>> and mem: 119855.0
>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0
>> and mem: 119855.0
>> Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0
>> and mem: 119855.0
>> Status update: task 0 is in state TASK_LOST
>> Aborting because task 0 is in unexpected state TASK_LOST with reason
>> 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message
>> 'Executor terminated'
>> I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
>> I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework
>> '20150908-182014-2093760522-5050-15313-0000'
>> I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
>> I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework
>> '20150908-182014-2093760522-5050-15313-0000'
>>
>>
>> Why did the task transition to TASK_LOST ?   Is there a misconfiguration
>> on the cluster?
>>
>
>
