Thanks, Stephen - feedback much appreciated!

*Marco Massenzio*
*Distributed Systems Engineer*
http://codetrips.com

On Thu, Sep 17, 2015 at 5:03 PM, Stephen Boesch <[email protected]> wrote:

> Compared to YARN, Mesos is simply faster: it has a shorter startup time,
> and the delay between tasks is smaller. The run times for a 100 GB
> terasort tended toward a median of 110 seconds on Mesos vs. roughly
> double that on YARN.
>
> Unfortunately we require mature multi-tenancy/isolation/queue support,
> which is still in the initial stages of work in progress for Mesos. So
> we will need to use YARN for the near and likely medium term.
>
> 2015-09-17 15:52 GMT-07:00 Marco Massenzio <[email protected]>:
>
>> Hey Stephen,
>>
>>> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
>>> addition, Mesos is handling data sizes that YARN simply dies on. But
>>> Mesos is still only taking linearly increased time compared to
>>> smaller data sizes.
>>
>> Obviously delighted to hear that, BUT me not much like "but" :)
>> I've added Tim, who is one of the main contributors to our Mesos/Spark
>> bindings; it would be great to hear your use case/experience and find
>> out whether we can improve on that front too!
>>
>> As the case may be, we could also jump on a hangout if it makes the
>> conversation easier/faster.
>>
>> Cheers,
>>
>> *Marco Massenzio*
>> *Distributed Systems Engineer*
>> http://codetrips.com
>>
>> On Wed, Sep 9, 2015 at 1:33 PM, Stephen Boesch <[email protected]> wrote:
>>
>>> Thanks Vinod. I went back to check the logs and found nothing
>>> interesting. However, in the process I found that my Spark master
>>> port was pointing to 7077 instead of 5050. After re-running, Spark
>>> on Mesos worked!
>>>
>>> Spark on Mesos is twice as fast as YARN on our 20-node cluster. In
>>> addition, Mesos is handling data sizes that YARN simply dies on. But
>>> Mesos is still only taking linearly increased time compared to
>>> smaller data sizes.
>>>
>>> We have significant additional work to incorporate Mesos into
>>> operations and support, but given the strong performance and
>>> stability characteristics we are initially seeing here, that effort
>>> is likely to get underway.
>>>
>>> 2015-09-09 12:54 GMT-07:00 Vinod Kone <[email protected]>:
>>>
>>>> Sounds like it. Can you see what the slave/agent and executor logs
>>>> say?
>>>>
>>>> On Tue, Sep 8, 2015 at 11:46 AM, Stephen Boesch <[email protected]> wrote:
>>>>
>>>>> I am in the process of learning how to run a Mesos cluster with
>>>>> the intent for it to be the resource manager for Spark. As a small
>>>>> step in that direction, a basic test of Mesos was performed, as
>>>>> suggested by the Mesos Getting Started page.
>>>>>
>>>>> In the following output we see tasks launched and resources
>>>>> offered on a 20-node cluster:
>>>>>
>>>>> [stack@yarnmaster-8245 build]$ ./src/examples/java/test-framework $(hostname -s):5050
>>>>> I0908 18:40:10.900964 31959 sched.cpp:157] Version: 0.23.0
>>>>> I0908 18:40:10.918957 32000 sched.cpp:254] New master detected at [email protected]:5050
>>>>> I0908 18:40:10.921525 32000 sched.cpp:264] No credentials provided. Attempting to register without authentication
>>>>> I0908 18:40:10.928963 31997 sched.cpp:448] Framework registered with 20150908-182014-2093760522-5050-15313-0000
>>>>> Registered! ID = 20150908-182014-2093760522-5050-15313-0000
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O0 with cpus: 16.0 and mem: 119855.0
>>>>> Launching task 0 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 1 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 2 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 3 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Launching task 4 using offer 20150908-182014-2093760522-5050-15313-O0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O1 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O2 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O3 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O4 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O5 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O6 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O7 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O8 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O9 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O10 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O11 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O12 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O13 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O14 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O15 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O16 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O17 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O18 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O19 with cpus: 16.0 and mem: 119855.0
>>>>> Received offer 20150908-182014-2093760522-5050-15313-O20 with cpus: 16.0 and mem: 119855.0
>>>>> Status update: task 0 is in state TASK_LOST
>>>>> Aborting because task 0 is in unexpected state TASK_LOST with reason 'REASON_EXECUTOR_TERMINATED' from source 'SOURCE_SLAVE' with message 'Executor terminated'
>>>>> I0908 18:40:12.466081 31996 sched.cpp:1625] Asked to abort the driver
>>>>> I0908 18:40:12.467051 31996 sched.cpp:861] Aborting framework '20150908-182014-2093760522-5050-15313-0000'
>>>>> I0908 18:40:12.468053 31959 sched.cpp:1591] Asked to stop the driver
>>>>> I0908 18:40:12.468683 31991 sched.cpp:835] Stopping framework '20150908-182014-2093760522-5050-15313-0000'
>>>>>
>>>>> Why did the task transition to TASK_LOST? Is there a
>>>>> misconfiguration on the cluster?
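[Editor's note] Vinod's suggestion above (check the agent and executor logs) can be sketched as a short shell snippet. The work directory below is a common default, not something confirmed in the thread; pass your agent's actual `--work_dir` if it differs. The framework ID is the one from the log output.

```shell
# Sketch only: paths are assumptions, adjust to your deployment.
WORK_DIR="${MESOS_WORK_DIR:-/var/lib/mesos}"
FRAMEWORK_ID="20150908-182014-2093760522-5050-15313-0000"

# Executor stdout/stderr land in per-run sandboxes under the agent's work
# directory; the stderr of a terminated executor usually explains a TASK_LOST
# with reason REASON_EXECUTOR_TERMINATED.
find "$WORK_DIR" -path "*${FRAMEWORK_ID}*" -name stderr 2>/dev/null |
  while read -r f; do
    echo "== $f =="
    cat "$f"
  done
```

The executor's own stderr is usually more informative than the scheduler-side log shown above, since the scheduler only sees the terminal status update.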

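[Editor's note] Stephen's fix in the thread (7077 vs. 5050) comes down to the master URL: port 7077 is Spark's *standalone* master, while the Mesos master listens on 5050, and Spark must be given a `mesos://` URL to register as a Mesos framework. A minimal sketch, with the host name taken from the thread's shell prompt and purely illustrative:

```shell
# Mesos master URL for Spark; host is illustrative, port 5050 is the Mesos
# master default (7077 would be a Spark standalone master).
MESOS_MASTER="mesos://yarnmaster-8245:5050"

# Typical submission (commented out: requires a live cluster):
#   ./bin/spark-submit --master "$MESOS_MASTER" \
#       --class org.apache.spark.examples.SparkPi \
#       examples/jars/spark-examples.jar 100
echo "$MESOS_MASTER"
```

The same value can be set persistently as `spark.master` in `conf/spark-defaults.conf` instead of on each `spark-submit` invocation.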
