It looks like the Marathon framework is continually failing over. Have you sought help from the Marathon developers?

On Mon, Sep 23, 2013 at 2:52 AM, Damien Hardy <[email protected]> wrote:
> Hello there,
>
> I might be missing something about framework deployment on mesos.
>
> I am trying to get the chronos or marathon frameworks working with HEAD of mesos running distributed.
>
> The mesos topology seems OK: slaves report to the master and I can see offers of resources (total available) on the mesos HTTP interface.
>
> 192.168.255.1 : marathon or chronos
> 192.168.255.2 : zookeeper + mesos master
> 192.168.255.3 : mesos slave
>
> Then I start marathon or chronos (HEAD version for both, with pom.xml using "<mesos.version>0.15.0-20130910-2</mesos.version>" for example).
>
> It seems to succeed in finding the master, and I can see the frameworks listed.
> But the mesos services seem to complain permanently, flooding the logs on the slave with:
>
> ```
> 2013-09-23 11:35:37,405:2264(0x7faf54a73700):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
> W0923 11:35:38.002933 2267 slave.cpp:1322] Ignoring updating pid for framework marathon-0.0.6 because it does not exist
> W0923 11:35:38.359627 2269 slave.cpp:1322] Ignoring updating pid for framework marathon-0.0.6 because it does not exist
> W0923 11:35:39.003171 2266 slave.cpp:1322] Ignoring updating pid for framework marathon-0.0.6 because it does not exist
> ```
>
> and on the master with:
>
> I0923 11:35:33.420017 3685 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
> I0923 11:35:33.420178 3685 master.cpp:753] Framework marathon-0.0.6 failed over
> I0923 11:35:33.668504 3683 master.cpp:1445] Sending 1 offers to framework marathon-0.0.6
> W0923 11:35:33.708227 3686 master.cpp:80] No whitelist given. Advertising offers for all slaves
> I0923 11:35:33.776002 3686 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
> I0923 11:35:33.776146 3686 master.cpp:753] Framework marathon-0.0.6 failed over
> I0923 11:35:33.776432 3684 hierarchical_allocator_process.hpp:598] Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]) on slave 201309231034-50309312-5050-1111-2 from framework marathon-0.0.6
> I0923 11:35:34.419661 3686 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
> I0923 11:35:34.419801 3686 master.cpp:753] Framework marathon-0.0.6 failed over
> I0923 11:35:34.669680 3684 master.cpp:1445] Sending 1 offers to framework marathon-0.0.6
> I0923 11:35:34.776325 3684 master.cpp:734] Re-registering framework marathon-0.0.6 at scheduler(1)@192.168.3.224:58107
> I0923 11:35:34.776445 3684 master.cpp:753] Framework marathon-0.0.6 failed over
> I0923 11:35:34.776748 3684 hierarchical_allocator_process.hpp:598] Recovered cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):2942; disk(*):35195; ports(*):[31000-32000]) on slave 201309231034-50309312-5050-1111-2 from framework marathon-0.0.6
>
> When I try to start a service with marathon, based on the example given:
>
> marathon -H http://192.168.255.1:8080 start -i chronos -u https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz -C "./chronos/bin/demo ./chronos/config/nomail.yml ./chronos/target/chronos-1.0-SNAPSHOT.jar"
> Starting app 'chronos'
> ERROR:
>
> It seems to be there:
>
> marathon -H http://192.168.255.1:8080 list
> App ID: chronos
> Command: ./chronos/bin/demo ./chronos/config/nomail.yml ./chronos/target/chronos-1.0-SNAPSHOT.jar
> Instances: 1
> CPUs: 1.0
> Memory: 10.0 MB
> URI: https://s3.amazonaws.com/mesosphere-binaries-public/chronos/chronos.tgz
>
> chronos has the same problem with a non-existent framework id on the slave: I can create a scheduled command, but it is never executed.
>
> Thank you for any help understanding this.
>
> --
> Damien HARDY

