Yes, you're right, it was a networking issue. The problem was the port that the group process uses to connect:
Group process ((2)@10.0.0.3:49355) connected to ZooKeeper

Connections to port 49355 weren't allowed. I fixed it by exporting a fixed port for the framework (export LIBPROCESS_PORT=5055) and allowing connections to that port.

On 28 May 2014 19:39, Vinod Kone wrote:
> Tomas, in your case the framework keeps disconnecting and re-connecting
> with the master (likely a networking issue). Since the reconnect happens
> within the failover timeout (1 week), the master doesn't remove it. What IPs
> are your framework and master using?
>
>
> On Wed, May 28, 2014 at 7:46 AM, Tomas Barton wrote:
>
>> Hi,
>>
>> I have a similar issue. Mesos is trying to keep alive a framework that is
>> crashing:
>>
>> I0528 16:42:52.487659 6009 master.cpp:929] Framework 20140528-054038-316558480-5050-17117-0003 failed over
>> I0528 16:42:52.487927 6009 hierarchical_allocator_process.hpp:378] Activated framework 20140528-054038-316558480-5050-17117-0003
>> I0528 16:42:52.488483 6009 master.cpp:2282] Sending 2 offers to framework 20140528-054038-316558480-5050-17117-0003
>> I0528 16:42:52.488873 6009 master.cpp:592] Framework 20140528-054038-316558480-5050-17117-0003 disconnected
>> I0528 16:42:52.488914 6009 master.cpp:1076] Deactivating framework 20140528-054038-316558480-5050-17117-0003
>> I0528 16:42:52.489202 6009 master.cpp:614] Giving framework 20140528-054038-316558480-5050-17117-0003 1weeks to failover
>> I0528 16:42:52.489279 6009 hierarchical_allocator_process.hpp:408] Deactivated framework 20140528-054038-316558480-5050-17117-0003
>>
>> It's trying to recover the framework a few times per second. Is there
>> currently a way to remove that framework?
>>
>> Perhaps by deleting the framework state from ZooKeeper?
>>
>> Tomas
>>
>>
>> On 28 May 2014 05:56, Manivannan wrote:
>>
>>> Hi Vinod,
>>>
>>> Thanks for your reply. Please see inline.
>>>
>>> Thanks,
>>> Mani
>>>
>>>
>>> On Wed, May 28, 2014 at 3:57 AM, Vinod Kone wrote:
>>>
>>>> Hi Mani,
>>>>
>>>> What do you mean by "stuck" framework? If the framework disconnects
>>>> from the master and the failover timeout (configurable) has passed, the
>>>> master should remove the framework.
>>>
>>> *I have a Mesos cluster and a lot of Jenkins instances talking to the
>>> cluster to provision slaves. Although I have killed the Jenkins instances,
>>> I still see them listed as frameworks in Mesos (that is what I meant by
>>> stuck frameworks). What is the default failover timeout?*
>>>
>>>> Also, there is currently work in progress to give operators the ability
>>>> to force-remove a framework. See:
>>>> https://issues.apache.org/jira/browse/MESOS-1390
>>>
>>> *I believe this fix would help me out.*
>>>
>>>> On Tue, May 27, 2014 at 5:01 AM, Manivannan wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My issue is similar to:
>>>>> https://issues.apache.org/jira/browse/MESOS-108
>>>>> A couple of frameworks have been stuck forever in my Mesos cluster. Is
>>>>> there a way to kill those frameworks?
>>>>>
>>>>> Thanks,
>>>>> Mani
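Before changing firewall rules, you can confirm this kind of problem from the master host. Check whether a TCP connection to the framework's libprocess port can be opened at all. Below is a minimal Python sketch that is not part of Mesos. The helper name `port_reachable` and the 2-second timeout are illustrative assumptions. The address 10.0.0.3 is the framework IP from the log, and 5055 is the port pinned with LIBPROCESS_PORT.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port can be opened.

    Run it from the Mesos master host against the framework's libprocess port.
    """
    try:
        # create_connection resolves the host and connects, honouring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_reachable("10.0.0.3", 5055)` on the master host should return True once the firewall allows the pinned port. False means the master still cannot reach the framework. Pinning the port matters because, without LIBPROCESS_PORT, libprocess binds a random port (49355 here) that a firewall rule cannot anticipate.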

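A simplified model of the failover timer illustrates Vinod's point that the crashing framework is never removed. The clock starts when the framework disconnects and is cancelled when it reconnects. A framework that flaps a few times per second therefore never stays disconnected long enough to be removed. This Python sketch is a hypothetical model of that rule, not the Mesos master's actual code. `FailoverTracker` and its methods are made-up names, and times are plain seconds.

```python
from typing import Optional

ONE_WEEK = 7 * 24 * 60 * 60.0  # the 1-week failover timeout from the thread, in seconds


class FailoverTracker:
    """Hypothetical model of the failover-timeout rule for a single framework."""

    def __init__(self, failover_timeout: float) -> None:
        self.failover_timeout = failover_timeout
        self.disconnected_since: Optional[float] = None  # None while connected

    def disconnect(self, now: float) -> None:
        # Start the failover clock unless it is already running.
        if self.disconnected_since is None:
            self.disconnected_since = now

    def reconnect(self) -> None:
        # The framework failed over in time, so the pending removal is cancelled.
        self.disconnected_since = None

    def should_remove(self, now: float) -> bool:
        # Removal needs one uninterrupted disconnection lasting the full timeout.
        return (
            self.disconnected_since is not None
            and now - self.disconnected_since >= self.failover_timeout
        )
```

In this model, a framework that keeps re-registering resets the clock every time. It only goes away once it stops reconnecting for a full week, or when an operator removes it explicitly.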
