Yes, you're right, it was a networking issue. The problem was with the port
that the group process uses to connect:

Group process ((2)@10.0.0.3:49355) connected to ZooKeeper

Connections to port 49355 weren't allowed. I fixed it by exporting a fixed
port for the framework:

export LIBPROCESS_PORT=5055

and allowing connections to that port.
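
For anyone hitting the same thing, here is a minimal sketch of that fix.
Without LIBPROCESS_PORT set, libprocess binds to an OS-assigned port (49355
in the log line above), so a firewall rule cannot name it in advance. The
iptables rule is an assumption, since the firewall in use isn't stated here,
and libprocess_allow_rule is a hypothetical helper that only prints the rule
rather than applying it:

```shell
# Pin the libprocess port so a firewall rule can refer to a fixed value.
# 5055 is the port used in this thread; any free port would do.
export LIBPROCESS_PORT=5055

# Hypothetical helper: print an iptables rule that would accept inbound
# TCP connections on the pinned port. It does not modify the firewall.
libprocess_allow_rule() {
  echo "iptables -A INPUT -p tcp --dport ${LIBPROCESS_PORT} -j ACCEPT"
}

libprocess_allow_rule
```

The framework has to be started from the same shell (or otherwise inherit the
variable) for the pinned port to take effect, and the printed rule would need
to be run as root on the framework's host.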


On 28 May 2014 19:39, Vinod Kone <[email protected]> wrote:

> Tomas, in your case the framework keeps disconnecting and re-connecting
> with the master (likely a networking issue). Since the reconnect happens
> within the failover timeout (1 week) the master doesn't remove it. What IPs
> are your framework and master using?
>
>
> On Wed, May 28, 2014 at 7:46 AM, Tomas Barton <[email protected]>
> wrote:
>
>> Hi,
>>
>> I have a similar issue. Mesos is trying to keep alive a framework that is
>> crashing:
>>
>> I0528 16:42:52.487659  6009 master.cpp:929] Framework
>> 20140528-054038-316558480-5050-17117-0003 failed over
>> I0528 16:42:52.487927  6009 hierarchical_allocator_process.hpp:378]
>> Activated framework 20140528-054038-316558480-5050-17117-0003
>> I0528 16:42:52.488483  6009 master.cpp:2282] Sending 2 offers to
>> framework 20140528-054038-316558480-5050-17117-0003
>> I0528 16:42:52.488873  6009 master.cpp:592] Framework
>> 20140528-054038-316558480-5050-17117-0003 disconnected
>> I0528 16:42:52.488914  6009 master.cpp:1076] Deactivating framework
>> 20140528-054038-316558480-5050-17117-0003
>> I0528 16:42:52.489202  6009 master.cpp:614] Giving framework
>> 20140528-054038-316558480-5050-17117-0003 1weeks to failover
>> I0528 16:42:52.489279  6009 hierarchical_allocator_process.hpp:408]
>> Deactivated framework 20140528-054038-316558480-5050-17117-0003
>>
>> It's trying to recover the framework a few times per second. Is there
>> currently a way to remove that framework?
>>
>> Perhaps by deleting the framework state from ZooKeeper?
>>
>> Tomas
>>
>>
>> On 28 May 2014 05:56, Manivannan <[email protected]> wrote:
>>
>>> Hi Vinod,
>>>
>>> Thanks for your reply. Please see inline.
>>>
>>> Thanks,
>>> Mani
>>>
>>>
>>> On Wed, May 28, 2014 at 3:57 AM, Vinod Kone <[email protected]> wrote:
>>>
>>>> Hi Mani,
>>>>
>>>> What do you mean by "stuck" framework? If the framework disconnects
>>>> from master and the failover timeout (configurable) has passed master
>>>> should remove the framework. - *I have a Mesos cluster and a lot of
>>>> Jenkins instances talking to the cluster to provision slaves. Although I
>>>> have killed the Jenkins instances, I still see them listed as
>>>> frameworks in Mesos (that is what I meant by stuck frameworks). What is
>>>> the default failover timeout?*
>>>>
>>>
>>>
>>>>
>>>> Also, there is currently work in progress to give operators the ability
>>>> to force remove a framework. See:
>>>> https://issues.apache.org/jira/browse/MESOS-1390 - *I believe this fix
>>>> would help me out.*
>>>>
>>>
>>>
>>>>
>>>>
>>>> On Tue, May 27, 2014 at 5:01 AM, Manivannan <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My issue is similar to:
>>>>> https://issues.apache.org/jira/browse/MESOS-108
>>>>> A couple of frameworks are stuck forever in my Mesos cluster. Is there
>>>>> a way to kill those frameworks?
>>>>>
>>>>> Thanks,
>>>>> Mani
>>>>>
>>>>
>>>>
>>>
>>
>
