I haven't yet looked at the Mesos master logs from a run where this issue
happens, only at the logs of my framework. I will investigate further.

However, about 70% of the time everything works just fine, so I'm not sure
this is the issue, unless one of the containers isn't configured correctly
by minimesos.

I also run the scheduler in a container, and use docker-java to start it in my 
tests. 
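For a scheduler started in a container like this, the LIBPROCESS advice quoted further down comes down to giving libprocess an address the master can reach. The Python sketch below assembles a `docker run` command for that. The image name, network name, container name, IP, and port are hypothetical placeholders. It assumes the containers share a user-defined Docker network created with a subnet, because `--ip` is only accepted on such networks. With docker-java the same settings would go into the container configuration; the CLI form is used here only because it is easy to read. Check the Mesos networking documentation for your version, for example whether port mapping also requires the advertise variables.

```python
import shlex


def libprocess_env(container_ip: str, port: int) -> dict:
    """Environment variables telling libprocess which address and port to use."""
    # If LIBPROCESS_IP is unset, libprocess may bind to an address the master
    # cannot reach from another container (e.g. the loopback device).
    return {
        "LIBPROCESS_IP": container_ip,
        "LIBPROCESS_PORT": str(port),
    }


def docker_run_args(image: str, network: str, ip: str, name: str, env: dict) -> list:
    """Argument list for `docker run` that attaches the scheduler to `network`."""
    # A user-defined network lets containers resolve each other by name
    # (e.g. the master). A fixed --ip (only allowed on a user-defined network
    # created with a subnet) keeps LIBPROCESS_IP equal to the real address.
    args = ["docker", "run", "--rm", "--network", network, "--ip", ip, "--name", name]
    for key in sorted(env):
        args += ["-e", f"{key}={env[key]}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Hypothetical values: network "mesos-net" created with subnet 172.28.0.0/16.
    ip = "172.28.0.10"
    cmd = docker_run_args("my-scheduler:latest", "mesos-net", ip, "scheduler",
                          libprocess_env(ip, 9090))
    print(shlex.join(cmd))
```

The point of pinning both the IP and the port is that the master has to open connections back to the scheduler, so the address libprocess uses must be one the master's container can route to.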

Thanks
Eli

> On 28 Sep. 2016, at 02:00, Hendrik Haddorp <[email protected]> wrote:
> 
> With minimesos you might have the same problem, e.g. libprocess might bind
> to the loopback device. The other tricky thing is what the individual Docker
> containers can reach, and by which host name they can reach it. I think my
> setup is quite similar; I just opted not to use minimesos and created
> something similar myself. In that setup I create a Docker network for my
> containers so that they can see each other by name. The driver for my
> framework is also running in a container, so I don't need the native library
> on my system. This also means that libprocess should have no problems with
> the communication, as everything uses IPs from the same network.
> 
> You might also want to check the master log. In my case I saw a framework 
> registration immediately followed by a disconnect. I believe this indicates 
> that there is a communication issue.
> 
>> On 27.09.2016 15:09, Gmail wrote:
>> Thanks Hendrik
>> 
>> I have solved that particular problem when running a different framework in
>> Docker. It was a bit of a challenge to get the right combination of
>> environment variables and port definitions, but it is working reliably now.
>> 
>> I mainly hit this issue when running my integration tests, where I also run
>> the Mesos master and agent in Docker using minimesos.
>> 
>>> On 27 Sep 2016, at 22:51, Hendrik Haddorp <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> this sounds a lot like a problem I hit a few days ago. If you are using
>>> the Mesos native library, you need to make sure that the LIBPROCESS
>>> environment variables are set correctly. Otherwise the Mesos master cannot
>>> communicate back to your process, especially if you are not running on the
>>> same node as the master. Things get slightly more tricky if your scheduler
>>> is running in a Docker container.
>>> 
>>> regards,
>>> Hendrik
>>> 
>>>> On 27.09.2016 14:34, Eli Jordan wrote:
>>>> Yes, it appears in the Mesos UI and stays there. I log all messages from
>>>> the Mesos master, including resource offers and disconnects. I receive
>>>> neither offers nor a disconnected message.
>>>> 
>>>> I know I need to accept or decline the offers; the problem is that I never
>>>> receive the resource offer, but the master thinks I have.
>>>> 
>>>> This only happens sometimes; other times the framework starts just fine
>>>> and can launch tasks, which is what led me to think it might be a timing
>>>> issue.
>>>> 
>>>> Thanks
>>>> Eli
>>>> 
>>>>> On 27 Sep. 2016, at 22:25, Olivier Sallou <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 09/27/2016 02:08 PM, Gmail wrote:
>>>>>> Hi
>>>>>> 
>>>>>> I am implementing a Mesos framework and have hit a strange issue that I
>>>>>> can't make sense of. Intermittently, my framework receives the
>>>>>> registered message and is shown as registered in the Mesos UI.
>>>>>> 
>>>>>> However, I never see any resource offer messages being processed by the
>>>>>> framework, even though the Mesos master indicates that it has offered
>>>>>> resources to the framework (on the Frameworks page in the UI). In this
>>>>>> case I only have one slave, and all of its resources are apparently
>>>>>> held by the framework, so no tasks can be launched.
>>>>> Does your framework appear in the Mesos UI in the list of frameworks (and
>>>>> remain in the list)?
>>>>> 
>>>>> Maybe your framework is registered then disconnected.
>>>>>> Anyone have an idea what the problem might be?
>>>>>> 
>>>>>> One thought I had is that the MesosSchedulerDriver isn't expecting the
>>>>>> scheduler implementation to process messages asynchronously, but I
>>>>>> couldn't find any documentation indicating one way or the other. In my
>>>>>> case I'm using akka actors, and all the scheduler implementation does
>>>>>> is dispatch a message.
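A minimal sketch of the hand-off Eli describes, in Python, with a `queue.Queue` standing in for an akka mailbox. `Offer`, `HandoffScheduler`, `resource_offers`, and the `launch`/`decline` callables are hypothetical stand-ins, not the Mesos API. The callback only logs and enqueues, and a worker thread answers every offer by either launching on it or declining it, which is the requirement Olivier points out in his reply.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Offer:
    # Hypothetical stand-in for a Mesos resource offer.
    offer_id: str
    cpus: float


class HandoffScheduler:
    """Callback side only enqueues; a worker thread answers every offer."""

    def __init__(self, launch: Callable[[Offer], None],
                 decline: Callable[[Offer], None], cpus_needed: float) -> None:
        self._mailbox = queue.Queue()  # plays the role of an actor mailbox
        self._launch = launch
        self._decline = decline
        self._cpus_needed = cpus_needed
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def resource_offers(self, offers: List[Offer]) -> None:
        # Log first, so the framework log shows whether the callback ever
        # fired, then return quickly without doing the real work here.
        for offer in offers:
            print(f"received offer {offer.offer_id}")
            self._mailbox.put(offer)

    def _run(self) -> None:
        # Every offer gets exactly one answer. An offer that is neither used
        # nor declined stays held by the framework unless the master
        # rescinds it.
        while True:
            offer = self._mailbox.get()
            if offer is None:  # sentinel from stop()
                break
            if offer.cpus >= self._cpus_needed:
                self._launch(offer)
            else:
                self._decline(offer)

    def stop(self) -> None:
        self._mailbox.put(None)
        self._worker.join()


if __name__ == "__main__":
    launched, declined = [], []
    scheduler = HandoffScheduler(launched.append, declined.append, cpus_needed=1.0)
    scheduler.resource_offers([Offer("o1", 2.0), Offer("o2", 0.5)])
    scheduler.stop()
    print("launched:", [o.offer_id for o in launched])
    print("declined:", [o.offer_id for o in declined])
```

With akka, the receiving actor's mailbox plays the role of the queue. The property that matters is the same either way: the callback returns quickly, and each offer is eventually answered exactly once.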
>>>>> Do you log when you receive offers? When you receive an offer, you must
>>>>> accept or decline it.
>>>>> 
>>>>> Olivier
>>>>>> Is this a possibility?
>>>>>> 
>>>>>> Thanks
>>>>>> Eli
>>>>> -- 
>>>>> Olivier Sallou
>>>>> IRISA / University of Rennes 1
>>>>> Campus de Beaulieu, 35000 RENNES - FRANCE
>>>>> Tel: 02.99.84.71.95
>>>>> 
>>>>> gpg key id: 4096R/326D8438  (keyring.debian.org 
>>>>> <http://keyring.debian.org>)
>>>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
> 
