I haven't yet looked at the Mesos master logs from when this issue happens, only the logs of my framework. I will investigate further.
However, about 70% of the time everything works just fine. So I'm not sure if this is the issue, unless one of the containers didn't get configured correctly by minimesos. I also run the scheduler in a container, and use docker-java to start it in my tests.

Thanks
Eli

> On 28 Sep. 2016, at 02:00, Hendrik Haddorp <[email protected]> wrote:
>
> On mini mesos you might have the same problem, like libprocess might bind to
> the loopback device. The other tricky thing is what the individual docker
> containers can reach and by what host name they can. I think my setup is
> quite similar, I just opted to not use mini mesos and created something quite
> similar. In that setup I create a docker network for my containers so that
> they can see each other by name. The driver for my framework is also running
> in a container so that I don't need the native library on my system. This
> also has the effect that libprocess should have no problems with the
> communication as it is all using IPs from the same network.
>
> You might also want to check the master log. In my case I saw a framework
> registration immediately followed by a disconnect. I believe this indicates
> that there is a communication issue.
>
>> On 27.09.2016 15:09, Gmail wrote:
>> Thanks Hendrik
>>
>> I have solved that particular problem, when running a different framework in
>> docker. It was a bit of a challenge to get the right incantation of
>> environment variables and ports defined, but is working reliably now.
>>
>> I mainly hit this issue when running my integration tests, where I also run
>> the mesos master and agent in docker using mini mesos
>>
>> Sent from my iPad
>>
>>> On 27 Sep 2016, at 22:51, Hendrik Haddorp <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> this sounds quite like a problem I had hit a few days ago. If you are using
>>> the mesos native library you need to make sure that the LIBPROCESS
>>> environment variables are set correctly. Otherwise the Mesos master can not
>>> communicate back to your process, especially if you are not running on the
>>> same node as the master. Things get slightly more tricky if your scheduler
>>> is running in a docker container.
>>>
>>> regards,
>>> Hendrik
>>>
>>>> On 27.09.2016 14:34, Eli Jordan wrote:
>>>> Yes, it appears in the mesos ui, and stays there. I log all messages from
>>>> the mesos master, including resource offers and disconnected. I don't
>>>> receive offers or disconnected.
>>>>
>>>> I know I need to accept or decline the offers, the problem is that I never
>>>> receive the resource offer, but the master thinks I have.
>>>>
>>>> This only happens sometimes, sometimes the framework starts just fine, and
>>>> can launch tasks. Which is what led me to think it might be a timing issue.
>>>>
>>>> Thanks
>>>> Eli
>>>>
>>>>> On 27 Sep. 2016, at 22:25, Olivier Sallou <[email protected]> wrote:
>>>>>
>>>>>> On 09/27/2016 02:08 PM, Gmail wrote:
>>>>>> Hi
>>>>>>
>>>>>> I am implementing a mesos framework, and have hit a strange issue that I
>>>>>> can't make sense of. Intermittently, my framework will receive the
>>>>>> registered message, and is shown as registered in the mesos ui.
>>>>>>
>>>>>> I never see any resource offer messages being processed by the
>>>>>> framework, however, the mesos master indicates that it has offered
>>>>>> resources to the framework (on the frameworks page in the ui). In this
>>>>>> case, I only have one slave, and all the resources are apparently being
>>>>>> consumed by the framework, so no tasks can be launched.
>>>>> Does your framework appear in mesos UI in the list of frameworks? (and
>>>>> remains in the list)
>>>>>
>>>>> Maybe your framework is registered then disconnected.
>>>>>> Anyone have an idea what the problem might be?
>>>>>>
>>>>>> One thought I had, is that the MesosSchedulerDriver isn't expecting the
>>>>>> scheduler implementation to process messages asynchronously, but I
>>>>>> couldn't find any documentation indicating one way or the other. In my
>>>>>> case, I'm using akka actors, and all the scheduler implementation does
>>>>>> is dispatch a message.
>>>>> Do you log when you received offers? When you receive an offer you must
>>>>> accept or decline the offers.
>>>>>
>>>>> Olivier
>>>>>> Is this a possibility?
>>>>>>
>>>>>> Thanks
>>>>>> Eli
>>>>> --
>>>>> Olivier Sallou
>>>>> IRISA / University of Rennes 1
>>>>> Campus de Beaulieu, 35000 RENNES - FRANCE
>>>>> Tel: 02.99.84.71.95
>>>>>
>>>>> gpg key id: 4096R/326D8438 (keyring.debian.org
>>>>> <http://keyring.debian.org>)
>>>>> Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
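
For anyone hitting the same symptom, Hendrik's point about the LIBPROCESS environment variables can be sanity-checked from inside the scheduler container before the driver starts. The following is only a rough sketch, not something taken from this thread: the function name `libprocess_warnings` is made up for illustration, and it only inspects `LIBPROCESS_IP` and `LIBPROCESS_ADVERTISE_IP`. It does not prove the master can actually reach the scheduler; it only flags the obvious case of an unset or loopback address.

```python
import ipaddress
import os


def libprocess_warnings(env):
    """Return warnings about the libprocess address settings found in `env`.

    `env` is any mapping of environment variable names to values,
    for example os.environ. This helper is hypothetical and only
    performs local checks on the configured values.
    """
    warnings = []
    ip = env.get("LIBPROCESS_IP")
    advertise_ip = env.get("LIBPROCESS_ADVERTISE_IP")

    if not ip and not advertise_ip:
        # With nothing configured, libprocess chooses an address on its own.
        warnings.append(
            "neither LIBPROCESS_IP nor LIBPROCESS_ADVERTISE_IP is set; "
            "libprocess may bind to the loopback device"
        )

    for name, value in (("LIBPROCESS_IP", ip),
                        ("LIBPROCESS_ADVERTISE_IP", advertise_ip)):
        if not value:
            continue
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            warnings.append(f"{name}={value} is not a valid IP address")
            continue
        if addr.is_loopback:
            warnings.append(
                f"{name}={value} is a loopback address; a master in "
                "another container cannot connect back to it"
            )
    return warnings


if __name__ == "__main__":
    # Print any problems found in the current process environment.
    for warning in libprocess_warnings(os.environ):
        print("WARNING:", warning)
```

Running this as the first step of the container entrypoint, before the scheduler driver is created, would at least show whether a failing test run started with a different address configuration than a passing one.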

