On mini mesos you might have the same problem: libprocess might bind to
the loopback device. The other tricky thing is what the individual
docker containers can reach, and by what host name they can reach it. I
think my setup is quite similar; I just opted not to use mini mesos and
built something comparable myself. In that setup I create a docker
network for my containers so that they can see each other by name. The
driver for my framework also runs in a container, so I don't need the
native library on my system. This also has the effect that libprocess
should have no problems with the communication, as everything uses IPs
from the same network.
You might also want to check the master log. In my case I saw a
framework registration immediately followed by a disconnect. I believe
this indicates that there is a communication issue.
On 27.09.2016 15:09, Gmail wrote:
Thanks Hendrik
I have solved that particular problem when running a different framework in
docker. It was a bit of a challenge to get the right incantation of environment
variables and ports defined, but it is working reliably now.
I mainly hit this issue when running my integration tests, where I also run the
mesos master and agent in docker using mini mesos.
On 27 Sep 2016, at 22:51, Hendrik Haddorp <[email protected]> wrote:
Hi,
this sounds quite like a problem I hit a few days ago. If you are using the
mesos native library you need to make sure that the LIBPROCESS environment
variables are set correctly. Otherwise the Mesos master cannot communicate
back to your process, especially if you are not running on the same node as the
master. Things get slightly more tricky if your scheduler is running in a
docker container.
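
A minimal sketch of a start-up check for this, written in Python and not taken from anyone's setup in this thread. It assumes the variables in question are LIBPROCESS_IP and LIBPROCESS_PORT; the function name check_libprocess_env is made up for illustration, and the warning texts are only a guess at what is worth flagging.

```python
import ipaddress
import os


def check_libprocess_env(env=None):
    """Return a list of warnings about libprocess-related variables.

    Hypothetical helper: it only inspects the environment and does not
    talk to Mesos at all.
    """
    env = os.environ if env is None else env
    warnings = []

    ip = env.get("LIBPROCESS_IP")
    if not ip:
        warnings.append("LIBPROCESS_IP is not set; libprocess will choose "
                        "an address itself, possibly the loopback device")
    else:
        try:
            if ipaddress.ip_address(ip).is_loopback:
                warnings.append("LIBPROCESS_IP=%s is a loopback address; "
                                "the master cannot call back to it" % ip)
        except ValueError:
            warnings.append("LIBPROCESS_IP=%s is not a valid IP address" % ip)

    if not env.get("LIBPROCESS_PORT"):
        warnings.append("LIBPROCESS_PORT is not set; the listening port "
                        "may be hard to publish from a container")

    return warnings


if __name__ == "__main__":
    # Print one line per finding for the current process environment.
    for line in check_libprocess_env():
        print("WARNING:", line)
```

Running something like this before the scheduler driver is created makes a wrong address visible at start-up, instead of showing up later as missing offers.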
regards,
Hendrik
On 27.09.2016 14:34, Eli Jordan wrote:
Yes, it appears in the mesos ui, and stays there. I log all messages from the
mesos master, including resource offers and disconnected. I don't receive
offers or disconnected.
I know I need to accept or decline the offers, the problem is that I never
receive the resource offer, but the master thinks I have.
This only happens sometimes; at other times the framework starts just fine and
can launch tasks, which is what led me to think it might be a timing issue.
Thanks
Eli
On 27 Sep. 2016, at 22:25, Olivier Sallou <[email protected]> wrote:
On 09/27/2016 02:08 PM, Gmail wrote:
Hi
I am implementing a mesos framework, and have hit a strange issue that I can't
make sense of. Intermittently, my framework will receive the registered
message, and is shown as registered in the mesos ui.
I never see any resource offer messages being processed by the framework,
however, the mesos master indicates that it has offered resources to the
framework (on the frameworks page in the ui). In this case, I only have one
slave, and all the resources are apparently being consumed by the framework, so
no tasks can be launched.
Does your framework appear in the mesos UI in the list of frameworks? (and
remain in the list)
Maybe your framework is registered then disconnected.
Anyone have an idea what the problem might be?
One thought I had, is that the MesosSchedulerDriver isn't expecting the
scheduler implementation to process messages asynchronously, but I couldn't
find any documentation indicating one way or the other. In my case, I'm using
akka actors, and all the scheduler implementation does is dispatch a message.
Do you log when you receive offers? When you receive an offer, you must
accept or decline it.
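
A rough sketch of that pattern in Python, with the callback only logging and enqueueing each offer while a worker answers every one. Everything here is a stand-in: RecordingDriver is a hypothetical fake that only records declineOffer calls, offers are plain strings, and the queue plays the role of the actor mailbox. It is not the real driver or the akka setup.

```python
import queue
import threading


class RecordingDriver:
    """Hypothetical stand-in for a scheduler driver; records declines."""

    def __init__(self):
        self.declined = []

    def declineOffer(self, offer_id):
        self.declined.append(offer_id)


class QueueingScheduler:
    """Callback side: log each offer and hand it off without blocking."""

    def __init__(self):
        self.inbox = queue.Queue()

    def resourceOffers(self, driver, offers):
        for offer_id in offers:
            print("received offer", offer_id)
            self.inbox.put((driver, offer_id))


def handle_offers(inbox, wanted):
    """Worker side: every offer is either used or explicitly declined."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: stop the worker
            break
        driver, offer_id = item
        if offer_id in wanted:
            print("would launch tasks on", offer_id)
        else:
            driver.declineOffer(offer_id)


def run_demo():
    driver = RecordingDriver()
    scheduler = QueueingScheduler()
    worker = threading.Thread(target=handle_offers,
                              args=(scheduler.inbox, {"offer-1"}))
    worker.start()
    scheduler.resourceOffers(driver, ["offer-1", "offer-2"])
    scheduler.inbox.put(None)
    worker.join()
    return driver.declined


if __name__ == "__main__":
    print("declined:", run_demo())
```

If the "received offer" line never shows up in the framework's log, the offers are not reaching the process at all, and the accept/decline logic is not the problem.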
Olivier
Is this a possibility?
Thanks
Eli
--
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95
gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438