Re: Agent won't start

2016-03-30 Thread Pradeep Chhetri
Hello Paul, A few things to note here: 1. Whenever you change the value of any *resource* or any *attribute* (description: http://mesos.apache.org/documentation/latest/attributes-resources/), you need to clean up the work_dir (rm -rf /tmp/mesos) and restart the slave. 2. You must already know
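
The cleanup described in point 1 could be sketched as a small shell helper. This is only a sketch: the function name `reset_agent_work_dir` is made up for illustration and is not part of Mesos, while the `mesos-slave` service name and the `/tmp/mesos` path are the ones mentioned in this thread. The stop/start calls are left as comments so that the helper itself only removes the directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): remove an agent work_dir so the
# agent can start cleanly after its resources or attributes were changed.
reset_agent_work_dir() {
    work_dir="$1"

    # Refuse obviously dangerous arguments.
    case "$work_dir" in
        ""|"/")
            echo "refusing to remove '$work_dir'" >&2
            return 1
            ;;
    esac

    # Nothing to do if the directory is already gone.
    if [ ! -d "$work_dir" ]; then
        echo "nothing to clean: $work_dir does not exist" >&2
        return 0
    fi

    rm -rf -- "$work_dir"
}

# Intended sequence on the agent node (commented out; run as root):
#   service mesos-slave stop
#   reset_agent_work_dir /tmp/mesos
#   service mesos-slave start
```

The guard against an empty argument or `/` is there because the helper wraps `rm -rf`; it is a precaution added for this sketch, not something the thread asks for.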

Re: Agent won't start

2016-03-30 Thread Paul Bell
Greg, thanks again - I am planning on moving my work_dir. Pradeep, thanks again. In a slightly different scenario, namely: service mesos-slave stop; edit /etc/default/mesos-slave (add a port resource); service mesos-slave start; I noticed that the slave did not start and - again - the log shows

Re: Agent won't start

2016-03-29 Thread Greg Mann
Check out this link for info on /tmp cleanup in Ubuntu: http://askubuntu.com/questions/20783/how-is-the-tmp-directory-cleaned-up And check out this link for information on some of the work_dir's contents on a Mesos agent: http://mesos.apache.org/documentation/latest/sandbox/ The work_dir

Re: Agent won't start

2016-03-29 Thread Paul Bell
Hi Pradeep, And thank you for your reply! That, too, is very interesting. I think I need to synthesize what you and Greg are telling me and come up with a clean solution. Agent nodes can crash. Moreover, I can stop the mesos-slave service, and start it later with a reboot in between. So I am

Re: Agent won't start

2016-03-29 Thread Paul Bell
Whoa...interessant! The node *may* have been rebooted. Uptime says 2 days. I'll need to check my notes. Can you point me to a reference re Ubuntu behavior? Based on what you've told me so far, it sounds as if the sequence: stop service, reboot agent node, start service could lead to trouble - or

Re: Agent won't start

2016-03-29 Thread Pradeep Chhetri
Hello Paul, From the logs, it looks like, on starting the mesos slave, it is trying to do slave recovery (http://mesos.apache.org/documentation/latest/slave-recovery/) but since the resources.info is unavailable, it is unable to perform the recovery & hence ends up killing itself. If you are
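
One way to check for the file before starting the service might look like the sketch below. The helper name `has_resources_checkpoint` is hypothetical, and the `meta/resources/resources.info` layout under the work_dir is an assumption inferred from the path in the log message quoted elsewhere in this thread ('/tmp/mesos/meta/resources/resources.info'). The helper only reports whether the file exists; it does not tell you why the agent exits.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the resources checkpoint file exists
# under the given work_dir. The relative path is assumed from the log
# line quoted in this thread.
has_resources_checkpoint() {
    [ -f "$1/meta/resources/resources.info" ]
}

# Example (commented out):
#   if ! has_resources_checkpoint /tmp/mesos; then
#       echo "resources.info is missing under /tmp/mesos" >&2
#   fi
```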

Re: Agent won't start

2016-03-29 Thread Greg Mann
Paul, This would be relevant for any system which is automatically deleting files in /tmp. It looks like in Ubuntu, the default behavior is for /tmp to be completely nuked at boot time. Was the agent node rebooted prior to this problem? On Tue, Mar 29, 2016 at 2:29 PM, Paul Bell

Re: Agent won't start

2016-03-29 Thread Paul Bell
Hi Greg, Thanks very much for your quick reply. I simply forgot to mention platform. It's Ubuntu 14.04 LTS and it's not systemd. I will look at the link you provide. Is there any chance that it might apply to non-systemd platforms? Cordially, Paul On Tue, Mar 29, 2016 at 5:18 PM, Greg Mann

Re: Agent won't start

2016-03-29 Thread Greg Mann
Hi Paul, Noticing the logging output, "Failed to find resources file '/tmp/mesos/meta/resources/resources.info'", I wonder if your trouble may be related to the location of your agent's work_dir. See this ticket: https://issues.apache.org/jira/browse/MESOS-4541 Some users have reported issues
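
A quick way to tell whether a work_dir sits under /tmp, and so may be removed by /tmp cleanup, could be sketched as follows. The helper name `work_dir_in_tmp` is hypothetical; it only inspects the path string and does not read the agent's configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path is /tmp or lives below it.
work_dir_in_tmp() {
    case "$1" in
        /tmp|/tmp/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example (commented out):
#   if work_dir_in_tmp /tmp/mesos; then
#       echo "work_dir is under /tmp and may not survive /tmp cleanup" >&2
#   fi
```

If the check succeeds, pointing the agent's `--work_dir` flag at a directory outside /tmp is one option; `/var/lib/mesos` would be an illustrative choice here, not something taken from this thread.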

Agent won't start

2016-03-29 Thread Paul Bell
Hi, I am hoping someone can shed some light on this. An agent node failed to start, that is, when I did "service mesos-slave start" the service came up briefly & then stopped. Before stopping it produced the log shown below. The last thing it wrote is "Trying to create path '/mesos' in