Re: Troubles with slave recovery via Docker containerizer on 0.23.0

2015-08-06 Thread Benjamin Anderson
Hi Tim,

That's the output from `docker inspect`. I've gisted the full contents
of the container's log file (in all of its JSON-encoded glory) here:

https://gist.githubusercontent.com/banjiewen/6450a06f958a2e7630bf/raw/12183fe891c1ddaf7019b478278c47c479d77c01/gistfile1.txt
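
Docker's default `json-file` logging driver writes one JSON object per line, with `log`, `stream`, and `time` fields, which is why the raw file is hard to read. The following is a minimal Python sketch for flattening such a file into plain text. The helper name `decode_docker_log` and the sample lines are made up for illustration and are not taken from the gist.

```python
import json


def decode_docker_log(lines):
    """Yield (stream, text) pairs from json-file formatted log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        entry = json.loads(line)
        yield entry["stream"], entry["log"].rstrip("\n")


# Made-up sample entries in the json-file layout (not real output).
sample = [
    '{"log":"starting task\\n","stream":"stdout","time":"2015-08-06T16:00:00.000000000Z"}',
    '{"log":"something failed\\n","stream":"stderr","time":"2015-08-06T16:00:01.000000000Z"}',
]

decoded = list(decode_docker_log(sample))
for stream, text in decoded:
    print("[{}] {}".format(stream, text))
```

To use it on a real log, pass an open file object instead of `sample`.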

The slave itself isn't logging much of interest, just various
"Executor has terminated with unknown status" messages, etc.

For context, my container is running Mesos 0.23.0 installed from
packages on Ubuntu 14.04. Docker is at 1.6.2.

--
b

On Wed, Aug 5, 2015 at 4:28 PM, Tim Chen t...@mesosphere.io wrote:
 Hi Ben,

 Did you get the command from docker inspect or from the slave log?

 If it's from the slave log, then note that we don't print the exact way
 we exec the command; we just join the exec arguments with a space in
 between.

 What's the exact error in the slave/sandbox stderr log?

 Tim






Re: Troubles with slave recovery via Docker containerizer on 0.23.0

2015-08-06 Thread Tim Chen
Got it, this shouldn't happen. Can you open a JIRA ticket? I'll try to
repro today.

Tim

On Thu, Aug 6, 2015 at 9:37 AM, Benjamin Anderson benja...@ivysoftworks.com
 wrote:

 Hi Tim,

 That's the output from `docker inspect`. I've gisted the full contents
 of the container's log file (in all of its JSON-encoded glory) here:


 https://gist.githubusercontent.com/banjiewen/6450a06f958a2e7630bf/raw/12183fe891c1ddaf7019b478278c47c479d77c01/gistfile1.txt

 The slave itself isn't logging much of interest, just various
 "Executor has terminated with unknown status" messages, etc.

 For context, my container is running Mesos 0.23.0 installed from
 packages on Ubuntu 14.04. Docker is at 1.6.2.

 --
 b

 
 



Re: Troubles with slave recovery via Docker containerizer on 0.23.0

2015-08-06 Thread Benjamin Anderson
Awesome, thanks Tim.

https://issues.apache.org/jira/browse/MESOS-3219

--
b

On Thu, Aug 6, 2015 at 10:02 AM, Tim Chen t...@mesosphere.io wrote:
 Got it, this shouldn't happen. Can you open a JIRA ticket? I'll try to repro
 today.

 Tim

 
 




Troubles with slave recovery via Docker containerizer on 0.23.0

2015-08-05 Thread Benjamin Anderson
Hi there - I'm working on setting up a Mesos environment with the
Docker containerizer and can't seem to get the recovery feature
working. I'm running CoreOS, so the slave processes themselves are
containerized. I have no issues running jobs without the recovery
features enabled, but all jobs fail to boot when I add the following
flags:

MESOS_DOCKER_KILL_ORPHANS=false
MESOS_DOCKER_MESOS_IMAGE=myrepo/my-slave-container
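
Mesos reads any command-line flag from an environment variable carrying the `MESOS_` prefix, so the two settings above correspond to the `--docker_kill_orphans` and `--docker_mesos_image` slave flags. A small Python sketch of that naming convention follows. The helper `env_to_flag` is hypothetical, written only for illustration, and is not part of Mesos.

```python
def env_to_flag(name, value, prefix="MESOS_"):
    """Translate a MESOS_-prefixed variable into the equivalent flag string."""
    if not name.startswith(prefix):
        raise ValueError("not a Mesos variable: {}".format(name))
    # Drop the prefix and lowercase the rest to get the flag name.
    return "--{}={}".format(name[len(prefix):].lower(), value)


# The two settings from this message, in order.
settings = [
    ("MESOS_DOCKER_KILL_ORPHANS", "false"),
    ("MESOS_DOCKER_MESOS_IMAGE", "myrepo/my-slave-container"),
]

flags = [env_to_flag(name, value) for name, value in settings]
for flag in flags:
    print(flag)
```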

Inspecting the Docker containers and their log output suggests that the
container invocation is flawed - see this gist:

https://gist.github.com/banjiewen/a2dc1784a82ed87edd6b

The containerizer is attempting to invoke an unquoted command via
`/bin/sh -c`, which, predictably, fails to pass the complete command.
This results in the error message shown in the second file in the
linked gist.

This is reproducible manually; quoting the arguments to `/bin/sh -c`
results in success (at least, it correctly receives the supplied
arguments).
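
The difference can be reproduced outside Mesos. The sketch below is a minimal Python illustration, not the containerizer's actual code path: it runs `/bin/sh -c` once with the command split into separate arguments and once with the command as a single string. The `echo` command, its arguments, and the helper `run_sh` are placeholders chosen for the example.

```python
import subprocess


def run_sh(argv_tail):
    """Run /bin/sh -c followed by the given arguments and return stdout."""
    result = subprocess.run(
        ["/bin/sh", "-c"] + list(argv_tail),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Unquoted: the shell takes only "echo" as the command string.
# "hello" becomes $0 and "world" becomes $1, so echo sees no arguments.
unquoted = run_sh(["echo", "hello", "world"])

# Quoted: the whole command line is the single argument to -c.
quoted = run_sh(["echo hello world"])

print(repr(unquoted))
print(repr(quoted))
```

In the unquoted case the shell still exits successfully, which makes this kind of failure easy to miss when only exit codes are checked.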

I gather that this is related to MESOS-2115, and it's clear that this
patch[1] changed that behavior significantly, but if it introduced a
bug I can't see it. It's possible that my instance is configured
incorrectly as well; the documentation here is a bit vague and there
aren't many examples on the web.

Thanks in advance,
--
b

[1]: https://github.com/apache/mesos/commit/3baa60965407bf0c3eb9c3da1b2ba7c0a4fee968


Re: Troubles with slave recovery via Docker containerizer on 0.23.0

2015-08-05 Thread Tim Chen
Hi Ben,

Did you get the command from docker inspect or from the slave log?

If it's from the slave log, then note that we don't print the exact way
we exec the command; we just join the exec arguments with a space in
between.
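
To illustrate why a space-joined log line can look unquoted even when the exec call is correct, here is a minimal Python sketch. The argument vector is hypothetical and is not what Mesos actually passes; the point is only that joining with spaces discards argument boundaries, while shell-style quoting keeps them.

```python
import shlex

# Hypothetical argv, similar in shape to a `/bin/sh -c` invocation.
argv = ["/bin/sh", "-c", "exec my-task --flag value"]

# Joining with spaces, as a log line might, hides where each argument ends.
logged = " ".join(argv)

# Splitting the logged line again does not recover the original argv.
reparsed = shlex.split(logged)

# Quoting each argument keeps the boundaries visible and round-trips.
quoted = " ".join(shlex.quote(arg) for arg in argv)

print(logged)
print(quoted)
```

This is why the `docker inspect` output, which shows the argument list itself, is the more reliable source for the exact invocation.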

What's the exact error in the slave/sandbox stderr log?

Tim


On Wed, Aug 5, 2015 at 4:18 PM, Benjamin Anderson benja...@ivysoftworks.com
 wrote:

 Hi there - I'm working on setting up a Mesos environment with the
 Docker containerizer and can't seem to get the recovery feature
 working. I'm running CoreOS, so the slave processes themselves are
 containerized. I have no issues running jobs without the recovery
 features enabled, but all jobs fail to boot when I add the following
 flags:

 MESOS_DOCKER_KILL_ORPHANS=false
 MESOS_DOCKER_MESOS_IMAGE=myrepo/my-slave-container

 Inspecting the Docker containers and their log output suggests that the
 container invocation is flawed - see this gist:

 https://gist.github.com/banjiewen/a2dc1784a82ed87edd6b

 The containerizer is attempting to invoke an unquoted command via
 `/bin/sh -c`, which, predictably, fails to pass the complete command.
 This results in the error message shown in the second file in the
 linked gist.

 This is reproducible manually; quoting the arguments to `/bin/sh -c`
 results in success (at least, it correctly receives the supplied
 arguments).

 I gather that this is related to MESOS-2115, and it's clear that this
 patch[1] changed that behavior significantly, but if it introduced a
 bug I can't see it. It's possible that my instance is configured
 incorrectly as well; the documentation here is a bit vague and there
 aren't many examples on the web.

 Thanks in advance,
 --
 b

 [1]: https://github.com/apache/mesos/commit/3baa60965407bf0c3eb9c3da1b2ba7c0a4fee968