Re: Troubles with slave recovery via Docker containerizer on 0.23.0
Hi Tim,

That's the output from `docker inspect`. I've gisted the full contents of the container's log file (in all of its JSON-encoded glory) here: https://gist.githubusercontent.com/banjiewen/6450a06f958a2e7630bf/raw/12183fe891c1ddaf7019b478278c47c479d77c01/gistfile1.txt

The slave itself isn't logging much of interest, just various "Executor has terminated with unknown status" messages, etc.

For context, my container is running 0.23.0 installed from packages on Ubuntu 14.04. Docker is at 1.6.2.

-- b

On Wed, Aug 5, 2015 at 4:28 PM, Tim Chen t...@mesosphere.io wrote:
> Did you get the command from docker inspect or from the slave log?
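Because the slave log only prints the exec arguments joined with spaces, `docker inspect` is the more reliable place to see exactly what Docker executed. A minimal Python sketch (3.8+ for `shlex.join`), assuming the container-level `Path` and `Args` fields of `docker inspect` JSON output; the sample input is a hypothetical, heavily trimmed stand-in, not the contents of the gist above.

```python
import json
import shlex
from typing import List


def exec_argv_from_inspect(inspect_json: str) -> List[str]:
    """Return [Path] + Args for the first container in `docker inspect` output."""
    containers = json.loads(inspect_json)  # docker inspect emits a JSON array
    container = containers[0]
    return [container["Path"], *container["Args"]]


# Hypothetical, trimmed `docker inspect` output; real output has many more
# fields, and these argv values are illustrative only.
sample = json.dumps([
    {"Path": "/bin/sh", "Args": ["-c", "echo", "hello"]}
])

argv = exec_argv_from_inspect(sample)
print(argv)              # ['/bin/sh', '-c', 'echo', 'hello']
print(shlex.join(argv))  # /bin/sh -c echo hello
```

Printing with `shlex.join` keeps argument boundaries visible: an argument containing spaces would appear in single quotes, so a split command string stands out immediately.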
Re: Troubles with slave recovery via Docker containerizer on 0.23.0
Got it, this shouldn't happen. Can you open a JIRA ticket? I'll try to repro today.

Tim

On Thu, Aug 6, 2015 at 9:37 AM, Benjamin Anderson benja...@ivysoftworks.com wrote:
> That's the output from `docker inspect`.
Re: Troubles with slave recovery via Docker containerizer on 0.23.0
Awesome, thanks Tim. https://issues.apache.org/jira/browse/MESOS-3219

-- b

On Thu, Aug 6, 2015 at 10:02 AM, Tim Chen t...@mesosphere.io wrote:
> Got it, this shouldn't happen. Can you open a JIRA ticket? I'll try to repro today.
Troubles with slave recovery via Docker containerizer on 0.23.0
Hi there -

I'm working on setting up a Mesos environment with the Docker containerizer and can't seem to get the recovery feature working. I'm running CoreOS, so the slave processes themselves are containerized. I have no issues running jobs without the recovery features enabled, but all jobs fail to boot when I add the following flags:

MESOS_DOCKER_KILL_ORPHANS=false
MESOS_DOCKER_MESOS_IMAGE=myrepo/my-slave-container

Inspecting the Docker containers and their log output reveals that the container invocation appears to be flawed; see this gist: https://gist.github.com/banjiewen/a2dc1784a82ed87edd6b

The containerizer is attempting to invoke an unquoted command via `/bin/sh -c`, which, predictably, fails to pass the complete command: `sh -c` treats only the first following argument as the command string, and the remaining words become positional parameters. This results in the error message shown in the second file in the linked gist. This is reproducible manually; quoting the arguments to `/bin/sh -c` results in success (at least, the command correctly receives the supplied arguments).

I gather that this is related to MESOS-2115, and it's clear that this patch[1] changed that behavior significantly, but if it introduced a bug I can't see it. It's also possible that my instance is configured incorrectly; the documentation here is a bit vague and there aren't many examples on the web.

Thanks in advance,
-- b

[1]: https://github.com/apache/mesos/commit/3baa60965407bf0c3eb9c3da1b2ba7c0a4fee968
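The quoting behaviour described above can be reproduced outside Mesos. A minimal Python sketch, assuming a POSIX system with `/bin/sh` and Python 3.7+; it does not exercise the containerizer's own code path, it only shows how `sh -c` treats the arguments that follow it. The helper name `run_sh` is hypothetical.

```python
import subprocess


def run_sh(args_after_c):
    """Hypothetical helper: run /bin/sh -c <args...> and return its stdout."""
    result = subprocess.run(
        ["/bin/sh", "-c", *args_after_c],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Unquoted: each word is a separate argv entry. sh takes only "echo" as the
# command string; "hello" becomes $0 and "world" becomes $1, so echo prints
# just a newline.
unquoted = run_sh(["echo", "hello", "world"])

# Quoted: the whole command is a single argv entry, so sh runs all of it.
quoted = run_sh(["echo hello world"])

# The "lost" words are still there, just as positional parameters.
positional = run_sh(['echo "$0 $1"', "hello", "world"])

print(repr(unquoted))    # '\n'
print(repr(quoted))      # 'hello world\n'
print(repr(positional))  # 'hello world\n'
```

The last call shows that nothing is discarded by the shell; the extra words are simply not part of the command string unless the caller passes the command as one argument.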
Re: Troubles with slave recovery via Docker containerizer on 0.23.0
Hi Ben,

Did you get the command from `docker inspect` or from the slave log? If it's from the slave log, note that we don't print the exact way we exec the command; we just join the exec arguments with a space in between.

What's the exact error in the slave/sandbox stderr log?

Tim

On Wed, Aug 5, 2015 at 4:18 PM, Benjamin Anderson benja...@ivysoftworks.com wrote:
> The containerizer is attempting to invoke an unquoted command via `/bin/sh -c`, which, predictably, fails to pass the complete command.
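Joining exec arguments with spaces loses the argument boundaries, which is why a command read from the slave log can't show whether `/bin/sh -c` received one argument or several. A minimal Python 3.8+ sketch; the two argv vectors are hypothetical examples, and `shlex.join` is used only to illustrate a boundary-preserving rendering, not what Mesos itself logs.

```python
import shlex

# Two different (hypothetical) argv vectors a launcher might exec.
split_argv = ["/bin/sh", "-c", "echo", "hello", "world"]
single_argv = ["/bin/sh", "-c", "echo hello world"]

# Space-joined rendering, as the slave log does: both look identical.
plain_split = " ".join(split_argv)
plain_single = " ".join(single_argv)

# shlex.join quotes each element as needed, so the boundaries survive.
quoted_split = shlex.join(split_argv)
quoted_single = shlex.join(single_argv)

print(plain_split == plain_single)  # True
print(quoted_split)                 # /bin/sh -c echo hello world
print(quoted_single)                # /bin/sh -c 'echo hello world'

# The quoted form round-trips back to the original argv.
print(shlex.split(quoted_single) == single_argv)  # True
```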