[jira] [Commented] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2019-07-31 Thread Deshi Xiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897570#comment-16897570
 ] 

Deshi Xiao commented on MESOS-6200:
---

Thanks, got it.

Benjamin Mahler (JIRA) wrote on Wednesday, July 31, 2019 at 6:25 AM:



> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>Priority: Major
>  Labels: resource-management
>
> The Docker executor could support soft and hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
> | soft limit| --cpu-shares | --memory-reservation|
> Currently the task protobuf message has only one resource struct to describe 
> the cgroup limit, and the Docker executor handles it as follows, setting only 
> --memory and --cpu-shares:
> {code}
>   if (resources.isSome()) {
>     // TODO(yifan): Support other resources (e.g. disk).
>     Option<double> cpus = resources.get().cpus();
>     if (cpus.isSome()) {
>       uint64_t cpuShare =
>         std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
>       argv.push_back("--cpu-shares");
>       argv.push_back(stringify(cpuShare));
>     }
>     Option<Bytes> mem = resources.get().mem();
>     if (mem.isSome()) {
>       Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>       argv.push_back("--memory");
>       argv.push_back(stringify(memLimit.bytes()));
>     }
>   }
> {code}
> I hope the executor and the protobuf message could split the resources into 
> two parts, soft and hard, so that the user could set two levels of resource 
> limits for a Docker container.
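
For illustration only, here is a minimal sketch of how an argument builder might emit both levels, using the Docker flags from the table above. The `ResourceLimits` struct, the `buildDockerLimitArgs` helper, and the constant values are assumptions made for this example; they are not part of the Mesos code base or its protobuf API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical holder for the two limit levels; not a Mesos type.
struct ResourceLimits
{
  double softCpus;        // Relative weight under CPU contention.
  double hardCpus;        // Absolute CFS ceiling; 0 means no hard limit.
  uint64_t softMemBytes;  // Reservation under memory pressure; 0 means unset.
  uint64_t hardMemBytes;  // Hard cap; 0 means unset.
};

// Values assumed for this sketch.
const uint64_t CPU_SHARES_PER_CPU = 1024;
const uint64_t MIN_CPU_SHARES = 2;
const uint64_t CPU_CFS_PERIOD_US = 100000;

// Translates the soft and hard limits into `docker run` arguments.
std::vector<std::string> buildDockerLimitArgs(const ResourceLimits& limits)
{
  // A soft limit above the hard limit would be contradictory.
  assert(limits.hardCpus == 0 || limits.softCpus <= limits.hardCpus);
  assert(limits.hardMemBytes == 0 ||
         limits.softMemBytes <= limits.hardMemBytes);

  std::vector<std::string> argv;

  // Soft CPU limit: a share weight, never below the minimum.
  uint64_t shares = std::max(
      static_cast<uint64_t>(CPU_SHARES_PER_CPU * limits.softCpus),
      MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(std::to_string(shares));

  // Hard CPU limit: quota per scheduling period.
  if (limits.hardCpus > 0) {
    uint64_t quota =
      static_cast<uint64_t>(CPU_CFS_PERIOD_US * limits.hardCpus);
    argv.push_back("--cpu-period");
    argv.push_back(std::to_string(CPU_CFS_PERIOD_US));
    argv.push_back("--cpu-quota");
    argv.push_back(std::to_string(quota));
  }

  // Soft memory limit.
  if (limits.softMemBytes > 0) {
    argv.push_back("--memory-reservation");
    argv.push_back(std::to_string(limits.softMemBytes));
  }

  // Hard memory limit.
  if (limits.hardMemBytes > 0) {
    argv.push_back("--memory");
    argv.push_back(std::to_string(limits.hardMemBytes));
  }

  return argv;
}
```

With a soft limit of 0.5 CPUs and a hard limit of 2 CPUs, this sketch would emit a share weight of 512 and a quota of 200000 microseconds per 100000 microsecond period.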



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-09-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154992#comment-16154992
 ] 

Deshi Xiao commented on MESOS-5368:
---

Can anyone guide me on this? I have some cycles to work on it.

> Consider introducing persistent agent ID
> 
>
> Key: MESOS-5368
> URL: https://issues.apache.org/jira/browse/MESOS-5368
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.2.1, 1.3.0
>Reporter: Neil Conway
>  Labels: mesosphere
>
> Currently, agent IDs identify a single "session" by an agent: that is, an 
> agent receives an agent ID when it registers with the master; it reuses that 
> agent ID if it disconnects and successfully reregisters; if the agent shuts 
> down and restarts, it registers anew and receives a new agent ID.
> It would be convenient to have a "persistent agent ID" that remains the same 
> for the duration of a given agent {{work_dir}}. This would mean that a given 
> persistent volume would not migrate between different persistent agent IDs 
> over time, for example (see MESOS-4894). If we supported permanently removing 
> an agent from the cluster (i.e., the {{work_dir}} and any volumes used by the 
> agent will never be reused), we could use the persistent agent ID to report 
> which agent has been removed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery

2017-08-30 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146809#comment-16146809
 ] 

Deshi Xiao commented on MESOS-7801:
---

[~gilbert] Thanks for the clarification. I will join in as soon as possible.

> Retry logic for unsuccessful `docker rm` during agent recovery
> --
>
> Key: MESOS-7801
> URL: https://issues.apache.org/jira/browse/MESOS-7801
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>
> In MESOS- we skip the failure when `docker rm` fails due to mount leakage 
> during agent recovery. In order not to leave residual docker containers in 
> the docker daemon, we could do a best-effort `docker rm` retry with an 
> exponential backoff since we cannot control when the leakage would be 
> terminated.
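
As a rough illustration of the retry idea described above, the following sketch retries an operation with a delay that doubles after each failure. It is a blocking, standalone example: `retryWithBackoff` and its parameters are hypothetical names, the `attempt` callback merely stands in for a `docker rm` invocation, and the real agent would presumably use its own asynchronous primitives instead of sleeping.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls `attempt` until it returns true or `maxAttempts` calls were made.
// The delay between calls doubles each time, capped at `maxDelay`.
bool retryWithBackoff(
    const std::function<bool()>& attempt,
    int maxAttempts,
    std::chrono::milliseconds initialDelay,
    std::chrono::milliseconds maxDelay)
{
  assert(maxAttempts > 0);

  std::chrono::milliseconds delay = initialDelay;
  for (int i = 0; i < maxAttempts; ++i) {
    if (attempt()) {
      return true;
    }

    // Best effort: wait only if another attempt will follow.
    if (i + 1 < maxAttempts) {
      std::this_thread::sleep_for(delay);
      delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
    }
  }

  return false;
}
```

Capping the delay keeps the retries from drifting arbitrarily far apart, which seems reasonable here because there is no way to know when the mount leakage will clear.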





[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery

2017-08-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144522#comment-16144522
 ] 

Deshi Xiao commented on MESOS-7801:
---

MESOS- already resolves my case, and I have no other issue to report right now. 
I am just curious why the optimization patch is still pending; it leaves me 
confused about the general workflow.



[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2017-08-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144518#comment-16144518
 ] 

Deshi Xiao commented on MESOS-1871:
---

[~idownes] I have some cycles to work on this issue. Could you please shepherd 
me? Where is the best place to start fixing it?

> Sending SIGTERM to a task command may render it orphaned
> 
>
> Key: MESOS-1871
> URL: https://issues.apache.org/jira/browse/MESOS-1871
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alexander Rukletsov
>Priority: Minor
>
> {{CommandExecutor}} launches tasks wrapping them into {{sh -c}}. That means 
> signals are sent to the top process—that is {{sh -c}}—and not to the task 
> directly. Though {{SIGTERM}} is propagated by {{sh -c}} down the process 
> tree, if the task is unresponsive to {{SIGTERM}}, {{sh -c}} terminates 
> reporting success to the {{CommandExecutor}}, rendering the task detached 
> from the parent process and still running. Because the {{CommandExecutor}} 
> thinks the command terminated normally, its OS process exits normally and may 
> not trigger containerizer's escalation which destroys cgroups.
> Here is the test related to the first part: 
> [https://gist.github.com/rukletsov/68259dfb02421813f9e6].
> Here is the test related to the second part: 
> [https://gist.github.com/rukletsov/3f19ecc7389fa51e65c0].





[jira] [Commented] (MESOS-7643) The order of isolators provided in '--isolation' flag is not preserved and instead sorted alphabetically

2017-08-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143589#comment-16143589
 ] 

Deshi Xiao commented on MESOS-7643:
---

Is this issue already fixed?
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L156

> The order of isolators provided in '--isolation' flag is not preserved and 
> instead sorted alphabetically
> 
>
> Key: MESOS-7643
> URL: https://issues.apache.org/jira/browse/MESOS-7643
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.2, 1.2.0, 1.3.0
>Reporter: Michael Cherny
>Assignee: James Peach
>Priority: Critical
>  Labels: isolation
>
> According to documentation and comments in code the order of the entries in 
> the --isolation flag should specify the ordering of the isolators. 
> Specifically, the `create` and `prepare` calls for each isolator should run 
> serially in the order in which they appear in the --isolation flag, while the 
> `cleanup` call should be serialized in reverse order (with exception of 
> filesystem isolator which is always first).
> But in fact, the isolators provided in '--isolation' flag are sorted 
> alphabetically.
> That happens in [this line of 
> code|https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/containerizer.cpp#L377], 
> which uses a 'set' (apparently instead of a list or vector), and a set is 
> a sorted container.
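
To make the ordering problem concrete, here is a small sketch of parsing a comma-separated flag value while preserving the order in which entries appear. The `parseIsolation` helper is a hypothetical name used only for this example and is not taken from the Mesos source; a `std::set` is still used, but only to detect duplicates.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated flag value, keeping first-seen order and
// dropping empty entries and duplicates.
std::vector<std::string> parseIsolation(const std::string& flag)
{
  std::vector<std::string> ordered;
  std::set<std::string> seen;  // Used only for duplicate detection.

  std::stringstream stream(flag);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (!token.empty() && seen.insert(token).second) {
      ordered.push_back(token);
    }
  }

  // Every accepted entry was recorded exactly once in both containers.
  assert(ordered.size() == seen.size());
  return ordered;
}
```

Iterating over `seen` instead of `ordered` would yield the entries in alphabetical order, which is the behavior the report describes.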





[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery

2017-08-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143577#comment-16143577
 ] 

Deshi Xiao commented on MESOS-7801:
---

[~jieyu] Can this patch be submitted?



[jira] [Commented] (MESOS-1871) Sending SIGTERM to a task command may render it orphaned

2017-08-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143572#comment-16143572
 ] 

Deshi Xiao commented on MESOS-1871:
---

[~idownes] This ticket is duplicated by MESOS-6933. Should we close it?



[jira] [Commented] (MESOS-6933) Executor does not respect grace period

2017-08-24 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141201#comment-16141201
 ] 

Deshi Xiao commented on MESOS-6933:
---

[~alexr] Hi, do you have cycles to shepherd me? I would like to try fixing it.

> Executor does not respect grace period
> --
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Tomasz Janiszewski
> Attachments: 屏幕快照 2017-07-17 下午2.19.03.png
>
>
> The Mesos command executor tries to support a grace period with escalation, 
> but unfortunately it does not work. It launches {{command}} by wrapping it in 
> {{sh -c}}, which causes the process tree to look like this:
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [ 
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
> ./bin/offer-i18n -e prod -p $PORT0 
>  \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This causes {{sh}} to exit immediately, and the executor with it, while the 
> wrapped {{command}} might need more time to finish. As a result, the executor 
> thinks the command terminated gracefully, so it won't 
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
>  to SIGKILL.
> This causes leaks when the POSIX containerizer is used, because a command that 
> ignores SIGTERM is reparented to init and never gets killed. Using a 
> pid namespace only masks the problem, because the hanging process is captured 
> before it can shut down gracefully.
> The fix is to send SIGTERM only to the children of {{sh}}. {{sh}} will exit 
> when all child processes finish; if they do not, they will be killed by 
> escalation to SIGKILL.
> All versions from 0.20 are affected.
> This test should pass 
> [src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
> [Mailing list 
> thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]
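
The escalation step described above can be sketched with plain POSIX calls. This is only an illustration under simplifying assumptions: `terminateWithGrace` is a hypothetical helper, it handles a single direct child rather than walking a process tree, and it polls instead of using the executor's own event loop.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sends SIGTERM to `child`, waits up to `grace` for it to exit, and
// escalates to SIGKILL if it is still running afterwards.
// Returns true if the grace period expired and SIGKILL was sent.
bool terminateWithGrace(pid_t child, std::chrono::milliseconds grace)
{
  assert(child > 0);
  kill(child, SIGTERM);

  auto deadline = std::chrono::steady_clock::now() + grace;
  while (std::chrono::steady_clock::now() < deadline) {
    int status = 0;
    if (waitpid(child, &status, WNOHANG) == child) {
      return false;  // The child exited within the grace period.
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }

  // The grace period is over: escalate and reap the child.
  kill(child, SIGKILL);
  waitpid(child, nullptr, 0);
  return true;
}
```

The important property is that the decision to escalate depends on whether the signalled process is still alive, not on whether an intermediate `sh` wrapper has already exited.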





[jira] [Commented] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"

2017-08-23 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138805#comment-16138805
 ] 

Deshi Xiao commented on MESOS-6804:
---

[~klueska] Hi, any progress on this issue?

> Running 'tty' inside a debug container that has a tty reports "Not a tty"
> -
>
> Key: MESOS-6804
> URL: https://issues.apache.org/jira/browse/MESOS-6804
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Critical
>  Labels: debugging, mesosphere
>
> We need to inject `/dev/console` into the container and map it to the slave 
> end of the TTY we are attached to.





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-08-22 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136416#comment-16136416
 ] 

Deshi Xiao commented on MESOS-6213:
---

This ticket only affects the 1.0.x branch. Please give a resolution on this 
issue; if no one cares about this version, we can close it.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
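
The compiler message itself points to `std::atomic_compare_exchange_strong()` from `<atomic>` as the replacement. The following sketch shows what such a call looks like in isolation. The wrapper name `compareAndSwap64Barrier` is hypothetical, and this is not the change that was made in protobuf or Mesos.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the deprecated call: stores `newValue` into
// `*target` only if it currently holds `oldValue`, with sequentially
// consistent ordering. Returns true if the swap happened.
bool compareAndSwap64Barrier(
    int64_t oldValue, int64_t newValue, std::atomic<int64_t>* target)
{
  assert(target != nullptr);
  return std::atomic_compare_exchange_strong(target, &oldValue, newValue);
}
```

Unlike the deprecated function, the standard call operates on a `std::atomic<int64_t>` rather than a raw `volatile int64_t*`, so callers would also need to change the type of the shared variable.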





[jira] [Commented] (MESOS-7801) Retry logic for unsuccessful `docker rm` during agent recovery

2017-08-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131670#comment-16131670
 ] 

Deshi Xiao commented on MESOS-7801:
---

Please update it as soon as possible.



[jira] [Commented] (MESOS-7337) DefaultExecutorCheckTest.CommandCheckTimeout becomes flaky under load

2017-07-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103255#comment-16103255
 ] 

Deshi Xiao commented on MESOS-7337:
---

How can I reproduce this test case?

> DefaultExecutorCheckTest.CommandCheckTimeout becomes flaky under load
> -
>
> Key: MESOS-7337
> URL: https://issues.apache.org/jira/browse/MESOS-7337
> Project: Mesos
>  Issue Type: Bug
>  Components: flaky, test
> Environment: Mac OS 10.12.4 (16E195), SSL debug build w/o 
> optimizations, clang version 5.0.0 (http://llvm.org/git/clang 
> c511a96ffe744933459ef64bf963629538057a90) (http://llvm.org/git/llvm 
> 0cd81d8a1055f167e0f588dd1b476863b00da3d5)
>Reporter: Benjamin Bannier
>  Labels: flaky-test, mesosphere
> Attachments: DefaultExecutorCheckTest.CommandCheckTimeout.log
>
>
> The test {{DefaultExecutorCheckTest.CommandCheckTimeout}} randomly fails for 
> me when executing tests in parallel, e.g.,
> {code}
> [ RUN  ] DefaultExecutorCheckTest.CommandCheckTimeout
> ../../src/tests/check_tests.cpp:1374: Failure
> Failed to wait 15secs for updateCheckResultTimeout
> ../../src/tests/check_tests.cpp:1334: Failure
> Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_, 
> _))...
>  Expected: to be called at least 3 times
>Actual: called twice - unsatisfied and active
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckTimeout (25351 ms)
> {code}





[jira] [Commented] (MESOS-6804) Running 'tty' inside a debug container that has a tty reports "Not a tty"

2017-07-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103192#comment-16103192
 ] 

Deshi Xiao commented on MESOS-6804:
---

[~klueska] Could you please explain the reason for that?



[jira] [Commented] (MESOS-3435) Add containerizer support for hyper

2017-07-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090776#comment-16090776
 ] 

Deshi Xiao commented on MESOS-3435:
---

This issue is out of date. [~haosd...@gmail.com] Do you have any update on it, 
or should we close it?

> Add containerizer support for hyper
> ---
>
> Key: MESOS-3435
> URL: https://issues.apache.org/jira/browse/MESOS-3435
> Project: Mesos
>  Issue Type: Story
>Reporter: Deshi Xiao
>Assignee: haosdent
>
> As secure as a hypervisor, as fast and easy to use as Docker: this is Hyper. 
> https://docs.hyper.sh/Introduction/what_is_hyper_.html We could implement 
> this as a module once MESOS-3709 is finished.





[jira] [Commented] (MESOS-4812) Mesos fails to escape command health checks

2017-07-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090744#comment-16090744
 ] 

Deshi Xiao commented on MESOS-4812:
---

Any update?

> Mesos fails to escape command health checks
> ---
>
> Key: MESOS-4812
> URL: https://issues.apache.org/jira/browse/MESOS-4812
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>Assignee: haosdent
>  Labels: health-check, mesosphere, tech-debt
> Attachments: health_task.gif
>
>
> As described in https://github.com/mesosphere/marathon/issues/
> I would like to run a command health check
> {noformat}
> /bin/bash -c " {noformat}
> The health check fails because Mesos runs the command inside the double 
> quotes of a sh -c "" without escaping the double quotes in the command itself.
> If I escape the double quotes myself the command health check succeeds. But 
> this would mean that users need intimate knowledge of how Mesos executes 
> their commands, which can't be right.
> I was told this is not a Marathon but a Mesos issue so am opening this JIRA. 
> I don't know if this only affects the command health check.
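
One common way to avoid this class of problem is to quote the whole command before handing it to `sh -c`. The sketch below uses POSIX single-quote escaping. `shellQuote` is a hypothetical helper written for this example; it does not describe how Mesos builds or fixed its health check command.

```cpp
#include <cassert>
#include <string>

// Wraps `arg` in single quotes so that a POSIX shell treats it as one
// literal word. Each embedded single quote is replaced with '\'' (close
// the quote, emit an escaped quote, reopen the quote).
std::string shellQuote(const std::string& arg)
{
  // NUL bytes cannot be passed through a command line.
  assert(arg.find('\0') == std::string::npos);

  std::string quoted = "'";
  for (char c : arg) {
    if (c == '\'') {
      quoted += "'\\''";
    } else {
      quoted += c;
    }
  }
  quoted += "'";
  return quoted;
}
```

Inside single quotes the shell leaves double quotes untouched, so a command containing `"` would survive one level of `sh -c` wrapping without the user escaping anything by hand.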





[jira] [Commented] (MESOS-7067) Add the `OnTerminationPolicy` to the TaskInfo protobuf.

2017-07-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090727#comment-16090727
 ] 

Deshi Xiao commented on MESOS-7067:
---

Has this change been discarded? Is there any update on it?

> Add the `OnTerminationPolicy` to the TaskInfo protobuf.
> ---
>
> Key: MESOS-7067
> URL: https://issues.apache.org/jira/browse/MESOS-7067
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> As outlined in the [design doc | 
> https://docs.google.com/document/d/1VxfoZ-DzMHnKY0gzoccHEhx1rvdC2-RATJfJUfiAwGY/edit?usp=sharing]
>  , we need to introduce the {{OnTerminationPolicy}} to the {{TaskInfo}} 
> protobuf, allowing every task to specify what an executor should do upon task 
> termination. 
> Note that this issue won't introduce the {{RestartPolicy}} message; that 
> will be added via a separate issue.





[jira] [Commented] (MESOS-7305) Adjust the recover logic of MesosContainerizer to allow standalone containers.

2017-07-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090724#comment-16090724
 ] 

Deshi Xiao commented on MESOS-7305:
---

Where should I start on this issue?

> Adjust the recover logic of MesosContainerizer to allow standalone containers.
> --
>
> Key: MESOS-7305
> URL: https://issues.apache.org/jira/browse/MESOS-7305
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Joseph Wu
>  Labels: mesosphere, storage
>
> The current recovery logic in MesosContainerizer assumes that all top level 
> containers are tied to some Mesos executors. Adding standalone containers 
> will invalidate this assumption. The recovery logic must be changed to adapt to 
> that.





[jira] [Commented] (MESOS-6933) Executor does not respect grace period

2017-07-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089848#comment-16089848
 ] 

Deshi Xiao commented on MESOS-6933:
---

[~janisz] Can you write a test to cover it? I have no clue where in the code 
to start the fix.



[jira] [Commented] (MESOS-6933) Executor does not respect grace period

2017-07-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089452#comment-16089452
 ] 

Deshi Xiao commented on MESOS-6933:
---

[~janisz] Yes, I built on the upstream Mesos code base; it is 1.4.



[jira] [Updated] (MESOS-6933) Executor does not respect grace period

2017-07-17 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-6933:
--
Attachment: 屏幕快照 2017-07-17 下午2.19.03.png

Please check the screenshot. [~janisz]



[jira] [Commented] (MESOS-6933) Executor does not respect grace period

2017-07-15 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088603#comment-16088603
 ] 

Deshi Xiao commented on MESOS-6933:
---

[~janisz] I have reproduced the steps, but I am not sure how to check whether 
"the shell has exited but script is still running and producing output" 
actually happened. Could you please explain, with reference to the Mesos log, 
what I should look for? That would be the best way to help me understand. 
Sorry for the request.

> Executor does not respect grace period
> --
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Tomasz Janiszewski
>
> The Mesos command executor tries to support a grace period with escalation, but 
> unfortunately it does not work. It launches {{command}} by wrapping it in 
> {{sh -c}}, which causes the process tree to look like this:
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [ 
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
> ./bin/offer-i18n -e prod -p $PORT0 
>  \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This causes {{sh}}, and with it the executor, to exit immediately, while the 
> wrapped {{command}} might need more time to finish. As a result, the executor 
> thinks the command terminated gracefully, so it won't 
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
>  to SIGKILL.
> This causes leaks when the POSIX containerizer is used: if the command ignores 
> SIGTERM, it is attached to init and never gets killed. Using 
> pid/namespace only masks the problem, because the hanging process is captured 
> before it can shut down gracefully.
> The fix is to send SIGTERM only to the children of {{sh}}. {{sh}} will exit 
> when all its child processes finish. If they do not finish, they will be killed 
> by the escalation to SIGKILL.
> All versions from 0.20 are affected.
> This test should pass 
> [src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
> [Mailing list 
> thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]
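
The proposed fix can be read as a choice of signal targets. The following C++ is a minimal sketch of that reading and not the executor's actual code: `ProcessTree`, `sigtermTargets` and `sigkillTargets` are hypothetical names, the process tree is modelled as a plain map instead of being read from the operating system, and "children" is assumed to mean the direct children of the `sh -c` wrapper.

```cpp
#include <map>
#include <vector>

// Hypothetical model of a process tree: parent pid -> direct child pids.
using ProcessTree = std::map<int, std::vector<int>>;

// Pids that receive SIGTERM first: the direct children of the `sh -c`
// wrapper, but not the wrapper itself, so `sh` stays alive until they exit.
std::vector<int> sigtermTargets(const ProcessTree& tree, int shellPid) {
  std::vector<int> targets;
  ProcessTree::const_iterator it = tree.find(shellPid);
  if (it != tree.end()) {
    targets = it->second;
  }
  return targets;
}

// Pids that receive SIGKILL if the grace period expires: the wrapper and
// every descendant, collected with an explicit stack.
std::vector<int> sigkillTargets(const ProcessTree& tree, int shellPid) {
  std::vector<int> targets;
  std::vector<int> pending(1, shellPid);
  while (!pending.empty()) {
    const int pid = pending.back();
    pending.pop_back();
    targets.push_back(pid);
    ProcessTree::const_iterator it = tree.find(pid);
    if (it != tree.end()) {
      pending.insert(pending.end(), it->second.begin(), it->second.end());
    }
  }
  return targets;
}
```

For the tree in the log above (pid 18 wrapping pid 19), SIGTERM would go only to pid 19, and an escalation after the grace period would cover pids 18 and 19.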





[jira] [Commented] (MESOS-7302) Support launching standalone containers.

2017-07-13 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085552#comment-16085552
 ] 

Deshi Xiao commented on MESOS-7302:
---

[~kaysoky]  any update?

> Support launching standalone containers.
> 
>
> Key: MESOS-7302
> URL: https://issues.apache.org/jira/browse/MESOS-7302
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> Containerizer should support launching containers (both top level and nested) 
> that are not tied to a particular Mesos task or executor. This is for the 
> case where the agent wants to launch some system containers (e.g., for CSI 
> plugin) that will be managed by Mesos.





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-07-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076763#comment-16076763
 ] 

Deshi Xiao commented on MESOS-6213:
---

[~drcrallen] please update it, thanks a lot.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}
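
The compiler message names `std::atomic_compare_exchange_strong()` as the replacement for the deprecated call. A minimal sketch of a 64-bit compare-and-swap built on it is shown below. `compareAndSwap64` is a hypothetical helper for illustration only; it is not the change that was made to the bundled protobuf.

```cpp
#include <atomic>
#include <cstdint>

// Stores `newValue` into `*value` only if `*value` still equals `oldValue`.
// Returns true if the swap happened. The default memory order of
// std::atomic_compare_exchange_strong is sequentially consistent.
bool compareAndSwap64(std::atomic<std::int64_t>* value,
                      std::int64_t oldValue,
                      std::int64_t newValue) {
  // On failure the call overwrites `oldValue` with the current value;
  // `oldValue` is a local copy here, so that is harmless.
  return std::atomic_compare_exchange_strong(value, &oldValue, newValue);
}
```

A caller would hold the counter in a `std::atomic<std::int64_t>` and pass its address, instead of passing a plain `volatile int64_t*` as with the OS X primitive.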





[jira] [Commented] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2017-07-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076757#comment-16076757
 ] 

Deshi Xiao commented on MESOS-6200:
---

See the comments above: the containerizer already supports the posix/rlimit 
feature, so this issue can be reviewed and closed.

> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
> | soft limit| --cpu-shares | --memory-reservation|
> Currently the task protobuf message has only one resource struct, which is used 
> to describe the cgroup limit, and the Docker executor handles it as 
> follows; only --memory and --cpu-shares are set:
> {code}
>   if (resources.isSome()) {
>     // TODO(yifan): Support other resources (e.g. disk).
>     Option<double> cpus = resources.get().cpus();
>     if (cpus.isSome()) {
>       uint64_t cpuShare =
>         std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
>       argv.push_back("--cpu-shares");
>       argv.push_back(stringify(cpuShare));
>     }
>     Option<Bytes> mem = resources.get().mem();
>     if (mem.isSome()) {
>       Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>       argv.push_back("--memory");
>       argv.push_back(stringify(memLimit.bytes()));
>     }
>   }
> {code}
> I hope that the executor and the protobuf message could split the resources 
> into two parts: soft and hard. Then the user could set two levels of resource 
> limits for a Docker container.
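
The request could be sketched as follows, using the Docker flags from the table above. This is only an illustration under stated assumptions, not Mesos code: the `Limits` struct and `dockerLimitArgs` are hypothetical, the constant values (1024 shares per CPU, a minimum of 2 shares, a 100 ms CFS period) are assumptions made for the example, and `--memory-swap` is left out for brevity.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative constants; the concrete values are assumptions.
const std::uint64_t CPU_SHARES_PER_CPU = 1024;
const std::uint64_t MIN_CPU_SHARES = 2;
const std::uint64_t CFS_PERIOD_US = 100000;  // 100 ms, in microseconds.

// Hypothetical two-level request: soft values are guarantees under
// contention, hard values are caps.
struct Limits {
  double softCpus;
  double hardCpus;
  std::uint64_t softMemBytes;
  std::uint64_t hardMemBytes;
};

// Builds the `docker run` arguments for both levels.
std::vector<std::string> dockerLimitArgs(const Limits& limits) {
  std::vector<std::string> argv;

  // Soft CPU limit: a relative weight that matters only under contention.
  const std::uint64_t shares = std::max(
      static_cast<std::uint64_t>(CPU_SHARES_PER_CPU * limits.softCpus),
      MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(std::to_string(shares));

  // Hard CPU limit: CPU time allowed per CFS period.
  const std::uint64_t quota =
      static_cast<std::uint64_t>(CFS_PERIOD_US * limits.hardCpus);
  argv.push_back("--cpu-period");
  argv.push_back(std::to_string(CFS_PERIOD_US));
  argv.push_back("--cpu-quota");
  argv.push_back(std::to_string(quota));

  // Soft memory limit, then hard memory limit (both in bytes).
  argv.push_back("--memory-reservation");
  argv.push_back(std::to_string(limits.softMemBytes));
  argv.push_back("--memory");
  argv.push_back(std::to_string(limits.hardMemBytes));

  return argv;
}
```

With a soft request of 0.5 CPUs and a hard cap of 2 CPUs, this sketch yields 512 CPU shares and a quota of 200000 microseconds per 100000 microsecond period. The soft memory value should stay below the hard one for the reservation to have any effect.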





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-06-21 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058553#comment-16058553
 ] 

Deshi Xiao commented on MESOS-6213:
---

[~drcrallen] Hi, this bug seems to be fixed. Please check again. Thanks a lot.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-06-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052782#comment-16052782
 ] 

Deshi Xiao commented on MESOS-6213:
---

Got it. I have re-downloaded the Mesos source code and built it again. It works 
like a charm, with no errors in the end. Thanks a lot.

I no longer see the above error with the current code base, so the issue can be closed.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-06-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052760#comment-16052760
 ] 

Deshi Xiao commented on MESOS-6213:
---

How do I disable the libprocess firewall?

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-06-16 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052656#comment-16052656
 ] 

Deshi Xiao commented on MESOS-6213:
---

The libprocess firewall is not working on macOS Sierra.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-06-16 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051879#comment-16051879
 ] 

Deshi Xiao commented on MESOS-6213:
---

libtool: error: 'libprocess_la-firewall.lo' is not a valid libtool object. 
I have no clues about this error.

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}





[jira] [Commented] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2017-06-14 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048767#comment-16048767
 ] 

Deshi Xiao commented on MESOS-6200:
---

[~haosd...@gmail.com] Any update on these comments?

> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
> | soft limit| --cpu-shares | --memory-reservation|
> Currently the task protobuf message has only one resource struct, which is used 
> to describe the cgroup limit, and the Docker executor handles it as 
> follows; only --memory and --cpu-shares are set:
> {code}
>   if (resources.isSome()) {
>     // TODO(yifan): Support other resources (e.g. disk).
>     Option<double> cpus = resources.get().cpus();
>     if (cpus.isSome()) {
>       uint64_t cpuShare =
>         std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
>       argv.push_back("--cpu-shares");
>       argv.push_back(stringify(cpuShare));
>     }
>     Option<Bytes> mem = resources.get().mem();
>     if (mem.isSome()) {
>       Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>       argv.push_back("--memory");
>       argv.push_back(stringify(memLimit.bytes()));
>     }
>   }
> {code}
> I hope that the executor and the protobuf message could split the resources 
> into two parts: soft and hard. Then the user could set two levels of resource 
> limits for a Docker container.





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-06-14 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048741#comment-16048741
 ] 

Deshi Xiao commented on MESOS-6223:
---

The patch needs to be refactored. [~megha.sharma]

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> An agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, the agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are anyway terminated on the agent when it reboots so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master and reconcile 
> the lost executors. This is a pre-requisite for supporting 
> persistent/restartable tasks in mesos (MESOS-3545).





[jira] [Commented] (MESOS-5962) Support multiple health checks per task.

2017-06-13 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048726#comment-16048726
 ] 

Deshi Xiao commented on MESOS-5962:
---

[~alexr] I have reviewed the docs. Could you please explain the reason for 
holding off on the proposal? I don't understand it at all.

> Support multiple health checks per task.
> 
>
> Key: MESOS-5962
> URL: https://issues.apache.org/jira/browse/MESOS-5962
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>  Labels: check, health-check, mesosphere
>
> Currently, only a single check and a single health check per task are 
> supported. Consider supporting multiple checks and/or health checks. There 
> are various approaches to treating them:
> * do health aggregation in Mesos or delegate it to frameworks,
> * have a single or multiple restart policies (one per health check),
> * introduce health check ids or not.
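
To make the first and last options concrete, here is a minimal sketch of what health aggregation with check ids could look like. Everything in it is an assumption for illustration: `CheckResult`, `allHealthy` and `failedCheckIds` are hypothetical names, and the "healthy only if every check passes" policy is just one of the possible approaches, not a decided design.

```cpp
#include <string>
#include <vector>

// Hypothetical result of one health check. The `id` field corresponds to the
// "introduce health check ids" option.
struct CheckResult {
  std::string id;
  bool healthy;
};

// One possible aggregation policy: the task is healthy only if every check
// passed. An empty list counts as healthy.
bool allHealthy(const std::vector<CheckResult>& results) {
  for (const CheckResult& result : results) {
    if (!result.healthy) {
      return false;
    }
  }
  return true;
}

// Ids of the failing checks, which a framework could use if the decision is
// delegated to it instead of being made in Mesos.
std::vector<std::string> failedCheckIds(
    const std::vector<CheckResult>& results) {
  std::vector<std::string> ids;
  for (const CheckResult& result : results) {
    if (!result.healthy) {
      ids.push_back(result.id);
    }
  }
  return ids;
}
```

Returning the failing ids next to the aggregate keeps both options open: Mesos can report a single verdict, while a framework that wants its own policy still sees which checks failed.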





[jira] [Commented] (MESOS-6213) Build failure on macOS Sierra: Protobuf atomics deprecated.

2017-06-10 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045447#comment-16045447
 ] 

Deshi Xiao commented on MESOS-6213:
---

/bin/sh ../../libtool  --tag=CXX   --mode=link g++ -Wall -Wsign-compare 
-Wformat-security -fstack-protector-strong -fPIC -fPIE -g -O0 
-Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -static -fpic 
-L/usr/local/opt/subversion/lib -L/usr/local/opt/openssl/lib 
-L/usr/local/opt/libevent/lib -L/usr/local/opt/apr/libexec/lib  -o 
libprocess.la  libprocess_la-authenticator_manager.lo 
libprocess_la-authenticator.lo libprocess_la-clock.lo libprocess_la-firewall.lo 
libprocess_la-help.lo libprocess_la-http.lo libprocess_la-io.lo 
libprocess_la-latch.lo libprocess_la-logging.lo libprocess_la-metrics.lo 
libprocess_la-mime.lo libprocess_la-pid.lo libprocess_la-poll_socket.lo 
libprocess_la-profiler.lo libprocess_la-process.lo libprocess_la-reap.lo 
libprocess_la-socket.lo libprocess_la-subprocess.lo 
libprocess_la-subprocess_posix.lo libprocess_la-time.lo 
libprocess_la-timeseries.lo   libprocess_la-libev.lo 
libprocess_la-libev_poll.lo ../glog-0.3.3/libglog.la  ../libry_http_parser.la 
../libev-4.22/libev.la -lz -lsvn_delta-1 -lsvn_subr-1 -lsasl2 -lcurl -lapr-1 
-lz 
libtool:   error: 'libprocess_la-firewall.lo' is not a valid libtool object
make[5]: *** [libprocess.la] Error 1
make[4]: *** [all-recursive] Error 1
make[3]: *** [all] Error 2
make[2]: *** [all-recursive] Error 1
make[1]: *** [all] Error 2
make: *** [all-recursive] Error 1

> Build failure on macOS Sierra: Protobuf atomics deprecated.
> ---
>
> Key: MESOS-6213
> URL: https://issues.apache.org/jira/browse/MESOS-6213
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Charles Allen
>
> Building on OSX is giving the following error.
> {code}
> In file included from 
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops.h:184:
> ../3rdparty/protobuf-2.6.1/src/google/protobuf/stubs/atomicops_internals_macosx.h:173:9:
>  error: 'OSAtomicCompareAndSwap64Barrier' is deprecated: first
>   deprecated in macOS 10.12 - Use std::atomic_compare_exchange_strong() 
> from <atomic> instead [-Werror,-Wdeprecated-declarations]
> if (OSAtomicCompareAndSwap64Barrier(
> ^
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/libkern/OSAtomicDeprecated.h:645:9:
>  note:
>   'OSAtomicCompareAndSwap64Barrier' has been explicitly marked deprecated 
> here
> bool OSAtomicCompareAndSwap64Barrier( int64_t __oldValue, int64_t 
> __newValue,
> ^
> {code}
> Protobuf is not listed as a component so I just set it as {{build}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-5962) Support multiple health checks per task.

2017-06-07 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041861#comment-16041861
 ] 

Deshi Xiao commented on MESOS-5962:
---

Interesting. This needs a design doc.

> Support multiple health checks per task.
> 
>
> Key: MESOS-5962
> URL: https://issues.apache.org/jira/browse/MESOS-5962
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>  Labels: check, health-check, mesosphere
>
> Currently, only a single check and a single health check per task are 
> supported. Consider supporting multiple checks and/or health checks. There 
> are various approaches to treating them:
> * do health aggregation in Mesos or delegate it to frameworks,
> * have a single or multiple restart policies (one per health check),
> * introduce health check ids or not.





[jira] [Commented] (MESOS-7635) portmappings bug of docker container

2017-06-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040010#comment-16040010
 ] 

Deshi Xiao commented on MESOS-7635:
---

This seems to be an issue in Marathon's scope. Are you sure this is a Mesos bug?

> portmappings bug of docker container
> 
>
> Key: MESOS-7635
> URL: https://issues.apache.org/jira/browse/MESOS-7635
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 1.2.0
>Reporter: guanyu
>
> To prove this, I ran two tests.
> Case 1:
> This is part of the Marathon configuration:
> "cmd": "python -m SimpleHTTPServer $PORT0",
> "container": {
>   "type": "DOCKER",
>   "volumes": [],
>   "docker": {
>     "image": "qihoo.cloud/cloud/x4python:1.0.0",
>     "network": "BRIDGE",
>     "portMappings": [
>       {
>         "containerPort": 2000,
>         "hostPort": 0,
>         "servicePort": 2000,
>         "protocol": "tcp",
>         "labels": {}
>       }
>     ],
>     "privileged": false,
>     "parameters": [],
>     "forcePullImage": false
>   }
> },
> "labels": {
>   "HAPROXY_GROUP": "test"
> },
> After a successful launch, I tried to access HAProxy port 2000, but no 
> result was returned. Then I checked the runtime status. The relevant 
> information follows:
> Marathon Endpoints: mesos20.xxx.xxx.xxx.xxx:63132
> \# docker ps
> CONTAINER ID  IMAGE                 COMMAND                 CREATED        STATUS        PORTS                    NAMES
> c85ff0427da4  cloud/x4python:1.0.0  "/bin/sh -c 'pytho..."  7 minutes ago  Up 7 minutes  0.0.0.0:63132->2000/tcp  mesos-fc86fdb5-7a6f-4ba4-96d3-d082fcfc1236-S21.4c242985-3967-44ed-8c5d-4844ec0733c0
> \# docker exec -it c85ff0427da4 bash
> [root@c85ff0427da4 ~]# netstat -atunp
> Active Internet connections (servers and established)
> Proto  Recv-Q  Send-Q  Local Address  Foreign Address  State   PID/Program name
> tcp    0       0       0.0.0.0:63132  0.0.0.0:*        LISTEN  1/python
> Case 2:
> I updated the app configuration, setting containerPort to 0, and 
> then repeated the above steps:
> "network": "BRIDGE",
> "portMappings": [
>   {
>     "containerPort": 0,
>     "hostPort": 0,
>     "servicePort": 2000,
>     "protocol": "tcp",
>     "labels": {}
>   }
> ]
> \# docker ps
> CONTAINER ID  IMAGE                 COMMAND                 CREATED             STATUS             PORTS                     NAMES
> b43e84644a8a  cloud/x4python:1.0.0  "/bin/sh -c 'pytho..."  About a minute ago  Up About a minute  0.0.0.0:18797->18797/tcp  mesos-fc86fdb5-7a6f-4ba4-96d3-d082fcfc1236-S21.1a0bfb53-1fb8-41db-98c9-6d4714587085
> Then I accessed the service through HAProxy port 2000 and it worked, which is 
> what I expected. So I think the first case triggers a bug. 
> Looking forward to a reply, thanks.





[jira] [Commented] (MESOS-7418) Add support for file-based secrets

2017-05-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999664#comment-15999664
 ] 

Deshi Xiao commented on MESOS-7418:
---

@Kapil Arya any update on it?


> Add support for file-based secrets
> --
>
> Key: MESOS-7418
> URL: https://issues.apache.org/jira/browse/MESOS-7418
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, security
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> The goal is to allow users to populate files within a task's environment with 
> contents fetched from a backend secret store. A new secret fetching module 
> interface is proposed to allow interaction with arbitrary third-party secret 
> stores.
> Further details are covered in the attached design doc





[jira] [Comment Edited] (MESOS-7418) Add support for file-based secrets

2017-05-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999664#comment-15999664
 ] 

Deshi Xiao edited comment on MESOS-7418 at 5/7/17 3:26 AM:
---

[~karya] any update on it?



was (Author: xds2000):
@Kapil Arya any update on it?


> Add support for file-based secrets
> --
>
> Key: MESOS-7418
> URL: https://issues.apache.org/jira/browse/MESOS-7418
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, security
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> The goal is to allow users to populate files within a task's environment with 
> contents fetched from a backend secret store. A new secret fetching module 
> interface is proposed to allow interaction with arbitrary third-party secret 
> stores.
> Further details are covered in the attached design doc





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-05-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999209#comment-15999209
 ] 

Deshi Xiao commented on MESOS-6223:
---

[~megha.sharma] Could you please refresh the patch based on Vinod Kone's suggestions?

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> An agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, the agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are anyway terminated on the agent when it reboots so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master and reconcile 
> the lost executors. This is a pre-requisite for supporting 
> persistent/restartable tasks in mesos (MESOS-3545).





[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-05-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999204#comment-15999204
 ] 

Deshi Xiao commented on MESOS-6184:
---

The patch should be rebased and re-submitted. [~haosd...@gmail.com]

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>  Labels: check, health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, the health check uses a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> Once the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.
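
The idea of moving `setns` into hooks can be sketched without any Mesos code. The following is a plain C++ simulation under stated assumptions: `ChildHook` and `runInChild` are hypothetical stand-ins, not the libprocess API, nothing is forked, and in a real implementation each hook would call `setns` for one namespace inside the child process.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a child hook: returns 0 on success and a
// non-zero status on failure.
using ChildHook = std::function<int()>;

// Runs the hooks in order and then the payload. If a hook fails, the payload
// is never run and the hook's status is returned instead.
int runInChild(const std::vector<ChildHook>& hooks,
               const std::function<int()>& func) {
  for (const ChildHook& hook : hooks) {
    const int status = hook();
    if (status != 0) {
      return status;
    }
  }
  return func();
}
```

The point of the design is that the caller only supplies a list of hooks, so the health checker no longer needs its own clone function to enter the task's namespaces.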





[jira] [Commented] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.

2017-05-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999201#comment-15999201
 ] 

Deshi Xiao commented on MESOS-3545:
---

[~xujyan] Do we have any progress?

> Investigate restoring tasks/executors after machine reboot.
> ---
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
>  Issue Type: Epic
>  Components: agent
>Reporter: Benjamin Hindman
>Assignee: Megha Sharma
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to 
> force an agent to restart these tasks/executors _before_ after a machine 
> reboot in the event that the machine is network partitioned away from the 
> master (or the master has failed) but we'd like to get these services running 
> again. Assuming the agent(s) running on the machine has not been disconnected 
> from the master for longer than the master's agent re-registration timeout 
> the agent should be able to re-register (i.e., after a network partition is 
> resolved) without a problem. However, in the same way that a framework would 
> be interested in knowing that its tasks/executors were restarted, we'd want 
> to send something like a TASK_RESTARTED status update.





[jira] [Commented] (MESOS-6933) Executor does not respect grace period

2017-05-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999199#comment-15999199
 ] 

Deshi Xiao commented on MESOS-6933:
---

How can this bug be reproduced? I would like to understand where a patch could be made.

> Executor does not respect grace period
> --
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Tomasz Janiszewski
>
> The Mesos command executor tries to support a grace period with escalation, but 
> unfortunately it does not work. It launches {{command}} by wrapping it in 
> {{sh -c}}, which causes the process tree to look like this:
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [ 
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
> ./bin/offer-i18n -e prod -p $PORT0 
>  \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This causes {{sh}} to exit immediately, and the executor with it, while the wrapped 
> {{command}} might need some more time to finish. As a result, the executor thinks the 
> command exited gracefully, so it won't 
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
>  to SIGKILL.
> This causes leaks when the POSIX containerizer is used, because if the command ignores 
> SIGTERM it will be reparented to init and never get killed. Using 
> pid/namespace only masks the problem because the hanging process is captured 
> before it can shut down gracefully.
> The fix for this is to send SIGTERM only to the children of {{sh}}. {{sh}} will exit 
> when all child processes finish. If they do not, they will be killed by escalation 
> to SIGKILL.
> All versions from 0.20 are affected.
> This test should pass 
> [src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
> [Mailing list 
> thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]
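
The behaviour proposed in the description above (signal, wait for the whole process group rather than only the sh wrapper, then escalate) can be sketched in Python. This is a minimal illustration under stated assumptions, not the command executor's actual code: the names `group_alive` and `terminate_group`, the return values, and the polling interval are all hypothetical, and the child is assumed to have been started in its own session so that its pid doubles as the process group id.

```python
import os
import signal
import subprocess
import time


def group_alive(pgid):
    """Return True while at least one process remains in the process group."""
    try:
        os.killpg(pgid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the group exists but belongs to another user
    return True


def terminate_group(proc, grace_period, poll_interval=0.05):
    """SIGTERM the whole group, then SIGKILL it if it outlives the grace period.

    `proc` must be a subprocess.Popen started with start_new_session=True,
    so that proc.pid is also the id of its process group (an assumption of
    this sketch).
    """
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return "terminated"  # nothing left to signal
    deadline = time.monotonic() + grace_period
    while time.monotonic() < deadline:
        proc.poll()  # reap the direct child so it does not linger as a zombie
        if not group_alive(pgid):
            return "terminated"
        time.sleep(poll_interval)
    try:
        os.killpg(pgid, signal.SIGKILL)  # escalation after the grace period
    except ProcessLookupError:
        return "terminated"  # the group exited just after the last check
    proc.wait()
    return "killed"
```

The point of the design is that the exit of the wrapper alone is never treated as success: the loop keeps checking the group, so a wrapped command that ignores SIGTERM is still escalated to SIGKILL.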





[jira] [Commented] (MESOS-7068) Add OnTerminationPolicy handling to the default executor.

2017-04-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988947#comment-15988947
 ] 

Deshi Xiao commented on MESOS-7068:
---

Why is this ticket considered outdated? Is it really out of date? Please give some 
feedback. 
[~anandmazumdar]


> Add OnTerminationPolicy handling to the default executor.
> -
>
> Key: MESOS-7068
> URL: https://issues.apache.org/jira/browse/MESOS-7068
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We should support handling the {{OnTerminationPolicy}} specified in {{TaskInfo}} 
> in the default executor. Currently, the default policy for the default 
> executor is to kill the entire task group when a task in the task group 
> fails. This would allow framework developers to specify a custom policy, e.g., 
> keeping the executor alive when a backup task in the task group fails.





[jira] [Commented] (MESOS-5544) Support running Mesos agent in a Docker container.

2017-04-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988902#comment-15988902
 ] 

Deshi Xiao commented on MESOS-5544:
---

I think this feature has already been implemented.

> Support running Mesos agent in a Docker container.
> --
>
> Key: MESOS-5544
> URL: https://issues.apache.org/jira/browse/MESOS-5544
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Currently, this does not work if one tries to use the Mesos containerizer.
> The main problem is that we want to make sure the executor is not killed when 
> the agent crashes. So we have to use --pid=host so that the agent is in the host 
> pid namespace.
> But that is not sufficient: the Docker daemon will put the agent into all cgroups 
> available on the host. We need to make sure we migrate the executor pid out 
> of those cgroups so that when the agent crashes, executors are not killed.
> Also, when starting the agent container, volumes need to be set up properly so 
> that any mounts under the agent's work_dir will be propagated back to the host 
> mount table. This is to make sure we can recover those mounts after the agent 
> restarts. This is also true for those mounts that are needed by some isolators 
> (e.g., network/cni isolator).
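
As a rough illustration of the two launch-time requirements named above (host pid namespace and mount propagation for work_dir), the following Python sketch builds a `docker run` command line. It is only a sketch under assumptions: the function name, the image name used in the example, and the default work_dir are hypothetical, the image's entrypoint is assumed to be the agent binary, and the cgroup migration described above is not covered at all.

```python
def agent_docker_argv(image, work_dir="/var/lib/mesos"):
    """Build a `docker run` command line for a containerized agent (illustrative only)."""
    return [
        "docker", "run", "-d",
        # Share the host pid namespace, as the description requires.
        "--pid=host",
        # rshared propagation lets mounts made under work_dir inside the
        # container show up in the host mount table.
        "-v", f"{work_dir}:{work_dir}:rshared",
        image,
        # Passed through to the agent (assumes the image entrypoint is the agent).
        f"--work_dir={work_dir}",
    ]


if __name__ == "__main__":
    # "example/mesos-agent:latest" is a placeholder image name.
    print(" ".join(agent_docker_argv("example/mesos-agent:latest")))
```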





[jira] [Commented] (MESOS-5573) Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent.

2017-04-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988651#comment-15988651
 ] 

Deshi Xiao commented on MESOS-5573:
---

Any update on this?

> Executor Driver does not invoke the `disconnected` callback upon 
> disconnection with the agent.
> --
>
> Key: MESOS-5573
> URL: https://issues.apache.org/jira/browse/MESOS-5573
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: deshna jain
>  Labels: mesosphere, newbie
>
> The executor driver must invoke the {{disconnected}} callback upon 
> disconnecting from the agent, i.e., if the agent process restarts, as per the 
> documentation:
> https://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html#disconnected(org.apache.mesos.ExecutorDriver)
> This does not seem to be done currently. Also, this 
> callback should only be invoked for frameworks with checkpointing enabled, as 
> for non-checkpointed frameworks the executor is shut down upon a disconnection.
> There might already be a JIRA for this, but I was not able to spot any.





[jira] [Comment Edited] (MESOS-5573) Executor Driver does not invoke the `disconnected` callback upon disconnection with the agent.

2017-04-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988651#comment-15988651
 ] 

Deshi Xiao edited comment on MESOS-5573 at 4/28/17 11:06 AM:
-

any update on it?

@deshna jain


was (Author: xds2000):
any update on it?



[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer

2017-04-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988529#comment-15988529
 ] 

Deshi Xiao commented on MESOS-2717:
---

[~a10gupta] Yes, is there any update on this?

> Qemu/KVM containerizer
> --
>
> Key: MESOS-2717
> URL: https://issues.apache.org/jira/browse/MESOS-2717
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Pierre-Yves Ritschard
>Assignee: Abhishek Dasgupta
>
> I think it would make sense for Mesos to have the ability to treat 
> hypervisors as containerizers and the most sensible one to start with would 
> probably be Qemu/KVM.
> There are a few workloads that can require full-fledged VMs (the most obvious 
> one being Windows workloads).
> The containerization code is well decoupled and seems simple enough; I can 
> definitely take a shot at it. VMs do bring some questions with them. Here is 
> my take on them:
> 1. Routing, network strategy
> ==
> The simplest approach here might very well be to go for bridged networks
> and leave the setup and inter slave routing up to the administrator
> 2. IP Address assignment
> 
> At first, it can be up to the Frameworks to deal with IP assignment.
> The simplest way to address this could be to have an executor running
> on slaves providing the qemu/kvm containerizer which would instrument a DHCP 
> server and collect IP + Mac address resources from slaves. While it may be up 
> to the frameworks to provide this, an example should most likely be provided.
> 3. VM Templates
> ==
> VM templates should probably leverage the fetcher and could thus be copied 
> locally or fetched from HTTP(s) / HDFS.
> 4. Resource limiting
> 
> Mapping resource constraints to the qemu command line is probably the easiest 
> part. Additional command-line arguments should also be fetchable. For Unix VMs, the 
> sandbox could show the output of the serial console.
> 5. Libvirt / plain Qemu
> =
> I tend to favor limiting the amount of necessary hoops to jump through and 
> would thus investigate working directly with Qemu, maintaining an open 
> connection to the monitor to assert status.





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-04-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988379#comment-15988379
 ] 

Deshi Xiao commented on MESOS-6223:
---

Thanks Neil.

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> The agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are terminated on the agent when it reboots anyway, so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master and reconcile 
> the lost executors. This is a prerequisite for supporting 
> persistent/restartable tasks in Mesos (MESOS-3545).





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-04-26 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984224#comment-15984224
 ] 

Deshi Xiao commented on MESOS-6223:
---

[~neilc] Is there any progress on this?



[jira] [Commented] (MESOS-5163) LKVM Containerization

2017-04-26 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984223#comment-15984223
 ] 

Deshi Xiao commented on MESOS-5163:
---

Is anyone interested in this topic? I would like to discuss how to do it.

> LKVM Containerization
> -
>
> Key: MESOS-5163
> URL: https://issues.apache.org/jira/browse/MESOS-5163
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Vaibhav Khanduja
>  Labels: container, containerizer
>
> LKVM is a lightweight kernel-based hypervisor. The hypervisor is eventually 
> intended to land inside the kernel code, so it may be a good step to consider 
> supporting it as one of the container options. LKVM comes with the advantage of being 
> a lightweight container with its own kernel footprint. Having a separate 
> kernel footprint goes a long way toward solving the security issues of 
> containers.





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-04-17 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970939#comment-15970939
 ] 

Deshi Xiao commented on MESOS-6223:
---

[~neilc]  do you have any update on this patch: 
https://reviews.apache.org/r/56895/



[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-16 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967207#comment-15967207
 ] 

Deshi Xiao edited comment on MESOS-7210 at 4/16/17 9:49 AM:


To avoid abusing privileges, just use --cap-add SYS_ADMIN to resolve the net 
operation issue:

```
Failed to enter the net namespace of task (pid: '78851'): Operation not permitted
```


was (Author: xds2000):
To avoid abusing privileges, just use --cap-add NET_ADMIN to resolve the net 
operation issue:

```
Failed to enter the net namespace of task (pid: '78851'): Operation not permitted
```

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from a container that was started with the option "pid: host" like:
> {code}
>   net: host
>   privileged: true
>   pid: host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks like:
> {code}
> {
>   "id": "python-example-stable",
>   "cmd": "python3 -m http.server 8080",
>   "mem": 16,
>   "cpus": 0.1,
>   "instances": 2,
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "python:alpine",
>       "network": "BRIDGE",
>       "portMappings": [
>         { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>       ]
>     }
>   },
>   "env": {
>     "SERVICE_NAME": "python"
>   },
>   "healthChecks": [
>     {
>       "path": "/",
>       "portIndex": 0,
>       "protocol": "MESOS_HTTP",
>       "gracePeriodSeconds": 30,
>       "intervalSeconds": 10,
>       "timeoutSeconds": 30,
>       "maxConsecutiveFailures": 3
>     }
>   ]
> }
> {code}
> I see errors like:
> {code}
> F0306 07:41:58.844293    35 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos job 
> to run in its own PID namespace instead of using the "pid host" option that the 
> parent container was started with (so it doesn't matter whether the parent 
> container was started with "pid host" or not; it will never be able to find the PID).
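
The "Pid 13527 does not exist" failure above comes down to whether the task's PID is visible in the /proc of the process running the health check. The following Python sketch illustrates that lookup: it resolves the net namespace link of a PID and returns None when the PID is not visible or not accessible. The function name and the `proc_root` parameter are hypothetical and exist only to keep the example self-contained; this is not the health checker's code.

```python
import os


def net_namespace_of(pid, proc_root="/proc"):
    """Return the net namespace link of `pid`, or None if it cannot be read.

    On Linux the link looks like "net:[4026531992]". A PID that lives in a
    different PID namespace has no entry under this /proc, so the result is
    None, which mirrors the "Pid ... does not exist" error in the log above.
    """
    link = os.path.join(proc_root, str(pid), "ns", "net")
    try:
        return os.readlink(link)
    except OSError:  # missing entry, permission problem, or no /proc at all
        return None


if __name__ == "__main__":
    print(net_namespace_of(os.getpid()))
```

Actually entering the namespace is a separate step that needs additional privileges, which is what the --cap-add discussion in the comment is about.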





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-13 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967207#comment-15967207
 ] 

Deshi Xiao commented on MESOS-7210:
---

To avoid abusing privileges, just use --cap-add NET_ADMIN to resolve the net 
operation issue:

```
Failed to enter the net namespace of task (pid: '78851'): Operation not permitted
```



[jira] [Commented] (MESOS-7088) Support private registry credential per container.

2017-04-12 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965531#comment-15965531
 ] 

Deshi Xiao commented on MESOS-7088:
---

Is there a design document for this? Could anyone share it?

> Support private registry credential per container.
> --
>
> Key: MESOS-7088
> URL: https://issues.apache.org/jira/browse/MESOS-7088
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
>






[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-08 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961797#comment-15961797
 ] 

Deshi Xiao commented on MESOS-7210:
---

On the second try, I have submitted a new patch to 58200. Let me test it again.



[jira] [Commented] (MESOS-7285) Implement a plugin to list container's on a given agent.

2017-04-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960027#comment-15960027
 ] 

Deshi Xiao commented on MESOS-7285:
---

I don't think this needs any work on the Mesos CLI.

https://github.com/vektorlab/mesos-cli

This alternative CLI can meet your request.

> Implement a plugin to list container's on a given agent.
> 
>
> Key: MESOS-7285
> URL: https://issues.apache.org/jira/browse/MESOS-7285
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Armand Grillet
>
> We need the CLI to support a `list` command to enumerate the containers 
> running on a given agent. To achieve this we will need to implement a 
> container plugin that implements this list method.





[jira] [Commented] (MESOS-6589) Document DockerInfo.Parameter usage in the docker containerizer document

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958131#comment-15958131
 ] 

Deshi Xiao commented on MESOS-6589:
---

Parameter* parameter = dockerInfo.add_parameters();
parameter->set_key("pid");
parameter->set_value("host");
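
For readers unfamiliar with what such a key/value pair turns into, here is a small Python sketch. It assumes that the Docker containerizer forwards each parameter to `docker run` as `--key=value`; that mapping is stated here as an assumption rather than taken from the documentation, and the function name is hypothetical.

```python
def docker_flags(parameters):
    """Render (key, value) pairs as `--key=value` arguments for `docker run`."""
    return [f"--{key}={value}" for key, value in parameters]


if __name__ == "__main__":
    # The pair from the snippet above.
    print(docker_flags([("pid", "host")]))  # → ['--pid=host']
```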



> Document DockerInfo.Parameter usage in the docker containerizer document 
> -
>
> Key: MESOS-6589
> URL: https://issues.apache.org/jira/browse/MESOS-6589
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, documentation
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> Some users would like to pass extra parameters when launching a Docker container 
> through Mesos. Apart from reading the Mesos protobuf message, users cannot find 
> out how to do that from the Mesos documentation.





[jira] [Commented] (MESOS-7294) Cleanup Docker executor logging.

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958129#comment-15958129
 ] 

Deshi Xiao commented on MESOS-7294:
---

[~tillt] Could you please point out the file, so that I can see which section needs 
to be improved? Thanks a lot.

> Cleanup Docker executor logging.
> 
>
> Key: MESOS-7294
> URL: https://issues.apache.org/jira/browse/MESOS-7294
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Till Toenshoff
>Priority: Minor
>  Labels: newbie, tech-debt
>
> The Docker executor currently uses a mixture of {{cout}}/{{cerr}} as well as 
> {{GLOG}} logging. 
> It appears to make little sense and we should go either one or the other way.





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958028#comment-15958028
 ] 

Deshi Xiao commented on MESOS-7210:
---

First test run:
https://gist.github.com/xiaods/c5a11e3ab51e89a9609edc2c477f7ea8


> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and example marathon job, that use MESOS_HTTP checks like:
> {code}
> {
>   "id": "python-example-stable",
>   "cmd": "python3 -m http.server 8080",
>   "mem": 16,
>   "cpus": 0.1,
>   "instances": 2,
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "python:alpine",
>       "network": "BRIDGE",
>       "portMappings": [
>         { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>       ]
>     }
>   },
>   "env": {
>     "SERVICE_NAME": "python"
>   },
>   "healthChecks": [
>     {
>       "path": "/",
>       "portIndex": 0,
>       "protocol": "MESOS_HTTP",
>       "gracePeriodSeconds": 30,
>       "intervalSeconds": 10,
>       "timeoutSeconds": 30,
>       "maxConsecutiveFailures": 3
>     }
>   ]
> }
> {code}
> I see errors like:
> {code}
> F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos 
> job to get its own PID namespace instead of using the same "pid: host" 
> setting the parent container was started with. So regardless of whether the 
> parent container was started with "pid: host", it will never be able to find 
> the task's PID.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956401#comment-15956401
 ] 

Deshi Xiao commented on MESOS-7210:
---

Patch: https://reviews.apache.org/r/58200/

I will test it ASAP.



[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-05 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956356#comment-15956356
 ] 

Deshi Xiao commented on MESOS-7210:
---

Thanks [~haosd...@gmail.com], it works.



[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955755#comment-15955755
 ] 

Deshi Xiao commented on MESOS-7210:
---

Found https://issues.apache.org/jira/browse/MESOS-6589, but could not find any 
reference on how to use this Docker parameter.



[jira] [Issue Comment Deleted] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-7210:
--
Comment: was deleted

(was: Try 
dockerInfo.parameters().push_back("--pid=host");

Is this correct?)



[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955733#comment-15955733
 ] 

Deshi Xiao edited comment on MESOS-7210 at 4/4/17 8:24 PM:
---

Try 
dockerInfo.parameters().push_back("--pid=host");

Is this correct?


was (Author: xds2000):
Try 
dockerInfo.parameters.push_back("--pid=host");

Is this correct?



[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955733#comment-15955733
 ] 

Deshi Xiao commented on MESOS-7210:
---

Try 
dockerInfo.parameters.push_back("--pid=host");

Is this correct?



[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955725#comment-15955725
 ] 

Deshi Xiao edited comment on MESOS-7210 at 4/4/17 8:12 PM:
---

Hi [~alexr] [~haosd...@gmail.com],

I can't find any useful reference on how to use the Docker parameters field in 
the Mesos protobuf:

```
ContainerInfo::DockerInfo dockerInfo;
dockerInfo.set_image(flags.docker_mesos_image.get());
```
I need an example of adding --pid=host to this dockerInfo.parameters. Could you 
please help? Thanks a lot.
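
For reference, a minimal sketch of how a repeated protobuf field is usually 
filled from C++, assuming the Mesos Parameter message carries string fields 
named key and value and that each entry ends up on the docker command line as 
--key=value. The Parameter and DockerInfo classes below are hypothetical 
stand-ins that only mirror the accessor names protoc generates 
(add_parameters(), set_key(), set_value()); real code would use the generated 
Mesos types instead. makeDockerInfo and toDockerArgs are illustrative helper 
names, not Mesos functions.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated message with `key` and `value`
// string fields (assumption: mirrors the Mesos Parameter message).
class Parameter {
public:
  void set_key(const std::string& key) { key_ = key; }
  void set_value(const std::string& value) { value_ = value; }
  const std::string& key() const { return key_; }
  const std::string& value() const { return value_; }

private:
  std::string key_;
  std::string value_;
};

// Hypothetical stand-in for ContainerInfo::DockerInfo. Only the accessors
// used in this sketch are modelled.
class DockerInfo {
public:
  void set_image(const std::string& image) { image_ = image; }
  const std::string& image() const { return image_; }

  // Generated code appends to a repeated field with add_<field>(), which
  // returns a pointer to the new element; there is no push_back().
  Parameter* add_parameters() {
    parameters_.emplace_back();
    return &parameters_.back();
  }
  const std::deque<Parameter>& parameters() const { return parameters_; }

private:
  std::string image_;
  std::deque<Parameter> parameters_;  // deque keeps element pointers stable
};

// Builds a DockerInfo for `image` that asks for the host PID namespace.
DockerInfo makeDockerInfo(const std::string& image) {
  DockerInfo dockerInfo;
  dockerInfo.set_image(image);

  Parameter* parameter = dockerInfo.add_parameters();
  parameter->set_key("pid");
  parameter->set_value("host");
  return dockerInfo;
}

// Renders every parameter as a `--key=value` argument (assumed rendering).
std::vector<std::string> toDockerArgs(const DockerInfo& dockerInfo) {
  std::vector<std::string> args;
  for (const Parameter& parameter : dockerInfo.parameters()) {
    args.push_back("--" + parameter.key() + "=" + parameter.value());
  }
  return args;
}
```

With these stand-ins, toDockerArgs(makeDockerInfo(image)) yields a single 
"--pid=host" argument. Whether that alone resolves the namespace mismatch 
reported in this issue is not something the sketch shows.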


was (Author: xds2000):
Hi [~alexr] [~haosd...@gmail.com],

I can't find any useful reference on how to use the Docker parameters field in 
the Mesos protobuf:

```
ContainerInfo::DockerInfo dockerInfo;
dockerInfo.set_image(flags.docker_mesos_image.get());
```
I need to add --pid=host to this dockerInfo.parameters. Could you please 
help? Thanks a lot.



[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955725#comment-15955725
 ] 

Deshi Xiao commented on MESOS-7210:
---

Hi [~alexr] [~haosd...@gmail.com],

I can't find any useful reference on how to use the Docker parameters field in 
the Mesos protobuf:

```
ContainerInfo::DockerInfo dockerInfo;
dockerInfo.set_image(flags.docker_mesos_image.get());
```
I need to add --pid=host to this dockerInfo.parameters. Could you please 
help? Thanks a lot.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>Priority: Critical
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and an example Marathon job that uses MESOS_HTTP checks like:
> {code}
> {
>   "id": "python-example-stable",
>   "cmd": "python3 -m http.server 8080",
>   "mem": 16,
>   "cpus": 0.1,
>   "instances": 2,
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "python:alpine",
>       "network": "BRIDGE",
>       "portMappings": [
>         { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>       ]
>     }
>   },
>   "env": {
>     "SERVICE_NAME": "python"
>   },
>   "healthChecks": [
>     {
>       "path": "/",
>       "portIndex": 0,
>       "protocol": "MESOS_HTTP",
>       "gracePeriodSeconds": 30,
>       "intervalSeconds": 10,
>       "timeoutSeconds": 30,
>       "maxConsecutiveFailures": 3
>     }
>   ]
> }
> {code}
> I see the errors like:
> {code}
> F0306 07:41:58.844293 35 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos job 
> to run in its own PID namespace instead of using the "pid host" option that 
> the parent container was started with, so regardless of whether the parent 
> container was started with "pid host", it will never be able to find the PID.
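
To illustrate the mismatch described in the report, here is a minimal sketch of a visibility check. It assumes that entering a task's network namespace requires opening /proc/<pid>/ns/net, and that a process in a private PID namespace sees a procfs without the host's PIDs. The helper names netNamespacePath and canSeeNetNamespace are hypothetical, and this is not necessarily how the Mesos health checker performs its own existence check.

```cpp
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Builds the path of a process's network namespace handle, for example
// /proc/13527/ns/net. procRoot is a parameter only so that the sketch can be
// exercised against a directory other than the real /proc.
std::string netNamespacePath(pid_t pid, const std::string& procRoot = "/proc") {
  return procRoot + "/" + std::to_string(pid) + "/ns/net";
}

// Returns true if the namespace handle for pid exists in the procfs mounted at
// procRoot, i.e. the PID is visible from where the caller is running.
bool canSeeNetNamespace(pid_t pid, const std::string& procRoot = "/proc") {
  struct stat s;
  return ::lstat(netNamespacePath(pid, procRoot).c_str(), &s) == 0;
}
```

Run inside a container that has its own PID namespace, canSeeNetNamespace(13527) would be expected to return false for a host PID, which is consistent with the "Pid 13527 does not exist" failure in the log above.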





[jira] [Issue Comment Deleted] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-7210:
--
Comment: was deleted

(was: add the code below to src/slave/containerizer/docker.cpp +359

```
dockerInfo.set_pid("host");
```)



[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955308#comment-15955308
 ] 

Deshi Xiao edited comment on MESOS-7210 at 4/4/17 4:07 PM:
---

add the code below to src/slave/containerizer/docker.cpp +359

```
dockerInfo.set_pid("host");
```


was (Author: xds2000):
add 

```
dockerInfo.set_pid("host");
```



[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955308#comment-15955308
 ] 

Deshi Xiao commented on MESOS-7210:
---

add 

```
dockerInfo.set_pid("host");
```



[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-01 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950633#comment-15950633
 ] 

Deshi Xiao edited comment on MESOS-7210 at 4/1/17 8:49 AM:
---

Sorry, this was my misunderstanding. If docker_mesos_image is set, a new 
container is spawned to run the docker executor, and that container should be 
started with pid=host so that it maps to the host PID namespace.


--It is difficult to fix because the mesos agent is wrapped in a container; we 
can only manually add --pid=host to the mesos-agent container, so that the 
agent sees the same PIDs as the processes inside the containers. This is not 
a Mesos fault. We would suggest that users run the mesos agent under systemd 
instead of in a container, which would benefit both developers and users.--


was (Author: xds2000):
Sorry, this was my misunderstanding. If docker_mesos_image is set, a new 
container is spawned to run the docker executor, and that container should be 
started with pid=host so that it maps to the host PID namespace.


It is difficult to fix because the mesos agent is wrapped in a container; we 
can only manually add --pid=host to the mesos-agent container, so that the 
agent sees the same PIDs as the processes inside the containers. This is not 
a Mesos fault. We would suggest that users run the mesos agent under systemd 
instead of in a container, which would benefit both developers and users.


[jira] [Comment Edited] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-01 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950633#comment-15950633
 ] 

Deshi Xiao edited comment on MESOS-7210 at 4/1/17 8:49 AM:
---

Sorry, this was my misunderstanding. If docker_mesos_image is set, a new 
container is spawned to run the docker executor, and that container should be 
started with pid=host so that it maps to the host PID namespace.


It is difficult to fix because the mesos agent is wrapped in a container; we 
can only manually add --pid=host to the mesos-agent container, so that the 
agent sees the same PIDs as the processes inside the containers. This is not 
a Mesos fault. We would suggest that users run the mesos agent under systemd 
instead of in a container, which would benefit both developers and users.


was (Author: xds2000):
It is difficult to fix because the mesos agent is wrapped in a container; we 
can only manually add --pid=host to the mesos-agent container, so that the 
agent sees the same PIDs as the processes inside the containers. This is not 
a Mesos fault. We would suggest that users run the mesos agent under systemd 
instead of in a container, which would benefit both developers and users.


[jira] [Commented] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.

2017-04-01 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952102#comment-15952102
 ] 

Deshi Xiao commented on MESOS-3545:
---

any update? [~megha.sharma] [~xujyan]

> Investigate restoring tasks/executors after machine reboot.
> ---
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
>  Issue Type: Epic
>  Components: agent
>Reporter: Benjamin Hindman
>Assignee: Megha Sharma
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to 
> force an agent to restart these tasks/executors _before_ after a machine 
> reboot in the event that the machine is network partitioned away from the 
> master (or the master has failed) but we'd like to get these services running 
> again. Assuming the agent(s) running on the machine has not been disconnected 
> from the master for longer than the master's agent re-registration timeout 
> the agent should be able to re-register (i.e., after a network partition is 
> resolved) without a problem. However, in the same way that a framework would 
> be interested in knowing that its tasks/executors were restarted we'd want 
> to send something like a TASK_RESTARTED status update.





[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-31 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950633#comment-15950633
 ] 

Deshi Xiao commented on MESOS-7210:
---

It is difficult to fix because the mesos agent is wrapped in a container; we 
can only manually add --pid=host to the mesos-agent container, so that the 
agent sees the same PIDs as the processes inside the containers. This is not 
a Mesos fault. We would suggest that users run the mesos agent under systemd 
instead of in a container, which would benefit both developers and users.



[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-30 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950251#comment-15950251
 ] 

Deshi Xiao commented on MESOS-7210:
---

add me 

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and example marathon job, that use MESOS_HTTP checks like:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>"type": "DOCKER",
>"docker": {
>  "image": "python:alpine",
>  "network": "BRIDGE",
>  "portMappings": [
> { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>  ]
>}
>  },
>  "env": {
>"SERVICE_NAME" : "python"
>  },
>  "healthChecks": [
>{
>  "path": "/",
>  "portIndex": 0,
>  "protocol": "MESOS_HTTP",
>  "gracePeriodSeconds": 30,
>  "intervalSeconds": 10,
>  "timeoutSeconds": 30,
>  "maxConsecutiveFailures": 3
>}
>  ]
> }
> {code}
> I see the errors like:
> {code}
> F0306 07:41:58.84429335 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos 
> job not to use the "pid host" option that the parent container was started 
> with; instead it gets its own PID namespace (so regardless of whether the 
> parent container was started with "pid host", it will never be able to find 
> the PID).
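The failure described above comes down to a PID that is not visible from where the health checker runs. A minimal sketch of that check, assuming the checker needs `/proc/<pid>/ns/net` to enter the task's network namespace (the function names and the `proc_root` parameter are hypothetical and exist only for illustration; this is not the Mesos implementation):

```python
import os


def pid_visible(pid: int, proc_root: str = "/proc") -> bool:
    """Return True if `pid` has an entry under `proc_root`.

    A process only appears in /proc if it lives in the caller's PID
    namespace (or a descendant of it), so a PID reported from the host
    is missing here when the caller runs in its own PID namespace.
    """
    return os.path.isdir(os.path.join(proc_root, str(pid)))


def net_namespace_path(pid: int, proc_root: str = "/proc") -> str:
    """Path that would be opened to enter the network namespace of `pid`."""
    if not pid_visible(pid, proc_root):
        raise FileNotFoundError("Pid %d does not exist" % pid)
    return os.path.join(proc_root, str(pid), "ns", "net")
```

Calling `net_namespace_path(13527)` from a container with a private PID namespace would raise here, which mirrors the "Pid 13527 does not exist" message in the log.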



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-30 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao reassigned MESOS-7183:
-

Resolution: Resolved
  Assignee: Deshi Xiao

For the containerized case, adding pid=host resolves this. This is not a Mesos issue.

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.0
>Reporter: Deshi Xiao
>Assignee: Deshi Xiao
> Attachments: stderr
>
>
> The key message is: Failed to enter the net namespace of task (pid: 
> '22392'): Pid 22392 does not exist
> See the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 

[jira] [Commented] (MESOS-5544) Support running Mesos agent in a Docker container.

2017-03-29 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947143#comment-15947143
 ] 

Deshi Xiao commented on MESOS-5544:
---

Can anyone summarize this feature's status?

> Support running Mesos agent in a Docker container.
> --
>
> Key: MESOS-5544
> URL: https://issues.apache.org/jira/browse/MESOS-5544
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Currently, this does not work if one tries to use the Mesos containerizer.
> The main problem is that we want to make sure the executor is not killed when 
> the agent crashes. So we have to use --pid=host so that the agent is in the 
> host pid namespace.
> But that is not sufficient: the Docker daemon will put the agent into all 
> cgroups available on the host. We need to make sure we migrate the executor 
> pid out of those cgroups so that when the agent crashes, executors are not 
> killed.
> Also, when starting the agent container, volumes need to be set up properly 
> so that any mounts under the agent's work_dir will be propagated back to the 
> host mount table. This is to make sure we can recover those mounts after the 
> agent restarts. This is also true for mounts that are needed by some 
> isolators (e.g., the network/cni isolator).
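Before migrating an executor pid out of the agent container's cgroups, one first has to see which cgroups the pid is in. A small sketch of that inspection step, assuming the `hierarchy-ID:controller-list:cgroup-path` line format of `/proc/<pid>/cgroup`; the function name and the sample paths are hypothetical, and this is not how Mesos itself does it:

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list to its cgroup path.

    `text` is the content of /proc/<pid>/cgroup. Each line looks like
    "4:memory:/docker/<id>"; a cgroup v2 entry has an empty controller
    list ("0::/path") and is stored under the key "".
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the cgroup path itself may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result


# Illustrative input only; real container ids are much longer.
sample = "4:memory:/docker/abc123\n3:cpu,cpuacct:/docker/abc123\n0::/\n"
cgroups = parse_proc_cgroup(sample)
```

Any path in `cgroups` that sits under the agent container's own cgroup marks a hierarchy the executor would still share with the agent.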





[jira] [Issue Comment Deleted] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-29 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-6184:
--
Comment: was deleted

(was: I have rebased the patch onto the 1.2.0 branch codebase and tested it; it 
always produces a core dump file.

```
I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent 
a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file 
/tmp/gvqGyb -v 
/data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox
 --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest 
--label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu 
--label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp 
--name 
mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb
 nginx
I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as health 
check still in grace period
W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:46.386533 49 health_checker.cpp:202] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack 
trace: ***
@ 

[jira] [Assigned] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao reassigned MESOS-7183:
-

Resolution: Won't Fix
  Assignee: Deshi Xiao

This is a special case: when Mesos runs in Docker, we should add --pid=host so 
that the native health check process can access the host PID namespace.
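As a rough illustration of that workaround, the sketch below assembles a `docker run` command line for the agent container with `--pid=host`. The `--net=host`, `--privileged` and Docker socket settings mirror the setup reported in MESOS-7210; the function name and the image name are hypothetical, and the command is only built, not executed:

```python
from typing import List


def agent_docker_argv(image: str, agent_args: List[str]) -> List[str]:
    """Build (but do not run) a docker command for a containerized agent."""
    return [
        "docker", "run",
        "--pid=host",      # share the host PID namespace so task pids resolve
        "--net=host",
        "--privileged",
        # Let the agent talk to the host's Docker daemon.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        image,
        *agent_args,
    ]


# "example/mesos-agent" is a placeholder image name, not a real one.
argv = agent_docker_argv(
    "example/mesos-agent",
    ["--containerizers=docker,mesos",
     "--docker_socket=/var/run/docker.sock"],
)
```

This covers only the PID namespace part of the problem; it does not address the cgroup and mount propagation concerns raised in MESOS-5544.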


[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-29 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946778#comment-15946778
 ] 

Deshi Xiao commented on MESOS-7183:
---

Adding --pid=host resolves this issue.


[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946550#comment-15946550
 ] 

Deshi Xiao commented on MESOS-7183:
---

The reason is that the pid is not found in the container. Using pid=host 
should give the pid mapping. Let me try it.


[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946406#comment-15946406
 ] 

Deshi Xiao commented on MESOS-7183:
---

My environment is a special case:

Mesos 1.2 running containerized in Docker.

I launch a sample nginx Docker container with a Mesos native health check.

Then I get a core dump in the sandbox.

I have dug up some more information for your reference:

In the Mesos slave container, I can only see the task container's pid, but I 
can't find the nginx process's pid.

On the host console, however, I can find the nginx pid. So do we need to fix 
this bug?
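One way to confirm this kind of observation is to compare the PID namespace identifiers of two processes. A minimal sketch, assuming a Linux `/proc` layout in which `/proc/<pid>/ns/pid` is a symlink such as `pid:[4026531836]`; the function names and the `proc_root` parameter are hypothetical:

```python
import os


def pid_namespace_id(pid, proc_root="/proc"):
    """Return the namespace identifier string for `pid` (or "self")."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "pid"))


def same_pid_namespace(pid_a, pid_b, proc_root="/proc"):
    """True if both processes are in the same PID namespace."""
    return pid_namespace_id(pid_a, proc_root) == pid_namespace_id(pid_b, proc_root)
```

Run on the host, `same_pid_namespace(<agent pid>, <nginx pid>)` would be expected to return False for the setup described here, and True once the agent container is started with pid=host.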


[jira] [Comment Edited] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889571#comment-15889571
 ] 

Deshi Xiao edited comment on MESOS-7183 at 3/28/17 3:45 PM:


-close and enhance in MESOS-6184-

MESOS-6184 does not resolve this bug. A core file is still always produced in the sandbox.


was (Author: xds2000):
close and enhance in MESOS-6184

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> The key message is: Failed to enter the net namespace of task (pid: 
> '22392'): Pid 22392 does not exist
> See the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health 

[jira] [Updated] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-28 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-7183:
--
Description: 

The key message is: Failed to enter the net namespace of task (pid: '22392'): 
Pid 22392 does not exist



see the sandbox's stderr log:
{code}
I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
f2aeab4d-b224-479c-869d-121daa0c12cb-S0
I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
MESOS_SANDBOX=/mnt/mesos/sandbox -e 
MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
 -v /home:/data:rw -v 
/var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
 --net bridge --label=APP_ID=wordpress --label=USER=nmg --label=CLUSTER=nmgtest 
--label=SLOT=0 --label=APP=wordpress4 -p 31000:8080/tcp --name 
mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
 wordpress
WordPress not found in /var/www/html - copying now...
Complete! WordPress has been successfully copied to /var/www/html

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:31.664559 22347 health_checker.cpp:205] Health check failed 13 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
F0227 09:20:32.666734 22601 health_checker.cpp:94] Failed to enter the net 
namespace of task 

[jira] [Updated] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-03-28 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-7183:
--
Affects Version/s: 1.2.0

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: 

[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944546#comment-15944546
 ] 

Deshi Xiao commented on MESOS-6184:
---

I have rebased the patch onto the 1.2.0 branch and tested it. It still always 
produces a core dump file.

```
I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent 
a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file 
/tmp/gvqGyb -v 
/data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox
 --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest 
--label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu 
--label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp 
--name 
mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb
 nginx
I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as health 
check still in grace period
W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack 
trace: ***
@ 0x7f26c0703100 (unknown)
@ 0x7f26bfb485f7 __GI_raise
@ 0x7f26bfb49ce8 __GI_abort
@ 0x7f26c315778e _Abort()
@ 0x7f26c31577cc _Abort()
@ 0x7f26c237a4b6 process::internal::childMain()
@ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
@ 0x7f26c2379e53 process::internal::defaultClone()
@ 0x7f26c237b951 process::internal::cloneChild()
@ 0x7f26c237954f process::subprocess()
@ 0x7f26c15a9fb1 
mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
@ 0x7f26c15ababd 
mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
@ 0x7f26c2331389 process::ProcessManager::resume()
@ 0x7f26c233a3f7 
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7f26c04a1220 (unknown)
@ 0x7f26c06fbdc5 start_thread
@ 0x7f26bfc0928d __clone
W0328 11:48:46.386533 49 health_checker.cpp:202] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned terminated with signal 
Aborted (core dumped): ABORT: 
(../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed 
to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 
18596: Pid 18596 does not exist
*** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are 
using GNU date ***
PC: @ 0x7f26bfb485f7 __GI_raise
*** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack 
trace: ***
@ 

[jira] [Issue Comment Deleted] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-27 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-6184:
--
Comment: was deleted

(was: good for me)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Critical
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.
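
As background for the description above, here is a minimal sketch of what entering one namespace of a task amounts to on Linux. It uses the raw setns(2) system call directly rather than the Mesos `ns::setns` wrapper, and the helper names `namespacePath` and `enterNamespace` are hypothetical (they are not Mesos APIs). It is Linux-only and assumes the caller has the privileges setns(2) requires. The point it illustrates: if the task pid has already exited, the namespace handle under /proc no longer exists, so the call fails before the check itself ever runs.

```cpp
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Mesos): path of the namespace handle
// that the kernel exposes for a given process.
std::string namespacePath(pid_t pid, const std::string& ns) {
  assert(!ns.empty());
  return "/proc/" + std::to_string(pid) + "/ns/" + ns;
}

// Hypothetical helper: joins namespace `ns` (e.g. "net", "mnt") of `pid`.
// Returns an empty string on success, or an error message on failure.
std::string enterNamespace(pid_t pid, const std::string& ns) {
  const std::string path = namespacePath(pid, ns);

  // If the process is gone, this open() fails because the path is missing.
  int fd = ::open(path.c_str(), O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    return "Failed to open '" + path + "': " + std::strerror(errno);
  }

  // An nstype of 0 accepts whatever namespace type the descriptor refers to.
  if (::setns(fd, 0) != 0) {
    const std::string error = std::strerror(errno);
    ::close(fd);
    return "Failed to enter '" + path + "': " + error;
  }

  ::close(fd);
  return "";
}
```

A caller would typically run `enterNamespace(taskPid, "net")` in a freshly forked child, so that the parent process keeps its own namespaces.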



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6236) Launch subprocesses associated with specified namespaces.

2017-03-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943782#comment-15943782
 ] 

Deshi Xiao commented on MESOS-6236:
---

The patch is outdated. I have updated it, but it still needs testing. Once the 
testing is done, I will submit a patch.

> Launch subprocesses associated with specified namespaces.
> -
>
> Key: MESOS-6236
> URL: https://issues.apache.org/jira/browse/MESOS-6236
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: haosdent
>  Labels: mesosphere
>
> Currently there is no standard way in Mesos to launch a child process in a 
> different namespace (e.g. {{net}}, {{mnt}}). A user may leverage 
> {{Subprocess}} and provide their own {{clone}} callback, but this approach is 
> error-prone.
> One possible solution is to implement a {{Subprocess}} child hook. In 
> [MESOS-5070|https://issues.apache.org/jira/browse/MESOS-5070], we 
> introduced a child hook framework in subprocess and implemented three child 
> hooks: {{CHDIR}}, {{SETSID}} and {{SUPERVISOR}}. We suggest introducing 
> another child hook, {{SETNS}}, so that other components (e.g., health check) 
> can call it to enter the namespaces of a specific process.
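
To make the child hook idea concrete, here is a minimal sketch of the general pattern in plain POSIX C++. It does not use the libprocess `Subprocess` API: the `ChildHook` alias, `runInChild`, and the exit code 127 are all assumptions made for illustration. Each hook runs in the forked child before the real work starts, and a failing hook prevents the work from running. This sketch exits with an error code when a hook fails; that is a choice made here, not a description of how libprocess behaves.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a child hook: returns an empty string on
// success, or an error message on failure.
using ChildHook = std::function<std::string()>;

// Exit code used by this sketch when a hook fails (an arbitrary choice).
const int kHookFailed = 127;

// Forks, runs every hook in the child, then runs `body` in the child.
// Returns the child's exit code, or -1 if fork/wait failed or the child
// was terminated by a signal.
int runInChild(const std::vector<ChildHook>& hooks,
               const std::function<int()>& body) {
  assert(static_cast<bool>(body));

  pid_t pid = ::fork();
  if (pid < 0) {
    return -1;
  }

  if (pid == 0) {
    // Child: hooks run first, in order; the first failure stops everything.
    for (const ChildHook& hook : hooks) {
      if (!hook().empty()) {
        ::_exit(kHookFailed);
      }
    }
    ::_exit(body());
  }

  // Parent: wait for the child, retrying if interrupted by a signal.
  int status = 0;
  while (::waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Under this pattern, a SETNS-style hook would simply be one more entry in the `hooks` vector, one that tries to join the target process's namespaces and reports an error if the target pid no longer exists.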





[jira] [Commented] (MESOS-1806) Etcd-based master contender/detector module

2017-03-10 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905109#comment-15905109
 ] 

Deshi Xiao commented on MESOS-1806:
---

Can anyone confirm whether this is resolved?

> Etcd-based master contender/detector module
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Epic
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-03-04 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895772#comment-15895772
 ] 

Deshi Xiao commented on MESOS-6223:
---

Is there any update on the patch under review?

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> The agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, the agents are now 
> allowed to re-register after they have been marked Unreachable. The executors 
> are anyway terminated on the agent when it reboots so there is no harm in 
> letting the agent keep its SlaveID, re-register with the master and reconcile 
> the lost executors. This is a pre-requisite for supporting 
> persistent/restartable tasks in mesos (MESOS-3545).





[jira] [Commented] (MESOS-6517) Health checking only on 127.0.0.1 is limiting.

2017-03-02 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892523#comment-15892523
 ] 

Deshi Xiao commented on MESOS-6517:
---

What is the decision on this? Please take action as soon as possible.

> Health checking only on 127.0.0.1 is limiting.
> --
>
> Key: MESOS-6517
> URL: https://issues.apache.org/jira/browse/MESOS-6517
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> As of Mesos 1.1.0, HTTP and TCP health checks always use 127.0.0.1 as the 
> target IP. This is not configurable. As a result, tasks should listen on all 
> interfaces if they want to support HTTP and TCP health checks. However, there 
> might be some cases where tasks or containers will end up binding to a 
> specific IP address. 
> To make health checking more robust we can:
> * look at all interfaces in a given network namespace and do health check on 
> all the IP addresses;
> * allow users to specify the IP to health check;
> * deduce the target IP from task's discovery information.
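
The limitation described above can be reproduced with plain sockets. The sketch below is not Mesos code; `makeAddress`, `listenOn`, and `canConnect` are hypothetical helpers. It shows, under the assumption of an IPv4 loopback interface, that a probe against 127.0.0.1 succeeds when the server listens on all interfaces (0.0.0.0) and is refused when nothing accepts connections on loopback. A server bound to a single non-loopback IP ends up in the refused case.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: fills an IPv4 socket address. Returns false if `ip`
// is not a valid dotted-quad address.
bool makeAddress(const std::string& ip, uint16_t port, sockaddr_in* out) {
  *out = sockaddr_in{};
  out->sin_family = AF_INET;
  out->sin_port = htons(port);
  return ::inet_pton(AF_INET, ip.c_str(), &out->sin_addr) == 1;
}

// Hypothetical helper: listens on ip:port (port 0 lets the kernel choose).
// Returns the listening descriptor, or -1 on failure. The port actually
// bound is stored in *boundPort.
int listenOn(const std::string& ip, uint16_t port, uint16_t* boundPort) {
  assert(boundPort != nullptr);

  sockaddr_in address;
  if (!makeAddress(ip, port, &address)) {
    return -1;
  }

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return -1;
  }

  socklen_t length = sizeof(address);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&address), sizeof(address)) != 0 ||
      ::listen(fd, 1) != 0 ||
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&address), &length) != 0) {
    ::close(fd);
    return -1;
  }

  *boundPort = ntohs(address.sin_port);
  return fd;
}

// Hypothetical helper: roughly what a TCP health check against ip:port
// boils down to. Returns true if the connection is accepted.
bool canConnect(const std::string& ip, uint16_t port) {
  sockaddr_in address;
  if (!makeAddress(ip, port, &address)) {
    return false;
  }

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    return false;
  }

  const bool connected =
      ::connect(fd, reinterpret_cast<sockaddr*>(&address), sizeof(address)) == 0;
  ::close(fd);
  return connected;
}
```

With these helpers, `listenOn("0.0.0.0", 0, &port)` followed by `canConnect("127.0.0.1", port)` succeeds, whereas the same probe fails once the listener is closed or if it had been bound only to a specific non-loopback address.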





[jira] [Commented] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2017-03-01 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890424#comment-15890424
 ] 

Deshi Xiao commented on MESOS-6184:
---

good for me

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Critical
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
>     if (taskPid.isSome()) {
>       foreach (const string& ns, namespaces) {
>         Try<Nothing> setns = ns::setns(taskPid.get(), ns);
>         if (setns.isError()) {
>           ...
>         }
>       }
>     }
>     return func();
>   });
> {code}
> After the childHooks patches are merged, we could change the health check to 
> use childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.





[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889571#comment-15889571
 ] 

Deshi Xiao commented on MESOS-7183:
---

close and enhance in MESOS-6184

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
> times consecutively: HTTP health check failed: curl 

[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886832#comment-15886832
 ] 

Deshi Xiao commented on MESOS-7183:
---

Thanks for your help. That clarifies it.

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
> times consecutively: HTTP health check failed: curl 

[jira] [Commented] (MESOS-7151) Some stdout and stderr was disappeared. And some stdout can not fully display in webui.

2017-02-27 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886830#comment-15886830
 ] 

Deshi Xiao commented on MESOS-7151:
---

[~mark1982] In my use case, I can't find any issue with the pailer UI, so I 
suspect the bug is in your framework. If you can provide a minimal framework 
that reproduces the bug, we can debug it quickly.

> Some stdout and stderr was disappeared. And some stdout can not fully display 
> in webui.
> ---
>
> Key: MESOS-7151
> URL: https://issues.apache.org/jira/browse/MESOS-7151
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.0.1
>Reporter: mark1982
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, taskid=1292.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-26 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885290#comment-15885290
 ] 

Deshi Xiao edited comment on MESOS-7183 at 2/27/17 7:42 AM:


Tested with 1.2.0, which seems to fix it, but I can't confirm which bug caused it.


was (Author: xds2000):
Tested with 1.2.0, which seems to fix it, but I can't determine which bug caused it.

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection 

[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-26 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885290#comment-15885290
 ] 

Deshi Xiao commented on MESOS-7183:
---

Tested with 1.2.0, which seems to fix it, but I can't determine which bug caused it.

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
> times 

[jira] [Updated] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-26 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-7183:
--
Attachment: stderr

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect 

[jira] [Created] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-26 Thread Deshi Xiao (JIRA)
Deshi Xiao created MESOS-7183:
-

 Summary: Always get coredump by add a health check on docker 
container app
 Key: MESOS-7183
 URL: https://issues.apache.org/jira/browse/MESOS-7183
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Deshi Xiao


see the sandbox's stderr log:

I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
f2aeab4d-b224-479c-869d-121daa0c12cb-S0
I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
MESOS_SANDBOX=/mnt/mesos/sandbox -e 
MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
 -v /home:/data:rw -v 
/var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
 --net bridge --label=APP_ID=wordpress --label=USER=nmg --label=CLUSTER=nmgtest 
--label=SLOT=0 --label=APP=wordpress4 -p 31000:8080/tcp --name 
mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
 wordpress
WordPress not found in /var/www/html - copying now...
Complete! WordPress has been successfully copied to /var/www/html

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:31.664559 22347 health_checker.cpp:205] Health check failed 13 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
F0227 09:20:32.666734 22601 

[jira] [Commented] (MESOS-1806) Etcd-based master contender/detector module

2017-02-25 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884262#comment-15884262
 ] 

Deshi Xiao commented on MESOS-1806:
---

ping [~karya], [~haosd...@gmail.com], [~lins05]

> Etcd-based master contender/detector module
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Epic
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)




