[jira] [Commented] (MESOS-9174) Unexpected containers transition from RUNNING to DESTROYING during recovery

2018-08-23 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589939#comment-16589939 ] Stephan Erb commented on MESOS-9174: I have run a few more experiments: *Broken setup*: Containers

[jira] [Commented] (MESOS-9174) Unexpected containers transition from RUNNING to DESTROYING during recovery

2018-08-22 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1651#comment-1651 ] Stephan Erb commented on MESOS-9174: [~jieyu], we have found something interesting related to

[jira] [Created] (MESOS-9174) Unexpected containers transition from RUNNING to DESTROYING during recovery

2018-08-21 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-9174: -- Summary: Unexpected containers transition from RUNNING to DESTROYING during recovery Key: MESOS-9174 URL: https://issues.apache.org/jira/browse/MESOS-9174 Project: Mesos

[jira] [Commented] (MESOS-8418) mesos-agent high cpu usage because of numerous /proc/mounts reads

2018-08-06 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570101#comment-16570101 ] Stephan Erb commented on MESOS-8418: The first graph gives a rough impression of how many tasks per

[jira] [Commented] (MESOS-8418) mesos-agent high cpu usage because of numerous /proc/mounts reads

2018-07-17 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546413#comment-16546413 ] Stephan Erb commented on MESOS-8418: Thanks a lot for the super quick resolution [~bmahler]! We are

[jira] [Comment Edited] (MESOS-8418) mesos-agent high cpu usage because of numerous /proc/mounts reads

2018-07-16 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543743#comment-16543743 ] Stephan Erb edited comment on MESOS-8418 at 7/16/18 8:29 AM: - I have attached

[jira] [Comment Edited] (MESOS-8418) mesos-agent high cpu usage because of numerous /proc/mounts reads

2018-07-13 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543743#comment-16543743 ] Stephan Erb edited comment on MESOS-8418 at 7/13/18 9:32 PM: - I have attached

[jira] [Commented] (MESOS-8418) mesos-agent high cpu usage because of numerous /proc/mounts reads

2018-07-13 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543743#comment-16543743 ] Stephan Erb commented on MESOS-8418: I have attached a profile [^mesos-agent.stacks.gz] gathered on

[jira] [Commented] (MESOS-8418) mesos-agent high cpu usage because of numerous /proc/mounts reads

2018-07-10 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538309#comment-16538309 ] Stephan Erb commented on MESOS-8418: As a workaround bumping the following options seems to help

[jira] [Commented] (MESOS-7069) The linux filesystem isolator should set mode and ownership for host volumes.

2017-02-21 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877719#comment-15877719 ] Stephan Erb commented on MESOS-7069: Relevant patch: https://reviews.apache.org/r/56889/ > The linux

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2017-02-21 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877716#comment-15877716 ] Stephan Erb commented on MESOS-6563: Relevant patch: https://reviews.apache.org/r/56889/ > Shared

[jira] [Commented] (MESOS-7057) Consider using the relink functionality of libprocess in the executor driver.

2017-02-21 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876592#comment-15876592 ] Stephan Erb commented on MESOS-7057: Thanks for fixing this! :-) > Consider using the relink

[jira] [Commented] (MESOS-6648) MesosContainerizer launch helper should take ContainerLaunchInfo.

2017-01-25 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838594#comment-15838594 ] Stephan Erb commented on MESOS-6648: I fear the sudden interface change makes upgrading difficult.

[jira] [Commented] (MESOS-6281) Document how executors can obtain the IP address of the container

2017-01-04 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799560#comment-15799560 ] Stephan Erb commented on MESOS-6281: TLDR: Thinking about it, I kind of agree that most of the stuff I

[jira] [Commented] (MESOS-4641) Support Container Network Interface (CNI).

2016-12-27 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780580#comment-15780580 ] Stephan Erb commented on MESOS-4641: CNI supports multiple IP addresses. From an executor standpoint

[jira] [Commented] (MESOS-4641) Support Container Network Interface (CNI).

2016-12-27 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780201#comment-15780201 ] Stephan Erb commented on MESOS-4641: I would currently see MESOS-6281 as a blocker as well. It kind of

[jira] [Commented] (MESOS-6440) "Catch up" the webui to features that have been added.

2016-10-24 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15601261#comment-15601261 ] Stephan Erb commented on MESOS-6440: Please consider the following task as well MESOS-6456 (revocable

[jira] [Created] (MESOS-6456) Display oversubscribable and oversubscribed resources on the Web UI

2016-10-24 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-6456: -- Summary: Display oversubscribable and oversubscribed resources on the Web UI Key: MESOS-6456 URL: https://issues.apache.org/jira/browse/MESOS-6456 Project: Mesos

[jira] [Updated] (MESOS-6456) Display oversubscribable and oversubscribed resources on the Web UI

2016-10-24 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-6456: --- Attachment: Screen Shot 2016-10-24 at 09.43.33.png Screen Shot 2016-10-24 at

[jira] [Commented] (MESOS-6281) Document how executors can obtain the IP address of the container

2016-10-11 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565487#comment-15565487 ] Stephan Erb commented on MESOS-6281: From the perspective of an executor author, I would prefer if

[jira] [Comment Edited] (MESOS-5029) Add labels to ExecutorInfo

2016-09-09 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476927#comment-15476927 ] Stephan Erb edited comment on MESOS-5029 at 9/9/16 12:14 PM: - I have filed

[jira] [Created] (MESOS-6146) Executor labels missing in several stats endpoints

2016-09-09 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-6146: -- Summary: Executor labels missing in several stats endpoints Key: MESOS-6146 URL: https://issues.apache.org/jira/browse/MESOS-6146 Project: Mesos Issue Type:

[jira] [Commented] (MESOS-5029) Add labels to ExecutorInfo

2016-09-09 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476927#comment-15476927 ] Stephan Erb commented on MESOS-5029: I have filed MESOS-5029 > Add labels to ExecutorInfo >

[jira] [Updated] (MESOS-6145) Isolator namespaces/pid is leaking mounts

2016-09-09 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-6145: --- Description: As the operator of a Mesos cluster, I would like every container/executor to run in a

[jira] [Commented] (MESOS-6145) Isolator namespaces/pid is leaking mounts

2016-09-09 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476879#comment-15476879 ] Stephan Erb commented on MESOS-6145: In slack, [~jieyu] wrote: "As far as I know, no one is using pid

[jira] [Created] (MESOS-6145) Isolator namespaces/pid is leaking mounts

2016-09-09 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-6145: -- Summary: Isolator namespaces/pid is leaking mounts Key: MESOS-6145 URL: https://issues.apache.org/jira/browse/MESOS-6145 Project: Mesos Issue Type: Bug

[jira] [Commented] (MESOS-313) Report executor terminations to framework schedulers.

2016-08-22 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431688#comment-15431688 ] Stephan Erb commented on MESOS-313: --- Now that this patch has landed, even a clean shutdown of an executor

[jira] [Commented] (MESOS-4641) Support Container Network Interface (CNI).

2016-06-18 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337751#comment-15337751 ] Stephan Erb commented on MESOS-4641: Mesos has a bunch of network level statistics ([as documented

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-11 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Description: When restarting the Mesos agent binary, tasks can end up as LOST. We lose from 20% to

[jira] [Commented] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-11 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279791#comment-15279791 ] Stephan Erb commented on MESOS-5332: I was able to assemble a reproducing example (using Aurora master

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-07 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Attachment: executor-logs.tar.gz All executor logs (surviving and failed ones) > TASK_LOST on slave

[jira] [Commented] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-07 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275314#comment-15275314 ] Stephan Erb commented on MESOS-5332: The observation that it takes 5 seconds for a faulty executor to

[jira] [Commented] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-06 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273776#comment-15273776 ] Stephan Erb commented on MESOS-5332: All 7 killed executors have the same offending log messages. I

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-06 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Attachment: executor-stderrV2.log > TASK_LOST on slave restart potentially due to executor race

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Attachment: executor-stderr.log > TASK_LOST on slave restart potentially due to executor race

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Attachment: (was: executor.stderr) > TASK_LOST on slave restart potentially due to executor race

[jira] [Commented] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273171#comment-15273171 ] Stephan Erb commented on MESOS-5332: {code} $ ls

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Attachment: mesos-slave.log executor.stderr Here are the V1 logs of executor and

[jira] [Updated] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-5332: --- Description: When restarting the Mesos agent binary, tasks can end up as LOST. We lose from 20% to

[jira] [Commented] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272937#comment-15272937 ] Stephan Erb commented on MESOS-5332: [~vinodkone] we have talked about this issue before

[jira] [Commented] (MESOS-3870) Prevent out-of-order libprocess message delivery

2016-05-05 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272922#comment-15272922 ] Stephan Erb commented on MESOS-3870: Could MESOS-5332 be due to the out-of-order delivery mentioned

[jira] [Created] (MESOS-5332) TASK_LOST on slave restart potentially due to executor race condition

2016-05-05 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-5332: -- Summary: TASK_LOST on slave restart potentially due to executor race condition Key: MESOS-5332 URL: https://issues.apache.org/jira/browse/MESOS-5332 Project: Mesos

[jira] [Issue Comment Deleted] (MESOS-3241) Update FrameworkInfo.user on framework reregistration

2016-04-26 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Erb updated MESOS-3241: --- Comment: was deleted (was: This ticket seems to be a duplicate of

[jira] [Commented] (MESOS-3241) Update FrameworkInfo.user on framework reregistration

2016-04-26 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259026#comment-15259026 ] Stephan Erb commented on MESOS-3241: This ticket seems to be a duplicate of

[jira] [Commented] (MESOS-5187) filesystem/linux isolator does not set the permissions of the host_path

2016-04-12 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236895#comment-15236895 ] Stephan Erb commented on MESOS-5187: /cc [~jieyu] > filesystem/linux isolator does not set the

[jira] [Commented] (MESOS-3363) custom executor's child process intermittently leaks to be a child of slave

2015-09-06 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732337#comment-14732337 ] Stephan Erb commented on MESOS-3363: Have you tried using the `namespace/pid` isolator? I'd guess it

[jira] [Commented] (MESOS-3374) Improve High Availability documentation

2015-09-06 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732365#comment-14732365 ] Stephan Erb commented on MESOS-3374: Additional ideas: * Co-location of frameworks with Mesos masters

[jira] [Created] (MESOS-3277) Implement basic security isolators such as linux/apparmor or linux/seccomp

2015-08-17 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-3277: -- Summary: Implement basic security isolators such as linux/apparmor or linux/seccomp Key: MESOS-3277 URL: https://issues.apache.org/jira/browse/MESOS-3277 Project: Mesos

[jira] [Commented] (MESOS-2940) Reconciliation is expensive for large numbers of tasks.

2015-07-14 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627102#comment-14627102 ] Stephan Erb commented on MESOS-2940: The code seems to re-initialize the generator for

[jira] [Commented] (MESOS-2940) Reconciliation is expensive for large numbers of tasks.

2015-07-14 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627115#comment-14627115 ] Stephan Erb commented on MESOS-2940: Is using a thread local generator sufficient to

[jira] [Commented] (MESOS-2902) Enable Mesos to use arbitrary script / module to figure out IP, HOSTNAME

2015-07-14 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627121#comment-14627121 ] Stephan Erb commented on MESOS-2902: Why don't you just change your init system to

[jira] [Created] (MESOS-2999) Implement a linux/iptables isolator

2015-07-07 Thread Stephan Erb (JIRA)
Stephan Erb created MESOS-2999: -- Summary: Implement a linux/iptables isolator Key: MESOS-2999 URL: https://issues.apache.org/jira/browse/MESOS-2999 Project: Mesos Issue Type: Story

[jira] [Commented] (MESOS-2794) Implement filesystem isolators

2015-06-16 Thread Stephan Erb (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588635#comment-14588635 ] Stephan Erb commented on MESOS-2794: Ian, is there a document describing the