[jira] [Comment Edited] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-20 Thread y123456yz (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094174#comment-16094174
 ] 

y123456yz edited comment on MESOS-7813 at 7/21/17 2:24 AM:
---

I have resolved the problem by setting "delegate=true".

thanks


was (Author: y123456yz):
[~jvanremoortere]
[~hartem]
Where should the following config be added?
delegate=true

 cat /usr/lib/systemd/system/mesos-slave.service
[Unit]
Description=Mesos Slave
After=network.target
Wants=network.target
[Service]
delegate=true // add here??


Do I only need to add "delegate=true" to the [Service] section of
mesos-slave.service?

Do I also need to add "KillMode=control-group" to the [Service] section of
mesos-slave.service?

thanks again!
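
For reference, a minimal sketch of how the change reported in this issue could be observed: it parses two snapshots of /proc/<pid>/cgroup (one line per hierarchy, in the form "hierarchy-id:controller-list:path") and reports which controllers moved to a different cgroup. The function names and the sample paths are illustrative assumptions, not Mesos code. Note also that the systemd documentation spells the directive `Delegate=` and lists it among the options for the [Service] section.

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Each input line looks like "4:cpu,cpuacct:/some/path".
    """
    result = {}
    for line in text.strip().splitlines():
        # Split on the first two colons only; the path itself may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            result[controller] = path
    return result


def changed_controllers(before, after):
    """Return {controller: (old_path, new_path)} for controllers that moved."""
    old, new = parse_proc_cgroup(before), parse_proc_cgroup(after)
    return {c: (old[c], new.get(c)) for c in old if old[c] != new.get(c)}


if __name__ == "__main__":
    # Hypothetical snapshots; the paths are made up for illustration.
    before = "5:devices:/mesos/abc\n4:memory:/mesos/abc\n2:freezer:/mesos/abc"
    after = ("5:devices:/system.slice/mesos-slave.service\n"
             "4:memory:/system.slice/mesos-slave.service\n"
             "2:freezer:/mesos/abc")
    for name, (old, new) in sorted(changed_controllers(before, after).items()):
        print(f"{name}: {old} -> {new}")
```

Taking one snapshot when the container starts and another after the problem appears would show exactly which hierarchies were rewritten.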


> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> After lxc has been running for a period of time, the file /proc/pid/cgroup 
> is modified: the devices, blkio, memory, and cpuacct entries change. Why?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-07-20 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095569#comment-16095569
 ] 

Sargun Dhillon commented on MESOS-7744:
---

[~neilc]

The task is still running, but the agent and the master both think it has been 
killed, and the framework receives TASK_KILLED. The framework "knows" through 
out-of-band mechanisms that the task is still alive (we have our own 
reconciliation mechanism outside Mesos), so it resends the kill, but the kill 
never reaches the executor. The executor sends TASK_RUNNING status updates to 
the agent, but these never make it to the master or the framework.

This occurs if the executor is already running and the task is killed almost 
immediately after being started, specifically while the task is still on the 
"queue".
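
To make the ordering concrete, here is a toy model of the race described above. It is only a sketch under stated assumptions: `ToyAgent` and all of its methods are hypothetical and do not correspond to the agent's real code; the model only mirrors the sequence "queue task, kill task, send queued task" reported in this issue.

```python
class ToyAgent:
    """Toy model of an agent-side task queue (hypothetical, not Mesos code)."""

    def __init__(self, drop_killed_from_queue):
        self.drop_killed_from_queue = drop_killed_from_queue
        self.queued = []            # tasks waiting to be handed to the executor
        self.killed = set()         # tasks already reported as TASK_KILLED
        self.sent_to_executor = []  # tasks actually delivered to the executor

    def queue_task(self, task_id):
        self.queued.append(task_id)

    def kill_task(self, task_id):
        # The task has not reached the executor yet, so the agent itself
        # reports the terminal state.
        self.killed.add(task_id)
        if self.drop_killed_from_queue and task_id in self.queued:
            self.queued.remove(task_id)
        return "TASK_KILLED"

    def flush_queue(self):
        # Deliver everything that is still queued to the executor.
        while self.queued:
            self.sent_to_executor.append(self.queued.pop(0))


if __name__ == "__main__":
    for drop in (False, True):
        agent = ToyAgent(drop_killed_from_queue=drop)
        agent.queue_task("task-1")
        agent.kill_task("task-1")
        agent.flush_queue()
        print(drop, agent.sent_to_executor)
```

With `drop_killed_from_queue=False` the killed task is still delivered, which matches the symptom; with `True` the queue is purged on kill and nothing is launched.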

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> 
>
> Key: MESOS-7744
> URL: https://issues.apache.org/jira/browse/MESOS-7744
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: reliability
>
> We sometimes launch jobs, and cancel them in ~7 seconds, if we don't get a 
> TASK_STARTING back from the agent. Under certain conditions it can result in 
> Mesos losing track of the task. The chunk of the logs which is interesting is 
> here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task ‘Titus-7590548-worker-0-4476’ for executor ‘docker-executor’ of framework TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill task Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued task ‘Titus-7590548-worker-0-4476’ to executor ‘docker-executor’ of framework TitusFramework at executor(1)@100.66.11.10:17707{
> {code}
> In our executor, we see that the launch message arrives after the master has 
> already gotten the kill update. We then send non-terminal state updates to 
> the agent, and yet it doesn't forward these to our framework. We're using a 
> custom executor which is based on the older mesos-go bindings. 





[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7744:
---
Labels: reliability  (was: )



[jira] [Commented] (MESOS-7812) Request to add artifacts ( binary ) creation steps to the mesos-ppc64le jenkins job

2017-07-20 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095479#comment-16095479
 ] 

Vinod Kone commented on MESOS-7812:
---

[~amitkumar_ghatwal] Hey Amit. The Mesos project doesn't currently provide 
binaries for any platform. We have plans to do so in the near future, so we'll 
have a better answer for you once we figure out how we are going to provide 
binaries in general. 
cc [~karya] 

> Request to add artifacts ( binary ) creation steps to the mesos-ppc64le 
> jenkins job
> ---
>
> Key: MESOS-7812
> URL: https://issues.apache.org/jira/browse/MESOS-7812
> Project: Mesos
>  Issue Type: Wish
> Environment: OS - Ubuntu
> Platform - ppc64le
>Reporter: Amitkumar Ghatwal
>
> Hi All,
> This is in reference to the job re-enabled for ppc64le via this JIRA ticket: 
> https://issues.apache.org/jira/browse/INFRA-14367 
> I wanted to know whether it's possible to add steps to this jenkins job so 
> that we can get artifacts such as binaries/installers (*.deb) for mesos on 
> ubuntu-ppc64le during the job build.
> Job - https://builds.apache.org/job/Mesos-PPC64LE/.
> A binary installer (*.deb) for mesos on ppc64le would come in handy for 
> one-step installation on Power.
> Requesting [~vinodkone] to comment if you have any information on adding 
> artifact creation to this jenkins job. 
> Regards,
> Amit





[jira] [Commented] (MESOS-4992) sandbox uri does not work outisde mesos http server

2017-07-20 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095346#comment-16095346
 ] 

Benjamin Mahler commented on MESOS-4992:


[~skonto] 1.4 is not released yet, you can test against the master branch for 
now.

> sandbox uri does not work outisde mesos http server
> ---
>
> Key: MESOS-4992
> URL: https://issues.apache.org/jira/browse/MESOS-4992
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.27.1
>Reporter: Stavros Kontopoulos
>Assignee: haosdent
>  Labels: mesosphere
> Fix For: 1.4.0
>
>
> The sandbox URI of a framework does not work if I just copy and paste it 
> into the browser.
> For example, the following sandbox URI:
> http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/frameworks/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009/executors/driver-20160321155016-0001/browse
> should redirect to:
> http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/browse?path=%2Ftmp%2Fmesos%2Fslaves%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0%2Fframeworks%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009%2Fexecutors%2Fdriver-20160321155016-0001%2Fruns%2F60533483-31fb-4353-987d-f3393911cc80
> yet it fails with the message:
> "Failed to find slaves.
> Navigate to the slave's sandbox via the Mesos UI."
> and redirects to:
> http://172.17.0.1:5050/#/
> This is an issue for me because I'm working on extending the Mesos Spark UI 
> with the sandbox URI. The other option is to fetch the slave info, parse the 
> JSON there, and extract the executor paths, which is neither straightforward 
> nor elegant.
> Moreover, I don't see the runs/container_id in the Mesos proto API. I guess 
> this is hidden info, but it is the piece needed to rewrite the URI 
> without redirection.
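
As an illustration of the URI rewrite discussed above, the sketch below builds the "browse?path=..." form directly from its parts, assuming the container id is known. The function name and its parameters are assumptions made for this example; the directory layout is inferred only from the example URL in this issue.

```python
from urllib.parse import quote


def sandbox_browse_url(master, work_dir, agent_id, framework_id,
                       executor_id, container_id):
    """Build the direct sandbox URL (hypothetical helper, not a Mesos API)."""
    path = (f"{work_dir}/slaves/{agent_id}/frameworks/{framework_id}"
            f"/executors/{executor_id}/runs/{container_id}")
    # safe="" makes quote() percent-encode "/" as %2F as well.
    return f"http://{master}/#/slaves/{agent_id}/browse?path={quote(path, safe='')}"


if __name__ == "__main__":
    print(sandbox_browse_url(
        "172.17.0.1:5050",
        "/tmp/mesos",
        "50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0",
        "50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009",
        "driver-20160321155016-0001",
        "60533483-31fb-4353-987d-f3393911cc80",
    ))
```

The missing piece, as noted in the description, is the container id for the run; the other components already appear in the original sandbox URI.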





[jira] [Commented] (MESOS-7817) CreateProcess wrapper's error message is bad

2017-07-20 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095177#comment-16095177
 ] 

Michael Park commented on MESOS-7817:
-

{noformat}
commit cce7f5f6e8bea0972dab003cd84959d044350f5f
Author: Andrew Schwartzmeyer 
Date:   Thu Jul 20 11:35:07 2017 -0700

Windows: Fixed `CreateProcess` error message.

The buffer conversion of the argument string was being printed instead
of the argument string itself, leading to an error message with a bunch
of bytes.

Review: https://reviews.apache.org/r/61001/
{noformat}

> CreateProcess wrapper's error message is bad
> 
>
> Key: MESOS-7817
> URL: https://issues.apache.org/jira/browse/MESOS-7817
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
> Environment: Windows 10
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>Priority: Trivial
>  Labels: stout, windows
>
> In stout/os/windows/shell.hpp: Try create_process(...) wrapper:
> We have an `arg_string` and an `arg_buffer` because of oddities of 
> `CreateProcessW`. But when composing the error message we need to use the 
> string, not the buffer. Otherwise stringify kindly outputs a massive array of 
> bytes. Currently by mistake we use the buffer, probably due to a renaming 
> refactor I did.
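
The class of bug described here can be sketched in Python as an analogy. This is not the stout code: `error_message`, the message wording, and the sample command are all assumptions made for illustration. The point is only that formatting the mutable buffer prints its elements, while formatting the string prints the command.

```python
def error_message(argument):
    # Hypothetical message wording, for illustration only.
    return f"Failed to call CreateProcess on command '{argument}'"


# The readable command line.
arg_string = "cmd.exe /c dir"

# A mutable, NUL-terminated copy of the same characters, standing in for the
# buffer that has to be handed to `CreateProcessW`.
arg_buffer = [ord(ch) for ch in arg_string] + [0]

if __name__ == "__main__":
    print(error_message(arg_buffer))  # the bug: a long list of numbers
    print(error_message(arg_string))  # the fix: the readable command
```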



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6101) Add event for Framwork added to master operator API

2017-07-20 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6101:
-
Shepherd: Anand Mazumdar  (was: Greg Mann)

> Add event for Framwork added to master operator API
> ---
>
> Key: MESOS-6101
> URL: https://issues.apache.org/jira/browse/MESOS-6101
> Project: Mesos
>  Issue Type: Task
>Reporter: Zhitao Li
>Assignee: Quinn
>
> Consider the following case:
> 1) a subscriber connects to master;
> 2) a new scheduler registered as a new framework;
> 3) a task is launched from this framework.
> In this sequence, subscriber does not have a way to know the FrameworkInfo 
> belonging to the FrameworkId.
> We should support an event (e.g. when framework info in master is 
> added/changed).
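
A minimal sketch of why a subscriber needs such an event, under stated assumptions: the event names (`FRAMEWORK_ADDED`, `FRAMEWORK_UPDATED`, `TASK_ADDED`) and the dictionary layout are hypothetical placeholders for this example, not a description of the operator API.

```python
def apply_event(frameworks, event):
    """Update the subscriber's local view; return framework info for task events.

    `frameworks` maps framework id -> framework info seen so far.
    """
    kind = event["type"]
    if kind in ("FRAMEWORK_ADDED", "FRAMEWORK_UPDATED"):
        info = event["framework_info"]
        frameworks[info["id"]] = info
        return None
    if kind == "TASK_ADDED":
        # Without a prior framework event this lookup has nothing to find.
        return frameworks.get(event["task"]["framework_id"])
    return None


if __name__ == "__main__":
    view = {}
    task_event = {"type": "TASK_ADDED", "task": {"id": "t1", "framework_id": "f1"}}
    print(apply_event(view, task_event))
    apply_event(view, {"type": "FRAMEWORK_ADDED",
                       "framework_info": {"id": "f1", "name": "example"}})
    print(apply_event(view, task_event))
```

The first lookup fails because the subscriber connected before the framework registered, which is the sequence listed in the description.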





[jira] [Commented] (MESOS-7818) Add more filtering options for unversioned operator API

2017-07-20 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095017#comment-16095017
 ] 

Greg Mann commented on MESOS-7818:
--

Let's collect specific requirements here and break out into additional tickets 
if necessary.

cc [~klueska] [~cinchurge]

> Add more filtering options for unversioned operator API
> ---
>
> Key: MESOS-7818
> URL: https://issues.apache.org/jira/browse/MESOS-7818
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Greg Mann
>  Labels: api, mesosphere, operator
>
> The Mesos CLI hits {{/state}} to get the state of the Mesos cluster, which 
> can cause performance issues in large clusters. To optimize the CLI for large 
> clusters, we can add more filtering options to unversioned operator endpoints 
> like {{/tasks}}, so that the CLI can request results for only those tasks 
> which match certain criteria.
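
As a starting point for collecting requirements, here is a sketch of what a filtered request from the CLI might look like. The parameter names used below (`framework_id`, `limit`) are illustrative assumptions for this example and are not meant to describe what the endpoint accepts today.

```python
from urllib.parse import urlencode


def tasks_url(master, **filters):
    """Build a /tasks URL with optional query filters (hypothetical names)."""
    # Skip unset filters and sort keys so the URL is deterministic.
    query = urlencode({k: v for k, v in sorted(filters.items()) if v is not None})
    return f"{master}/tasks" + (f"?{query}" if query else "")


if __name__ == "__main__":
    print(tasks_url("http://localhost:5050"))
    print(tasks_url("http://localhost:5050", framework_id="f-0001", limit=10))
```

Filtering on the server side would let the CLI avoid downloading and parsing the full {{/state}} payload just to select a few tasks.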





[jira] [Created] (MESOS-7818) Add more filtering options for unversioned operator API

2017-07-20 Thread Greg Mann (JIRA)
Greg Mann created MESOS-7818:


 Summary: Add more filtering options for unversioned operator API
 Key: MESOS-7818
 URL: https://issues.apache.org/jira/browse/MESOS-7818
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Greg Mann


The Mesos CLI hits {{/state}} to get the state of the Mesos cluster, which can 
cause performance issues in large clusters. To optimize the CLI for large 
clusters, we can add more filtering options to unversioned operator endpoints 
like {{/tasks}}, so that the CLI can request results for only those tasks which 
match certain criteria.





[jira] [Commented] (MESOS-2258) Enable filtering of task information in master/state.json

2017-07-20 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095014#comment-16095014
 ] 

Greg Mann commented on MESOS-2258:
--

Closing this in favor of MESOS-7818.

> Enable filtering of task information in master/state.json
> -
>
> Key: MESOS-2258
> URL: https://issues.apache.org/jira/browse/MESOS-2258
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Niklas Quarfot Nielsen
>
> The masters state endpoint can grow huge (several MB's) in large 
> installations due to data of all running and completed tasks, while other 
> pieces of information (counters, attached slaves and frameworks) are still 
> useful to be polled frequently.
> We can add query parameters to state.json to filter out task information 
> and/or introduce a /metadata.json endpoint with all but task information.
> Any thoughts?
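
To illustrate the "all but task information" idea, the sketch below strips task lists from an already-fetched state document. The field names (`frameworks`, `completed_frameworks`, `tasks`, `completed_tasks`) are assumptions about the JSON layout made for this example; a real implementation would filter on the master before serializing.

```python
import copy

# Assumed names of the per-framework fields that carry task information.
TASK_KEYS = ("tasks", "completed_tasks")


def strip_task_info(state):
    """Return a copy of `state` without per-framework task lists."""
    slim = copy.deepcopy(state)
    frameworks = slim.get("frameworks", []) + slim.get("completed_frameworks", [])
    for framework in frameworks:
        for key in TASK_KEYS:
            framework.pop(key, None)
    return slim


if __name__ == "__main__":
    state = {"hostname": "master-1",
             "frameworks": [{"id": "f1", "tasks": [{"id": "t1"}],
                             "completed_tasks": []}]}
    print(strip_task_info(state))
```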





[jira] [Created] (MESOS-7817) CreateProcess wrapper's error message is bad

2017-07-20 Thread Andrew Schwartzmeyer (JIRA)
Andrew Schwartzmeyer created MESOS-7817:
---

 Summary: CreateProcess wrapper's error message is bad
 Key: MESOS-7817
 URL: https://issues.apache.org/jira/browse/MESOS-7817
 Project: Mesos
  Issue Type: Bug
  Components: stout
 Environment: Windows 10
Reporter: Andrew Schwartzmeyer
Assignee: Andrew Schwartzmeyer
Priority: Trivial


In stout/os/windows/shell.hpp: Try create_process(...) wrapper:

We have an `arg_string` and an `arg_buffer` because of oddities of 
`CreateProcessW`. But when composing the error message we need to use the 
string, not the buffer. Otherwise stringify kindly outputs a massive array of 
bytes. Currently by mistake we use the buffer, probably due to a renaming 
refactor I did.





[jira] [Updated] (MESOS-7817) CreateProcess wrapper's error message is bad

2017-07-20 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-7817:

Labels: stout windows  (was: )



[jira] [Updated] (MESOS-7784) MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1 is flaky

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7784:
--
Sprint:   (was: Mesosphere Sprint 59)

> MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1 is flaky
> -
>
> Key: MESOS-7784
> URL: https://issues.apache.org/jira/browse/MESOS-7784
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
> Environment: ASF CI, cmake,clang,--verbose --enable-libevent 
> --enable-ssl,GLOG_v=1 
> MESOS_VERBOSE=1,ubuntu:14.04,(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)
>Reporter: Benjamin Bannier
>  Labels: flaky, flaky-test, mesosphere
>
> Saw 
> {{bool/MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1/1}} 
> fail in ASF CI today.
> {noformat}
> [ RUN  ] 
> bool/MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1/1
> I0711 22:59:15.174615   608 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0711 22:59:15.176837  8488 master.cpp:442] Master 
> 838c7e1d-60d1-4aa7-8918-397da9ebcfa7 (6e892dc05c61) started on 
> 172.17.0.4:32791
> I0711 22:59:15.177094  8488 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/fTUJFH/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fTUJFH/master" 
> --zk_session_timeout="10secs"
> I0711 22:59:15.177460  8488 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0711 22:59:15.177477  8488 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0711 22:59:15.177487  8488 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0711 22:59:15.177497  8488 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/fTUJFH/credentials'
> I0711 22:59:15.177793  8488 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0711 22:59:15.177947  8488 http.cpp:1009] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0711 22:59:15.178114  8488 http.cpp:1009] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0711 22:59:15.178267  8488 http.cpp:1009] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0711 22:59:15.178421  8488 master.cpp:646] Authorization enabled
> I0711 22:59:15.178761  8482 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0711 22:59:15.178794  8484 whitelist_watcher.cpp:77] No whitelist given
> I0711 22:59:15.182126  8486 master.cpp:2163] Elected as the leading master!
> I0711 22:59:15.182142  8486 master.cpp:1702] Recovering from registrar
> I0711 22:59:15.182231  8484 registrar.cpp:345] Recovering registrar
> I0711 22:59:15.182955  8484 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 500992ns
> I0711 22:59:15.183069  8484 registrar.cpp:493] Applied 1 operations in 
> 42023ns; attempting to update the registry
> I0711 22:59:15.183899  8484 registrar.cpp:550] Successfully updated the 
> registry in 573952ns
> I0711 22:59:15.184029  8484 registrar.cpp:422] Successfully recovered 
> registrar
> I0711 22:59:15.184540  8486 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0711 22:59:15.184530  8496 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0711 22:59:15.192324   608 containerizer.cpp:230] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
> W0711 22:59:15.193044   608 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend requires root privileges

[jira] [Updated] (MESOS-7774) Consider more clearly distinguishing "zombie" tasks from other tasks in webui

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7774:
--
Sprint: Mesosphere Sprint 60

> Consider more clearly distinguishing "zombie" tasks from other tasks in webui
> -
>
> Key: MESOS-7774
> URL: https://issues.apache.org/jira/browse/MESOS-7774
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> The webui's home page displays a list of active tasks; this list is 
> constructed from the list of non-completed framework tasks. If a framework 
> has not yet acknowledged a terminal status update, the list of "active tasks" 
> can contain tasks in terminal states, which is confusing to users, e.g.,
> * launch {{sleep 10}} with {{mesos-execute}}
> * after the task is launched, suspend the {{mesos-execute}} process
> * after 10s the list of active tasks contains a {{FINISHED}} task
> or
> * launch {{sleep 10}} with {{mesos-execute}}
> * after the task is launched, suspend the {{mesos-execute}} process
> * kill the {{sleep}} system process
> * the list of active tasks contains a {{FAILED}} task
> The underlying issue here is that what is displayed in the webui very 
> directly reflects the list of tasks in {{master}} {{Framework}} objects. 
> There, {{tasks}} holds the tasks the master needs to track (since they might, 
> e.g., still be running, or the frameworks need to be notified of status 
> changes, etc.), while {{completedTasks}} holds tasks of purely historic 
> interest, since they no longer require any master actions.
> Exposing this information in such an unfiltered way is likely confusing. 
> While this applies to the {{state}} endpoint as much as to the webui, a fix 
> should be easier to accomplish in the UI. There we could add some (visual?) 
> cue that active tasks in terminal states are analogous to zombie processes.
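
A rough sketch of the classification the UI would need, under stated assumptions: the set of terminal state names and the task dictionary layout below are chosen for illustration, and the function is hypothetical rather than webui code.

```python
# Assumed set of terminal state names, for illustration only.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED",
                   "TASK_LOST", "TASK_ERROR"}


def split_active_tasks(tasks):
    """Split the "active" list into running tasks and "zombie" tasks.

    A zombie here is a task that is still tracked as active although its
    latest state is terminal (the update has not been acknowledged yet).
    """
    running, zombies = [], []
    for task in tasks:
        (zombies if task["state"] in TERMINAL_STATES else running).append(task)
    return running, zombies


if __name__ == "__main__":
    active = [{"id": "a", "state": "TASK_RUNNING"},
              {"id": "b", "state": "TASK_FINISHED"}]
    running, zombies = split_active_tasks(active)
    print([t["id"] for t in running], [t["id"] for t in zombies])
```

The UI could then render the second group with a distinct style instead of mixing it into the list of running tasks.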





[jira] [Updated] (MESOS-7784) MasterTestPrePostReservationRefinement.CreateAndDestroyVolumesV1 is flaky

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7784:
--
Sprint: Mesosphere Sprint 60


[jira] [Updated] (MESOS-7774) Consider more clearly distinguishing "zombie" tasks from other tasks in webui

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7774:
--
Sprint:   (was: Mesosphere Sprint 59)

> Consider more clearly distinguishing "zombie" tasks from other tasks in webui
> -
>
> Key: MESOS-7774
> URL: https://issues.apache.org/jira/browse/MESOS-7774
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> The webui's home page displays a list of active tasks; this list is 
> constructed from the list of non-completed framework tasks. If a framework 
> has not yet acknowledged a terminal status update, the list of "active tasks" 
> can contain tasks in terminal states which is confusing to users, e.g.,
> * launch {{sleep 10}} with {{mesos-execute}}
> * after the task is launch suspend the {{mesos-execute}} process
> * after 10s the list of active tasks contains a {{FINISHED}} task
> or
> * launch {{sleep 10}} with {{mesos-execute}}
> * after the task is launch suspend the {{mesos-execute}} process
> * kill the {{sleep}} system process
> * the list of active tasks contains a {{FAILED}} task
> The underlying issue here is that what is displayed in the webui very 
> directly reflects the list of tasks in {{master}} {{Framework}} objects. 
> There, {{tasks}} holds tasks the master needs to track (e.g., because they might 
> still be running, or because frameworks need to be notified of status changes), 
> while {{completedTasks}} holds tasks of purely historic interest 
> since they no longer require any master action.
> Exposing this information in such an unfiltered way is likely confusing. 
> While this applies to the {{state}} endpoint as much as it does to the webui, a fix 
> should be easier to accomplish in the UI. There we could add some (visual?) 
> cue that active tasks in terminal states are analogous to zombie processes.
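
The distinction described above can be sketched as follows. This is a minimal illustration in Python, not the webui's actual code: the task dictionaries and the `split_active_tasks` helper are hypothetical, and the set of terminal states is an assumption (the ticket itself only names {{FINISHED}} and {{FAILED}}).

```python
# Hypothetical sketch: separate "zombie" tasks (terminal state, but still
# listed as active because the framework has not acknowledged the update)
# from tasks that are genuinely still running.

# Assumed set of terminal states; adjust to the Mesos version in use.
TERMINAL_STATES = {
    "TASK_FINISHED",
    "TASK_FAILED",
    "TASK_KILLED",
    "TASK_LOST",
    "TASK_ERROR",
}


def split_active_tasks(tasks):
    """Return (running, zombies) from a list of task dicts with a 'state' key."""
    running = [t for t in tasks if t["state"] not in TERMINAL_STATES]
    zombies = [t for t in tasks if t["state"] in TERMINAL_STATES]
    return running, zombies


if __name__ == "__main__":
    active = [
        {"id": "sleep-1", "state": "TASK_RUNNING"},
        {"id": "sleep-2", "state": "TASK_FINISHED"},
    ]
    running, zombies = split_active_tasks(active)
    print("running:", [t["id"] for t in running])
    print("zombies:", [t["id"] for t in zombies])
```

A UI could render the second list with a distinct style instead of hiding it, since the master still tracks those tasks.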



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6390) Ensure Python support scripts are linted

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6390:
--
Story Points: 3

> Ensure Python support scripts are linted
> 
>
> Key: MESOS-6390
> URL: https://issues.apache.org/jira/browse/MESOS-6390
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Assignee: Armand Grillet
>  Labels: newbie, python
>
> Currently {{support/mesos-style.py}} does not lint files under {{support/}}. 
> This is mostly because these scripts are so inconsistent 
> style-wise that they would not even pass the linter now.
> We should clean up all Python scripts under {{support/}} so they pass the 
> Python linter, and activate that directory in the linter for future 
> additions. 





[jira] [Updated] (MESOS-7792) Add support for ECDH ciphers

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7792:
--
Story Points: 5

> Add support for ECDH ciphers
> 
>
> Key: MESOS-7792
> URL: https://issues.apache.org/jira/browse/MESOS-7792
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Affects Versions: 1.3.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: security
>
> [Elliptic curve 
> ciphers|https://wiki.openssl.org/index.php/Elliptic_Curve_Cryptography] are a 
> family of ciphers supported by OpenSSL. They allow smaller keys, but 
> require an extra configuration parameter, the actual curve to be used, which 
> cannot currently be set through libprocess.
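
To illustrate why the curve is an extra parameter, here is a minimal sketch using Python's standard `ssl` module, which also wraps OpenSSL. It is not libprocess code; the helper name `make_ecdh_context`, the cipher string, and the default curve `prime256v1` are assumptions chosen for the example.

```python
import ssl


def make_ecdh_context(curve_name="prime256v1"):
    """Build a server-side TLS context that uses the named elliptic curve."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Restrict the cipher list to ECDHE key exchange with AES-GCM.
    context.set_ciphers("ECDHE+AESGCM")
    # The cipher string alone does not name the curve for the ECDH key
    # exchange; it has to be supplied as a separate setting.
    context.set_ecdh_curve(curve_name)
    return context


if __name__ == "__main__":
    ctx = make_ecdh_context()
    print(type(ctx).__name__)
```

The point of the sketch is only that two independent settings are involved: the cipher list and the curve name.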





[jira] [Updated] (MESOS-7160) Parsing of perf version segfaults

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7160:
--
Sprint: Mesosphere Sprint 58  (was: Mesosphere Sprint 58, Mesosphere Sprint 
59)

> Parsing of perf version segfaults
> -
>
> Key: MESOS-7160
> URL: https://issues.apache.org/jira/browse/MESOS-7160
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Andrei Budnik
>  Labels: mesosphere, tech-debt
>
> Parsing the perf version [fails with a segfault in ASF 
> CI|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3294/],
> {noformat}
> E0222 20:54:03.033464   805 perf.cpp:237] Failed to get perf version: Failed 
> to execute perf: terminated with signal Aborted (core dumped)
> {noformat}





[jira] [Updated] (MESOS-7816) Add HTTP connection handling to the resource provider driver

2017-07-20 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-7816:

Labels: mesosphere storage  (was: mesosphere)

> Add HTTP connection handling to the resource provider driver
> 
>
> Key: MESOS-7816
> URL: https://issues.apache.org/jira/browse/MESOS-7816
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, storage
>
> The {{resource_provider::Driver}} is responsible for establishing a 
> connection with an agent/master resource provider API, sending calls to 
> the API, and receiving events from it. This is done using HTTP and should be 
> implemented similarly to how it is done for schedulers and executors (see 
> {{src/executor/executor.cpp, src/scheduler/scheduler.cpp}}).





[jira] [Updated] (MESOS-7630) Add simple filtering to unversioned operator API

2017-07-20 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-7630:
-
Sprint: Mesosphere Sprint 59

> Add simple filtering to unversioned operator API
> 
>
> Key: MESOS-7630
> URL: https://issues.apache.org/jira/browse/MESOS-7630
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Reporter: Quinn
>Assignee: Quinn
>  Labels: agent, api, http, master, mesosphere
> Fix For: 1.4.0
>
>
> Add filtering for the following endpoints:
> - {{/frameworks}}
> - {{/slaves}}
> - {{/tasks}}
> - {{/containers}}
> We should investigate whether to use a RESTful style or a query string to 
> filter for a specific resource. We should also figure out whether it is 
> necessary to filter for a list of resources.
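
The two options mentioned above can be compared with a small sketch. The URL shapes (`/tasks/<id>` versus `/tasks?task_id=<id>`) and the helper `parse_filter` are hypothetical and only illustrate the design question; they are not an API that Mesos exposes.

```python
from urllib.parse import urlparse, parse_qs


def parse_filter(url):
    """Return (endpoint, ids) for either filtering style.

    RESTful style:      /tasks/t1                  -> ("tasks", ["t1"])
    Query-string style: /tasks?task_id=a&task_id=b -> ("tasks", ["a", "b"])
    """
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    endpoint = segments[0]
    if len(segments) > 1:
        # RESTful style naturally addresses a single resource.
        return endpoint, [segments[1]]
    query = parse_qs(parsed.query)
    # A repeated query parameter can express a list of resources.
    ids = [value for values in query.values() for value in values]
    return endpoint, ids


if __name__ == "__main__":
    print(parse_filter("/tasks/t1"))
    print(parse_filter("/tasks?task_id=a&task_id=b"))
```

In this sketch the query-string form handles a list of resources without extra syntax, which is one consideration for the second question raised in the ticket.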





[jira] [Updated] (MESOS-7807) Docker executor needs to return multiple IP addresses for the container

2017-07-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7807:
--
Shepherd: Qian Zhang  (was: Jie Yu)

> Docker executor needs to return multiple IP addresses for the container
> ---
>
> Key: MESOS-7807
> URL: https://issues.apache.org/jira/browse/MESOS-7807
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: Mesosphere
>
> `Docker executor` currently returns only a single IP address for each docker 
> container. In a world where a container has both a v4 and a v6 address, the 
> executor needs to return all the addresses it sees for the container; 
> otherwise we won't be able to support dual-stack containers.
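
A minimal sketch of collecting every address for a container, assuming input shaped like `docker inspect` output with `IPAddress` and `GlobalIPv6Address` fields under `NetworkSettings.Networks`. The helper `container_addresses` and the sample data are hypothetical and do not reflect the Docker executor's implementation.

```python
import ipaddress


def container_addresses(inspect_output):
    """Return all IPv4 and IPv6 addresses found for a container."""
    addresses = []
    networks = inspect_output.get("NetworkSettings", {}).get("Networks", {})
    for network in networks.values():
        for field in ("IPAddress", "GlobalIPv6Address"):
            value = network.get(field)
            if value:
                # Validate and normalize; raises ValueError on malformed input.
                addresses.append(ipaddress.ip_address(value))
    return addresses


# Made-up sample data for a dual-stack container on one network.
SAMPLE = {
    "NetworkSettings": {
        "Networks": {
            "bridge": {
                "IPAddress": "172.17.0.4",
                "GlobalIPv6Address": "2001:db8::4",
            }
        }
    }
}

if __name__ == "__main__":
    for address in container_addresses(SAMPLE):
        print(address.version, address)
```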





[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.

2017-07-20 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-7434:
-
Sprint: Mesosphere Sprint 58  (was: Mesosphere Sprint 58, Mesosphere Sprint 
59)

> SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
> -
>
> Key: MESOS-7434
> URL: https://issues.apache.org/jira/browse/MESOS-7434
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 8
> CentOS 6
> other Linux distros
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: flaky, flaky-test, mesosphere
> Attachments: 
> RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, 
> RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt
>
>
> This test failure has been observed on an internal CI system. It occurs on a 
> variety of Linux distributions. It seems that using {{cat}} as the task 
> command may be problematic; see attached log file 
> {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}.





[jira] [Commented] (MESOS-7798) Improve libprocess message passing performance

2017-07-20 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094902#comment-16094902
 ] 

Benjamin Hindman commented on MESOS-7798:
-

Benchmark committed:
{code}
commit 7c90f18603223474652722c78e46b0aa65a528e5
Author: Benjamin Hindman 
Date:   Wed Jul 19 13:56:28 2017 -0700

Added a message passing benchmark.

Review: https://reviews.apache.org/r/60983
{code}

> Improve libprocess message passing performance
> --
>
> Key: MESOS-7798
> URL: https://issues.apache.org/jira/browse/MESOS-7798
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Hindman
>Assignee: Benjamin Hindman
>






[jira] [Assigned] (MESOS-5187) filesystem/linux isolator does not set the permissions of the host_path

2017-07-20 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-5187:
---

Assignee: Gilbert Song

> filesystem/linux isolator does not set the permissions of the host_path
> ---
>
> Key: MESOS-5187
> URL: https://issues.apache.org/jira/browse/MESOS-5187
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.26.0
> Environment: Mesos 0.26.0, Apache Aurora 0.12
>Reporter: Stephan Erb
>Assignee: Gilbert Song
>
> The {{filesystem/linux}} isolator is not a drop-in replacement for the 
> {{filesystem/shared}} isolator. This should be considered before the latter 
> is deprecated.
> We are currently using the {{filesystem/shared}} isolator together with the 
> following slave option. This provides us with a private {{/tmp}} and 
> {{/var/tmp}} folder for each task.
> {code}
> --default_container_info='{
> "type": "MESOS",
> "volumes": [
> {"host_path": "system/tmp", "container_path": "/tmp", 
>"mode": "RW"},
> {"host_path": "system/vartmp",  "container_path": "/var/tmp", 
>"mode": "RW"}
> ]
> }'
> {code}
> When browsing the Mesos sandbox, one can see the following permissions:
> {code}
> mode        nlink  uid   gid   size  mtime
> drwxrwxrwx  3      root  root  4 KB  Apr 11 18:16  tmp
> drwxrwxrwx  2      root  root  4 KB  Apr 11 18:15  vartmp
> {code}
> However, when running with the new {{filesystem/linux}} isolator, the 
> permissions are different:
> {code}
> mode        nlink  uid   gid   size  mtime
> drwxr-xr-x  2      root  root  4 KB  Apr 12 10:34  tmp
> drwxr-xr-x  2      root  root  4 KB  Apr 12 10:34  vartmp
> {code}
> This prevents user code (running as a non-root user) from writing to those 
> folders, i.e. every write attempt fails with permission denied. 
> *Context*:
> * We are using Apache Aurora. Aurora is running its custom executor as root 
> but then switches to a non-privileged user before running the actual user 
> code. 
> * The following code seems to have enabled our use case in the existing 
> {{filesystem/shared}} isolator: 
> https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L175-L198
>  
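
The permission difference can be reproduced with a short sketch. It assumes the goal is a world-writable directory like the `drwxrwxrwx` listing above; the helper `ensure_world_writable` is hypothetical and is not the isolator's code.

```python
import os
import stat
import tempfile


def ensure_world_writable(path):
    """Create `path` if needed and set its mode to 0777, ignoring the umask."""
    os.makedirs(path, exist_ok=True)
    # os.makedirs applies the process umask, which commonly yields 0755
    # (drwxr-xr-x); an explicit chmod is needed to get drwxrwxrwx.
    os.chmod(path, 0o777)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        print(ensure_world_writable(os.path.join(sandbox, "system", "tmp")))
```

Whether plain 0777 or the sticky-bit variant 1777 that is conventional for `/tmp` is the right choice is left open here.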





[jira] [Commented] (MESOS-7727) Scheme/HTTPTest.Get segfaults

2017-07-20 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094878#comment-16094878
 ] 

Till Toenshoff commented on MESOS-7727:
---

[~vinodkone] missed this - now looking into it - thanks for the ping.

> Scheme/HTTPTest.Get segfaults
> -
>
> Key: MESOS-7727
> URL: https://issues.apache.org/jira/browse/MESOS-7727
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Till Toenshoff
>  Labels: flaky-test, mesosphere-oncall
>
> Observed this on ASF CI
> {code}
> [ RUN  ] Scheme/HTTPTest.Get/0
> I0627 09:58:16.931704  2483 openssl.cpp:419] CA file path is unspecified! 
> NOTE: Set CA file path with LIBPROCESS_SSL_CA_FILE=
> I0627 09:58:16.931727  2483 openssl.cpp:424] CA directory path unspecified! 
> NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
> I0627 09:58:16.931732  2483 openssl.cpp:429] Will not verify peer certificate!
> NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
> I0627 09:58:16.931740  2483 openssl.cpp:435] Will only verify peer 
> certificate if presented!
> NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate 
> verification
> I0627 09:58:16.932193  3504 process.cpp:968] Failed to accept socket: future 
> discarded
> *** Aborted at 1498557496 (unix time) try "date -d @1498557496" if you are 
> using GNU date ***
> PC: @ 0x7f5397f30912 (unknown)
> *** SIGSEGV (@0x7f5349e18068) received by PID 2483 (TID 0x7f53937cd700) from 
> PID 1239515240; stack trace: ***
> I0627 09:58:16.934547  2483 process.cpp:1282] libprocess is initialized on 
> 172.17.0.4:50357 with 16 worker threads
> @ 0x7f53987ac370 (unknown)
> @ 0x7f5397f30912 (unknown)
> @ 0x7f5397f30f8c (unknown)
> @   0x42b1a3 process::UPID::UPID()
> @   0x8fcdec process::DispatchEvent::DispatchEvent()
> I0627 09:58:16.940096  3518 process.cpp:3779] Handling HTTP event for process 
> '(80)' with path: '/(80)/get'
> @   0x8f5275 process::internal::dispatch()
> @   0x910002 process::dispatch<>()
> I0627 09:58:16.945485  3519 process.cpp:3779] Handling HTTP event for process 
> '(80)' with path: '/(80)/get'
> @   0x8f4184 process::ProcessBase::route()
> [   OK ] Scheme/HTTPTest.Get/0 (463 ms)
> [ RUN  ] Scheme/HTTPTest.Get/1
> @   0x9e88b9 process::ProcessBase::route<>()
> @   0x9e4bb2 process::Help::initialize()
> @   0x8ed69a process::ProcessManager::resume()
> @   0x8e9a98 
> _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
> @   0x8fc38c 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @   0x8fc2d0 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
> @   0x8fc25a 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f5397f27230 (unknown)
> @ 0x7f53987a4dc5 start_thread
> @ 0x7f539769076d __clone
> make[7]: *** [check-local] Segmentation fault
> {code}
> [~tillt] can you triage this? looks related to SSL





[jira] [Updated] (MESOS-7792) Add support for ECDH ciphers

2017-07-20 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-7792:
--
Target Version/s: 1.4.0

> Add support for ECDH ciphers
> 
>
> Key: MESOS-7792
> URL: https://issues.apache.org/jira/browse/MESOS-7792
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Affects Versions: 1.3.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: security
>
> [Elliptic curve 
> ciphers|https://wiki.openssl.org/index.php/Elliptic_Curve_Cryptography] are a 
> family of ciphers supported by OpenSSL. They allow smaller keys, but 
> require an extra configuration parameter, the actual curve to be used, which 
> cannot currently be set through libprocess.





[jira] [Updated] (MESOS-7792) Add support for ECDH ciphers

2017-07-20 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-7792:
--
Issue Type: Improvement  (was: Bug)

> Add support for ECDH ciphers
> 
>
> Key: MESOS-7792
> URL: https://issues.apache.org/jira/browse/MESOS-7792
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Affects Versions: 1.3.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: security
>
> [Elliptic curve 
> ciphers|https://wiki.openssl.org/index.php/Elliptic_Curve_Cryptography] are a 
> family of ciphers supported by OpenSSL. They allow smaller keys, but 
> require an extra configuration parameter, the actual curve to be used, which 
> cannot currently be set through libprocess.





[jira] [Updated] (MESOS-7800) Tasks with many labels can cause disproportionally huge allocations

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7800:
---
Labels: mesosphere reliability  (was: mesosphere)

> Tasks with many labels can cause disproportionally huge allocations
> ---
>
> Key: MESOS-7800
> URL: https://issues.apache.org/jira/browse/MESOS-7800
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, master
>Reporter: Benjamin Bannier
>  Labels: mesosphere, reliability
> Attachments: stat_all_task_labels.dat, stat_individual_labels.dat
>
>
> {{mesos.proto}} provides the {{Labels}} message so others can add free-form 
> data to a number of messages. In e.g., {{TaskInfo}} and {{ExecutorInfo}} we 
> explicitly document
> {quote}
> Therefore, labels should be used to tag tasks with light-weight meta-data.
> {quote}
> However, we never enforce this requirement.
> This becomes problematic, e.g., in the agent, where a {{TaskInfo}} will likely 
> be copied often, e.g., due to multiple levels of dispatches. I have measured 
> that a single {{Label}} can trigger 50-100 concurrent copies in flight on the 
> agent's container launch path; our general assumption here seems to be that 
> while a {{TaskInfo}} is not necessarily small, it still is not huge.
> If users embed a lot of data into e.g., {{TaskInfo}} {{labels}} this can lead 
> to a temporary explosion of the agent process' memory footprint which can 
> lead to it being killed by the OS.
> Due to the potential negative effects of huge {{labels}} we should evaluate 
> how we can limit the amount of data we accept from users. This could mean 
> limiting the size of {{TaskInfo}} or {{Labels}} we accept, measured e.g., by 
> the message's {{ByteSizeLong}}. It seems that a value somehow related to 
> {{ARG_MAX}} would be intuitive, but I am not sure if we can go as low as the 
> POSIX-mandated minimum requirement of 4096.
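
A minimal sketch of such a limit, in Python for illustration only. It measures labels by the UTF-8 length of keys and values rather than by protobuf's {{ByteSizeLong}}, and both the helper `validate_labels` and the 4096-byte threshold are assumptions taken from the discussion above, not an agreed design.

```python
MAX_LABEL_BYTES = 4096  # Assumed limit; the ticket leaves the value open.


def labels_size(labels):
    """Approximate payload size of (key, value) label pairs in bytes."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in labels)


def validate_labels(labels, limit=MAX_LABEL_BYTES):
    """Return the measured size, or raise ValueError if it exceeds `limit`."""
    size = labels_size(labels)
    if size > limit:
        raise ValueError(f"labels use {size} bytes, limit is {limit}")
    return size


if __name__ == "__main__":
    print(validate_labels([("env", "prod"), ("team", "storage")]))
    try:
        validate_labels([("payload", "x" * 20000)])
    except ValueError as error:
        print("rejected:", error)
```

Rejecting at validation time would keep an oversized message from being copied many times further down the launch path.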





[jira] [Updated] (MESOS-7800) Tasks with many labels can cause disproportionally huge allocations

2017-07-20 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7800:

Attachment: stat_all_task_labels.dat
stat_individual_labels.dat

I went through a couple of sample workloads and am attaching files with pairs 
of the lengths of the key and value, respectively. One file contains an 
entry for each {{Label}} used ({{stat_individual_labels.dat}}); the other file 
({{stat_all_task_labels.dat}}) contains entries accumulating the sizes of all 
keys or values by task.

Looking at the values, it seems there are two groups of workloads here: one 
where all {{Label}} contents should fit well into 0.5 kB uncompressed, and 
another that seems to need around 16 kB. While the first group clearly only 
passes _lightweight data_ as documented, the second group passes encoded data 
payloads.
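
Statistics of this kind could be computed along these lines. The input format (a mapping from task ID to a list of key/value pairs) and the function names are hypothetical; the scripts actually used for the measurement are not part of this comment.

```python
def individual_label_sizes(tasks):
    """One (key_length, value_length) pair per label."""
    return [(len(k), len(v)) for labels in tasks.values() for k, v in labels]


def per_task_label_sizes(tasks):
    """One (total_key_length, total_value_length) pair per task."""
    return {
        task_id: (
            sum(len(k) for k, _ in labels),
            sum(len(v) for _, v in labels),
        )
        for task_id, labels in tasks.items()
    }


if __name__ == "__main__":
    # Made-up sample input, not data from the attachments.
    sample = {
        "task-1": [("env", "prod"), ("team", "storage")],
        "task-2": [("blob", "x" * 16000)],
    }
    print(individual_label_sizes(sample))
    print(per_task_label_sizes(sample))
```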

> Tasks with many labels can cause disproportionally huge allocations
> ---
>
> Key: MESOS-7800
> URL: https://issues.apache.org/jira/browse/MESOS-7800
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, master
>Reporter: Benjamin Bannier
>  Labels: mesosphere
> Attachments: stat_all_task_labels.dat, stat_individual_labels.dat
>
>
> {{mesos.proto}} provides the {{Labels}} message so others can add free-form 
> data to a number of messages. In e.g., {{TaskInfo}} and {{ExecutorInfo}} we 
> explicitly document
> {quote}
> Therefore, labels should be used to tag tasks with light-weight meta-data.
> {quote}
> However, we never enforce this requirement.
> This becomes problematic, e.g., in the agent, where a {{TaskInfo}} will likely 
> be copied often, e.g., due to multiple levels of dispatches. I have measured 
> that a single {{Label}} can trigger 50-100 concurrent copies in flight on the 
> agent's container launch path; our general assumption here seems to be that 
> while a {{TaskInfo}} is not necessarily small, it still is not huge.
> If users embed a lot of data into e.g., {{TaskInfo}} {{labels}} this can lead 
> to a temporary explosion of the agent process' memory footprint which can 
> lead to it being killed by the OS.
> Due to the potential negative effects of huge {{labels}} we should evaluate 
> how we can limit the amount of data we accept from users. This could mean 
> limiting the size of {{TaskInfo}} or {{Labels}} we accept, measured e.g., by 
> the message's {{ByteSizeLong}}. It seems that a value somehow related to 
> {{ARG_MAX}} would be intuitive, but I am not sure if we can go as low as the 
> POSIX-mandated minimum requirement of 4096.





[jira] [Created] (MESOS-7816) Add HTTP connection handling to the resource provider driver

2017-07-20 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-7816:
---

 Summary: Add HTTP connection handling to the resource provider 
driver
 Key: MESOS-7816
 URL: https://issues.apache.org/jira/browse/MESOS-7816
 Project: Mesos
  Issue Type: Task
  Components: storage
Reporter: Jan Schlicht
Assignee: Jan Schlicht


The {{resource_provider::Driver}} is responsible for establishing a connection 
with an agent/master resource provider API, sending calls to the API, and 
receiving events from it. This is done using HTTP and should be implemented 
similarly to how it is done for schedulers and executors (see 
{{src/executor/executor.cpp, src/scheduler/scheduler.cpp}}).





[jira] [Updated] (MESOS-7444) Add support for storing gone agents to the master registry.

2017-07-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7444:
--
Sprint: Mesosphere Sprint 56, Mesosphere Sprint 57, Mesosphere Sprint 58  
(was: Mesosphere Sprint 56, Mesosphere Sprint 57, Mesosphere Sprint 58, 
Mesosphere Sprint 59)

> Add support for storing gone agents to the master registry.
> ---
>
> Key: MESOS-7444
> URL: https://issues.apache.org/jira/browse/MESOS-7444
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to add the {{MarkSlaveGone}} registry operation to the master 
> allowing it to store agents that have been marked as gone. The relevant 
> changes to {{registry.proto}} would also be done as part of this issue.





[jira] [Updated] (MESOS-7443) Add the MARK_AGENT_GONE call to the Operator v1 API protos.

2017-07-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7443:
--
Sprint: Mesosphere Sprint 56, Mesosphere Sprint 57, Mesosphere Sprint 58  
(was: Mesosphere Sprint 56, Mesosphere Sprint 57, Mesosphere Sprint 58, 
Mesosphere Sprint 59)

> Add the MARK_AGENT_GONE call to the Operator v1 API protos.
> ---
>
> Key: MESOS-7443
> URL: https://issues.apache.org/jira/browse/MESOS-7443
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to add the relevant call to the v1 Operator API protos to mark an 
> agent as GONE. The actual handler implementation on the master would be done 
> in a separate ticket.





[jira] [Updated] (MESOS-6440) "Catch up" the webui to features that have been added.

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6440:
---
Labels: mesosphere  (was: )

> "Catch up" the webui to features that have been added.
> --
>
> Key: MESOS-6440
> URL: https://issues.apache.org/jira/browse/MESOS-6440
> Project: Mesos
>  Issue Type: Epic
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: mesosphere
>
> Going forward, we'd like to ensure that all features that are added include 
> the appropriate changes to the webui.
> Over time there have been some features that have been developed that have 
> not been reflected in the webui. The purpose of this epic is to collect these 
> and have an effort to catch up the webui to reflect the current state of 
> functionality.
> E.g. reservations / volumes are not visible in the UI
> E.g. framework capabilities are not visible in the UI





[jira] [Commented] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-20 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094733#comment-16094733
 ] 

Joris Van Remoortere commented on MESOS-7813:
-

[~y123456yz] here is an example of the systemd configuration in DC/OS
https://github.com/dcos/dcos/blob/18c76a2b4b24aab0c4107bae9c7191a68e6de174/packages/mesos/extra/dcos-mesos-slave.service

> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?





[jira] [Updated] (MESOS-6441) Display reservations in the agent page in the webui.

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6441:
---
Labels: mesosphere  (was: )

> Display reservations in the agent page in the webui.
> 
>
> Key: MESOS-6441
> URL: https://issues.apache.org/jira/browse/MESOS-6441
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Andrei Budnik
>  Labels: mesosphere
> Fix For: 1.4.0
>
>
> We currently do not display the reservations present on an agent in the 
> webui. It would be nice to see this information.
> It would also be nice to update the resource statistics tables to make the 
> distinction between unreserved and reserved resources. E.g.
> Reserved:
> Used, Allocated, Available and Total
> Unreserved:
> Used, Allocated, Available and Total



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-20 Thread y123456yz (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

y123456yz updated MESOS-7813:
-
Comment: was deleted

(was: [~jvanremoortere]

can you tell me how to check the delegate flag in systemd?)

> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?





[jira] [Updated] (MESOS-7815) Add gauge for master event processing time

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7815:
---
Labels: mesosphere metrics reliability  (was: mesosphere)

> Add gauge for master event processing time
> --
>
> Key: MESOS-7815
> URL: https://issues.apache.org/jira/browse/MESOS-7815
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>  Labels: mesosphere, metrics, reliability
>
> To diagnose cases where, e.g., the master is backlogged, looking at just 
> {{event_queue_messages}} only tells us the size of the queue; 
> determining whether this is due to a higher message arrival rate or slower 
> processing requires complicated inference from other metrics.
> We should provide metrics to characterize the time it takes to process 
> messages in the queue, optimally with statistics over some window. This would 
> allow better identification of slow requests.
> We should also consider ways to characterize the arrival rate via some 
> metric with statistics.
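
One way to picture such a metric is a timer that records how long each event takes and reports statistics over a sliding window. The class `ProcessingTimeWindow` below is a hypothetical Python sketch, not the Mesos metrics library; the window size and the choice of statistics are assumptions.

```python
import time
from collections import deque
from statistics import mean


class ProcessingTimeWindow:
    """Keep the processing durations of the most recent `size` events."""

    def __init__(self, size=1000):
        # A bounded deque drops the oldest sample once the window is full.
        self.samples = deque(maxlen=size)

    def record(self, handler, *args):
        """Run `handler`, store its duration in seconds, return its result."""
        start = time.perf_counter()
        try:
            return handler(*args)
        finally:
            self.samples.append(time.perf_counter() - start)

    def stats(self):
        """Return count, mean and max over the window ({} if no samples)."""
        if not self.samples:
            return {}
        return {
            "count": len(self.samples),
            "mean": mean(self.samples),
            "max": max(self.samples),
        }


if __name__ == "__main__":
    window = ProcessingTimeWindow(size=100)
    for n in range(10):
        window.record(sum, range(n * 1000))
    print(window.stats())
```

Recording arrival timestamps in a second window of the same shape would give a comparable statistic for the arrival rate.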





[jira] [Updated] (MESOS-7747) Improve metrics around active subscribers.

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7747:
---
Shepherd: Vinod Kone

> Improve metrics around active subscribers.
> --
>
> Key: MESOS-7747
> URL: https://issues.apache.org/jira/browse/MESOS-7747
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere, metrics, reliability
>
> Active subscribers to, e.g., the Mesos streaming API may influence Mesos master 
> performance. To improve triaging and gain a better understanding of the master's 
> workload, we should add metrics to track active subscribers, send queue size, 
> and so on.





[jira] [Updated] (MESOS-7815) Add gauge for master event processing time

2017-07-20 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7815:

Issue Type: Improvement  (was: Bug)

> Add gauge for master event processing time
> --
>
> Key: MESOS-7815
> URL: https://issues.apache.org/jira/browse/MESOS-7815
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> To diagnose cases where, e.g., the master is backlogged, looking at just 
> {{event_queue_messages}} only tells us the size of the queue; 
> determining whether this is due to a higher message arrival rate or slower 
> processing requires complicated inference from other metrics.
> We should provide metrics to characterize the time it takes to process 
> messages in the queue, optimally with statistics over some window. This would 
> allow better identification of slow requests.
> We should also consider ways to characterize the arrival rate via some 
> metric with statistics.





[jira] [Updated] (MESOS-7815) Add gauge for master event processing time

2017-07-20 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7815:

Labels: mesosphere  (was: )

> Add gauge for master event processing time
> --
>
> Key: MESOS-7815
> URL: https://issues.apache.org/jira/browse/MESOS-7815
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> To diagnose cases where, e.g., the master is backlogged, looking at just 
> {{event_queue_messages}} only tells us the size of the queue; diagnosing 
> whether the backlog is due to a higher message arrival rate or slower 
> processing requires complicated inference from other metrics.
> We should provide metrics to characterize the time it takes to process 
> messages in the queue, optimally with statistics over some window. This would 
> allow better identification of slow requests.
> We should also consider ways to characterize the arrival rate via some 
> metric with statistics.





[jira] [Created] (MESOS-7815) Add gauge for master event processing time

2017-07-20 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-7815:
---

 Summary: Add gauge for master event processing time
 Key: MESOS-7815
 URL: https://issues.apache.org/jira/browse/MESOS-7815
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Benjamin Bannier


To diagnose cases where, e.g., the master is backlogged, looking at just 
{{event_queue_messages}} only tells us the size of the queue; diagnosing 
whether the backlog is due to a higher message arrival rate or slower 
processing requires complicated inference from other metrics.

We should provide metrics to characterize the time it takes to process messages 
in the queue, optimally with statistics over some window. This would allow 
better identification of slow requests.

We should also consider ways to characterize the arrival rate via some metric 
with statistics.
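
A minimal sketch of what "statistics over some window" could look like, written 
in Python for brevity. It is not Mesos or libprocess code: the class name 
WindowedTimer, its parameters, and the chosen statistics are all hypothetical. 
It keeps the processing durations seen during the last N seconds and reports 
count, mean, max, and an approximate 95th percentile.

```python
import time
from collections import deque
from statistics import mean


class WindowedTimer:
    """Hypothetical helper: processing durations seen in the last `window_secs`."""

    def __init__(self, window_secs=60.0, clock=time.monotonic):
        self.window_secs = window_secs
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.samples = deque()  # (finish_time, duration) pairs, oldest first

    def record(self, duration):
        """Store the duration of one processed event, stamped with the current time."""
        now = self.clock()
        self.samples.append((now, duration))
        self._expire(now)

    def _expire(self, now):
        # Drop samples that finished before the start of the window.
        while self.samples and self.samples[0][0] < now - self.window_secs:
            self.samples.popleft()

    def stats(self):
        """Return summary statistics for the samples still inside the window."""
        self._expire(self.clock())
        durations = sorted(d for _, d in self.samples)
        if not durations:
            return {"count": 0}
        # Nearest-rank style index, clamped to the last element.
        p95_index = min(len(durations) - 1, int(0.95 * len(durations)))
        return {
            "count": len(durations),
            "mean": mean(durations),
            "max": durations[-1],
            "p95": durations[p95_index],
        }
```

Dividing "count" by the window length gives a rough processing rate; an arrival 
rate could be tracked the same way by recording a timestamp when a message is 
enqueued rather than when it finishes.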





[jira] [Commented] (MESOS-4812) Mesos fails to escape command health checks

2017-07-20 Thread Lukas Loesche (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094511#comment-16094511
 ] 

Lukas Loesche commented on MESOS-4812:
--

[~alexr] do you have cycles to look into this?

> Mesos fails to escape command health checks
> ---
>
> Key: MESOS-4812
> URL: https://issues.apache.org/jira/browse/MESOS-4812
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>Assignee: haosdent
>  Labels: health-check, mesosphere, tech-debt
> Attachments: health_task.gif
>
>
> As described in https://github.com/mesosphere/marathon/issues/
> I would like to run a command health check
> {noformat}
> /bin/bash -c " {noformat}
> The health check fails because Mesos runs the command inside the double 
> quotes of a sh -c "" without escaping the double quotes in the command itself.
> If I escape the double quotes myself, the command health check succeeds. But 
> this would mean that users need intimate knowledge of how Mesos executes 
> their commands, which can't be right.
> I was told this is not a Marathon but a Mesos issue, so I am opening this JIRA. 
> I don't know if this only affects the command health check.
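
The failure mode described above can be illustrated with a small Python sketch. 
This is not the Mesos health-check code: wrap_naive and wrap_quoted are 
hypothetical helpers, and the sample command is made up (the command in the 
report is truncated). It compares wrapping a command in literal double quotes 
with quoting it via the standard library's shlex.quote.

```python
import shlex


def wrap_naive(command):
    # Mimics the reported behaviour: the command is pasted between double
    # quotes without escaping the double quotes it already contains.
    return 'sh -c "' + command + '"'


def wrap_quoted(command):
    # shlex.quote produces a single shell word that round-trips unchanged.
    return "sh -c " + shlex.quote(command)


# Hypothetical health check command containing double quotes.
command = 'echo "a  b"'

# shlex.split applies POSIX shell word splitting, so it shows what argument
# sh would actually receive after -c in each case.
naive_argv = shlex.split(wrap_naive(command))
quoted_argv = shlex.split(wrap_quoted(command))

print(naive_argv)
print(quoted_argv)
```

With the naive wrapper the inner quotes terminate the outer ones early, so the 
argument after -c is no longer the original command; with shlex.quote it is 
passed through intact.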





[jira] [Commented] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-20 Thread y123456yz (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094393#comment-16094393
 ] 

y123456yz commented on MESOS-7813:
--

[~jvanremoortere]

Can you tell me how to check out the delegate flag in systemd?

> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?





[jira] [Updated] (MESOS-7374) Running DOCKER images in Mesos Container Runtime without `linux/filesystem` isolation enabled renders host unusable

2017-07-20 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-7374:

Priority: Blocker  (was: Critical)

> Running DOCKER images in Mesos Container Runtime without `linux/filesystem` 
> isolation enabled renders host unusable
> ---
>
> Key: MESOS-7374
> URL: https://issues.apache.org/jira/browse/MESOS-7374
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.2.0
>Reporter: Tim Harper
>Assignee: Chun-Hung Hsiao
>Priority: Blocker
>  Labels: containerizer, mesosphere
>
> If I run the pod below (using Marathon 1.4.2) against a mesos agent that has 
> the flags (also below), then the overlay filesystem replaces the system root 
> mount, effectively rendering the host unusable until reboot.
> flags:
> - {{--containerizers mesos,docker}}
> - {{--image_providers APPC,DOCKER}}
> - {{--isolation cgroups/cpu,cgroups/mem,docker/runtime}}
> pod definition for Marathon:
> {code:java}
> {
>   "id": "/simplepod",
>   "scaling": { "kind": "fixed", "instances": 1 },
>   "containers": [
>     {
>       "name": "sleep1",
>       "exec": { "command": { "shell": "sleep 1000" } },
>       "resources": { "cpus": 0.1, "mem": 32 },
>       "image": {
>         "id": "alpine",
>         "kind": "DOCKER"
>       }
>     }
>   ],
>   "networks": [ {"mode": "host"} ]
> }
> {code}
> Mesos should probably check for this and avoid replacing the system root 
> mount point at startup or launch time.
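
One way to picture the suggested startup check is the Python sketch below. It 
is an illustration only, not agent code: the function name is hypothetical, and 
it assumes the isolator is spelled filesystem/linux in the --isolation flag 
(the issue title writes linux/filesystem, so both spellings are accepted here). 
It flags the combination from the report, image providers configured without 
the filesystem isolator.

```python
def unsafe_image_config(image_providers, isolation):
    """Return True when images are enabled but no filesystem isolator is listed.

    Both arguments are comma-separated strings, as passed to the agent flags
    --image_providers and --isolation in the report above.
    """
    providers = {p.strip().upper() for p in image_providers.split(",") if p.strip()}
    isolators = {i.strip() for i in isolation.split(",") if i.strip()}
    # Assumption: either spelling counts as the Linux filesystem isolator.
    fs_isolators = {"filesystem/linux", "linux/filesystem"}
    return bool(providers) and not (isolators & fs_isolators)


# Flags taken from the report.
reported = unsafe_image_config(
    "APPC,DOCKER", "cgroups/cpu,cgroups/mem,docker/runtime"
)
print(reported)
```

An agent performing a check like this could refuse to start, or refuse to 
launch the container, instead of replacing the root mount.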





[jira] [Created] (MESOS-7814) Improve the code style of the test frameworks

2017-07-20 Thread Armand Grillet (JIRA)
Armand Grillet created MESOS-7814:
-

 Summary: Improve the code style of the test frameworks
 Key: MESOS-7814
 URL: https://issues.apache.org/jira/browse/MESOS-7814
 Project: Mesos
  Issue Type: Improvement
  Components: framework
Reporter: Armand Grillet
Assignee: Armand Grillet
Priority: Minor


These improvements include three main points:
* Adding a {{name}} flag to certain frameworks to distinguish between instances.
* Cleaning up the code style of the frameworks.
* For certain frameworks such as balloon framework, adding a 
{{executor_extra_uris}} flag containing URIs that will be passed to the 
{{command_info}} of the executor.





[jira] [Comment Edited] (MESOS-7813) when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, devices,blkio,memory,cpuacct is changed. why?

2017-07-20 Thread y123456yz (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094174#comment-16094174
 ] 

y123456yz edited comment on MESOS-7813 at 7/20/17 7:19 AM:
---

[~jvanremoortere]
[~hartem]
[~jvanremoortere]
Where should the following config be added?
delegate=true

 cat /usr/lib/systemd/system/mesos-slave.service
[Unit]
Description=Mesos Slave
After=network.target
Wants=network.target
[Service]
delegate=true // add here??


Is it enough to add "delegate=true" only to the [Service] section of 
mesos-slave.service?


Do I also need to add "KillMode=control-group" to the [Service] section of 
mesos-slave.service?

thanks again!
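
For readers with the same question, here is a hedged Python sketch that checks 
whether a unit file enables delegation in its [Service] section. The function 
name delegate_enabled is hypothetical. It assumes the directive is spelled 
Delegate= with a capital D, as in the systemd documentation, and that directive 
names are case-sensitive; it only handles the boolean form of the setting. On a 
running host, "systemctl show -p Delegate mesos-slave" reports the effective 
value.

```python
import configparser


def delegate_enabled(unit_text):
    """Return True if the [Service] section sets Delegate= to a true value."""
    # strict=False tolerates directives that appear more than once in a unit.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    # Keep key case as written; by default configparser lowercases option names.
    parser.optionxform = str
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return False
    value = parser.get("Service", "Delegate", fallback="")
    return value.strip().lower() in {"yes", "true", "on", "1"}


# Unit file shaped like the one quoted above, with the directive capitalized.
unit = """\
[Unit]
Description=Mesos Slave
After=network.target
Wants=network.target
[Service]
Delegate=true
"""

print(delegate_enabled(unit))
```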



was (Author: y123456yz):
[~jvanremoortere]
Where should the following config be added?
delegate=true

 cat /usr/lib/systemd/system/mesos-slave.service
[Unit]
Description=Mesos Slave
After=network.target
Wants=network.target
[Service]
delegate=true // add here??


Is it enough to add "delegate=true" only to the [Service] section of 
mesos-slave.service?


Do I also need to add "KillMode=control-group" to the [Service] section of 
mesos-slave.service?

thanks again!


> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?
> --
>
> Key: MESOS-7813
> URL: https://issues.apache.org/jira/browse/MESOS-7813
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, cgroups, executor, framework
> Environment: 1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 
> GNU/Linux
>Reporter: y123456yz
>
> when lxc run after a period of time, the file(/proc/pid/cgroup) is modified, 
> devices,blkio,memory,cpuacct is changed. why?


