[jira] [Commented] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-10-04 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192306#comment-16192306
 ] 

Qian Zhang commented on MESOS-7975:
---

[~vinodkone] Sure, done.

> The command/default/docker executor can incorrectly send a TASK_FINISHED 
> update even when the task is killed
> 
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor 
> incorrectly sends a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunately missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.
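
A minimal sketch of the kind of fix described above: give the {{killed}} flag precedence over the exit status, so that a task which exits with status 0 after a kill() or shutdown() is still reported as {{TASK_KILLED}}. This only rearranges the conditional quoted in the description and is illustrative; see the review requests on this ticket for the actual change.

{code}
  // Sketch only: check `killed` before the exit status, so a zero exit
  // code after kill()/shutdown() still yields TASK_KILLED.
  if (killed) {
    // Send TASK_KILLED if the task was killed as a result of
    // kill() or shutdown(), regardless of how the process exited.
    taskState = TASK_KILLED;
  } else if (WSUCCEEDED(status)) {
    taskState = TASK_FINISHED;
  } else {
    taskState = TASK_FAILED;
  }
{code}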



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-10-04 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166060#comment-16166060
 ] 

Qian Zhang edited comment on MESOS-7975 at 10/5/17 12:44 AM:
-

RR:
https://reviews.apache.org/r/62685/
https://reviews.apache.org/r/62326/
https://reviews.apache.org/r/62327/
https://reviews.apache.org/r/62774/
https://reviews.apache.org/r/62775/


was (Author: qianzhang):
RR:
https://reviews.apache.org/r/62685/
https://reviews.apache.org/r/62326/
https://reviews.apache.org/r/62327/

> The command/default/docker executor can incorrectly send a TASK_FINISHED 
> update even when the task is killed
> 
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor 
> incorrectly sends a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunately missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-10-04 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-7975:
--
Story Points: 3

> The command/default/docker executor can incorrectly send a TASK_FINISHED 
> update even when the task is killed
> 
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor 
> incorrectly sends a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunately missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8052) "protoc" not found when running "make -j4 check" directly in stout

2017-10-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-8052:
---
Shepherd: Benjamin Bannier

> "protoc" not found when running "make -j4 check" directly in stout
> --
>
> Key: MESOS-8052
> URL: https://issues.apache.org/jira/browse/MESOS-8052
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: compile-error
> Fix For: 1.4.1
>
>
> If we run {{make -j4 check}} without running {{make}} first, we will get the 
> following error message:
> {noformat}
> 3rdparty/protobuf-3.3.0/src/protoc -I../tests --cpp_out=. 
> ../tests/protobuf_tests.proto
> /bin/bash: 3rdparty/protobuf-3.3.0/src/protoc: No such file or directory
> Makefile:1934: recipe for target 'protobuf_tests.pb.cc' failed
> make: *** [protobuf_tests.pb.cc] Error 127
> {noformat}
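
A simple workaround, assuming the usual autotools layout of the stout build directory: build the default target first so the bundled protobuf (and its {{protoc}}) exists before the check target tries to generate {{protobuf_tests.pb.cc}}. The targets below are illustrative only.

{noformat}
# Workaround sketch: build the 3rdparty dependencies (including protoc)
# before running the tests directly in the stout build directory.
make            # builds 3rdparty/protobuf-3.3.0, providing src/protoc
make -j4 check  # protobuf_tests.pb.cc can now be generated
{noformat}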



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8051) Killing TASK_GROUP fail to kill some tasks

2017-10-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8051:
-

Assignee: Qian Zhang

[~qianzhang] Can you look into this?

> Killing TASK_GROUP fail to kill some tasks
> --
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Assignee: Qian Zhang
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting the following pod definition via Marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fails to kill all or 
> some of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after Marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210932  4753 master.cpp:52

[jira] [Updated] (MESOS-7130) port_mapping isolator: executor hangs when running on EC2

2017-10-04 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-7130:
--
Story Points: 2

> port_mapping isolator: executor hangs when running on EC2
> -
>
> Key: MESOS-7130
> URL: https://issues.apache.org/jira/browse/MESOS-7130
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Pierre Cheynier
>Assignee: Jie Yu
>
> Hi,
> I'm experiencing a weird issue: I'm using a CI to do testing on 
> infrastructure automation.
> I recently activated the {{network/port_mapping}} isolator.
> I'm able to make the changes work and pass the test for bare-metal servers 
> and virtualbox VMs using this configuration.
> But when I try on EC2 (on which my CI pipeline relies) it systematically fails 
> to run any container.
> It appears that the sandbox is created and the port_mapping isolator seems to 
> be OK according to the logs in stdout and stderr and the {{tc}} output:
> {noformat}
> + mount --make-rslave /run/netns
> + test -f /proc/sys/net/ipv6/conf/all/disable_ipv6
> + echo 1
> + ip link set lo address 02:44:20:bb:42:cf mtu 9001 up
> + ethtool -K eth0 rx off
> (...)
> + tc filter show dev eth0 parent :0
> + tc filter show dev lo parent :0
> I0215 16:01:13.941375 1 exec.cpp:161] Version: 1.0.2
> {noformat}
> Then the executor never comes back to the REGISTERED state and hangs indefinitely.
> {{GLOG_v=3}} doesn't help here.
> My skills in this area are limited, but after loading the symbols and attaching 
> gdb to the mesos-executor process, I'm able to print this stack:
> {noformat}
> #0  0x7feffc1386d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7feffbed69ec in 
> std::condition_variable::wait(std::unique_lock<std::mutex>&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7ff0003dd8ec in void synchronized_wait<std::condition_variable, 
> std::mutex>(std::condition_variable*, std::mutex*) () from 
> /usr/lib64/libmesos-1.0.2.so
> #3  0x7ff0017d595d in Gate::arrive(long) () from 
> /usr/lib64/libmesos-1.0.2.so
> #4  0x7ff0017c00ed in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.0.2.so
> #5  0x7ff0017c5c05 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.0.2.so
> #6  0x004ab26f in process::wait(process::ProcessBase const*, Duration 
> const&) ()
> #7  0x004a3903 in main ()
> {noformat}
> I concluded that the underlying shell script launched by the isolator or the 
> task itself is simply blocked, but I don't understand why.
> Here is a process tree showing that I have no task running but the executor is:
> {noformat}
> root 28420  0.8  3.0 1061420 124940 ?  Ssl  17:56   0:25 
> /usr/sbin/mesos-slave --advertise_ip=127.0.0.1 
> --attributes=platform:centos;platform_major_version:7;type:base 
> --cgroups_enable_cfs --cgroups_hierarchy=/sys/fs/cgroup 
> --cgroups_net_cls_primary_handle=0xC370 
> --container_logger=org_apache_mesos_LogrotateContainerLogger 
> --containerizers=mesos,docker 
> --credential=file:///etc/mesos-chef/slave-credential 
> --default_container_info={"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"}]}
>  --default_role=default --docker_registry=/usr/share/mesos/users 
> --docker_store_dir=/var/opt/mesos/store/docker 
> --egress_unique_flow_per_container --enforce_container_disk_quota 
> --ephemeral_ports_per_container=128 
> --executor_environment_variables={"PATH":"/bin:/usr/bin:/usr/sbin","CRITEO_DC":"par","CRITEO_ENV":"prod"}
>  --image_providers=docker --image_provisioner_backend=copy 
> --isolation=cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,disk/du,filesystem/shared,filesystem/linux,docker/runtime,network/cni,network/port_mapping
>  --logging_level=INFO 
> --master=zk://mesos:test@localhost.localdomain:2181/mesos 
> --modules=file:///etc/mesos-chef/slave-modules.json --port=5051 
> --recover=reconnect 
> --resources=ports:[31000-32000];ephemeral_ports:[32768-57344] --strict 
> --work_dir=/var/opt/mesos
> root 28484  0.0  2.3 433676 95016 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> --log_filename=/var/opt/mesos/slaves/cdf94219-87b2-4af2-9f61-5697f0442915-S0/frameworks/366e8ed2-730e-4423-9324-086704d182b0-/executors/group_simplehttp.16f7c2ee-f3a8-11e6-be1c-0242b44d071f/runs/1d3e6b1c-cda8-47e5-92c4-a161429a7ac6/stdout
>  --logrotate_options=rotate 5 --logrotate_path=logrotate --max_size=10MB
> root 28485  0.0  2.3 499212 94724 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> --log_filename=/var/opt/mesos/slaves/cdf94219-87b2-4af2-9f61-5697f0442915-S0/frameworks/366e8ed2-730e-4423-9324-086704d182b0-/executors/group_simplehttp.16f7c2ee-f3a8-11e6-be1c-0242b44d071f/runs/1d3e6b1c-cda8-47e5-92c4-a161429a7ac6/st

[jira] [Updated] (MESOS-8052) "protoc" not found when running "make -j4 check" directly in stout

2017-10-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-8052:
---
Description: 
If we run {{make -j4 check}} without running {{make}} first, we will get the 
following error message:
{noformat}
3rdparty/protobuf-3.3.0/src/protoc -I../tests --cpp_out=. 
../tests/protobuf_tests.proto
/bin/bash: 3rdparty/protobuf-3.3.0/src/protoc: No such file or directory
Makefile:1934: recipe for target 'protobuf_tests.pb.cc' failed
make: *** [protobuf_tests.pb.cc] Error 127
{noformat}

  was:
+underlined text+If we run {{make tests}} without running {{make}} first, 
{{tests/protobuf_tests.proto}} would not be compiled, and thus the generated 
files would be missing:
{noformat}
g++ -DPACKAGE_NAME=\"stout\" -DPACKAGE_TARNAME=\"stout\" 
-DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"stout\ 0.1.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"stout\" 
-DVERSION=\"0.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
-DHAVE_LIBDL=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_CXX11=1 -I. -I..  -I../include -isystem 
3rdparty/boost-1.53.0 -I3rdparty/elfio-3.2 -I3rdparty/glog-0.3.3/src 
-I3rdparty/googletest-release-1.8.0/googlemock/include 
-I3rdparty/googletest-release-1.8.0/googletest/include -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I3rdparty/picojson-1.3.0 -I3rdparty/protobuf-3.3.0/src  
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall 
-Wsign-compare -Wformat-security -fstack-protector-strong -fPIC -fPIE -g1 -O0 
-Wno-unused-local-typedefs -std=c++11 -MT stout_tests-protobuf_tests.o -MD -MP 
-MF .deps/stout_tests-protobuf_tests.Tpo -c -o stout_tests-protobuf_tests.o 
`test -f 'tests/protobuf_tests.cpp' || echo '../'`tests/protobuf_tests.cpp
../tests/protobuf_tests.cpp:28:31: fatal error: protobuf_tests.pb.h: No such 
file or directory
compilation terminated.
Makefile:1278: recipe for target 'stout_tests-protobuf_tests.o' failed
make[1]: *** [stout_tests-protobuf_tests.o] Error 1
{noformat}


> "protoc" not found when running "make -j4 check" directly in stout
> --
>
> Key: MESOS-8052
> URL: https://issues.apache.org/jira/browse/MESOS-8052
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: compile-error
> Fix For: 1.4.1
>
>
> If we run {{make -j4 check}} without running {{make}} first, we will get the 
> following error message:
> {noformat}
> 3rdparty/protobuf-3.3.0/src/protoc -I../tests --cpp_out=. 
> ../tests/protobuf_tests.proto
> /bin/bash: 3rdparty/protobuf-3.3.0/src/protoc: No such file or directory
> Makefile:1934: recipe for target 'protobuf_tests.pb.cc' failed
> make: *** [protobuf_tests.pb.cc] Error 127
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8052) "protoc" not found when running "make -j4 check" directly in stout

2017-10-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-8052:
---
Summary: "protoc" not found when running "make -j4 check" directly in stout 
 (was: "protobuf_tests.pb.h" not found when running "make tests" directly in 
stout)

> "protoc" not found when running "make -j4 check" directly in stout
> --
>
> Key: MESOS-8052
> URL: https://issues.apache.org/jira/browse/MESOS-8052
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: compile-error
> Fix For: 1.4.1
>
>
> +underlined text+If we run {{make tests}} without running {{make}} first, 
> {{tests/protobuf_tests.proto}} would not be compiled, and thus the generated 
> files would be missing:
> {noformat}
> g++ -DPACKAGE_NAME=\"stout\" -DPACKAGE_TARNAME=\"stout\" 
> -DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"stout\ 0.1.0\" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"stout\" 
> -DVERSION=\"0.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
> -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
> -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
> -DHAVE_LIBDL=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_SVN_VERSION_H=1 
> -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_CXX11=1 -I. -I..  -I../include -isystem 
> 3rdparty/boost-1.53.0 -I3rdparty/elfio-3.2 -I3rdparty/glog-0.3.3/src 
> -I3rdparty/googletest-release-1.8.0/googlemock/include 
> -I3rdparty/googletest-release-1.8.0/googletest/include -DPICOJSON_USE_INT64 
> -D__STDC_FORMAT_MACROS -I3rdparty/picojson-1.3.0 
> -I3rdparty/protobuf-3.3.0/src  -I/usr/include/subversion-1 
> -I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall -Wsign-compare 
> -Wformat-security -fstack-protector-strong -fPIC -fPIE -g1 -O0 
> -Wno-unused-local-typedefs -std=c++11 -MT stout_tests-protobuf_tests.o -MD 
> -MP -MF .deps/stout_tests-protobuf_tests.Tpo -c -o 
> stout_tests-protobuf_tests.o `test -f 'tests/protobuf_tests.cpp' || echo 
> '../'`tests/protobuf_tests.cpp
> ../tests/protobuf_tests.cpp:28:31: fatal error: protobuf_tests.pb.h: No such 
> file or directory
> compilation terminated.
> Makefile:1278: recipe for target 'stout_tests-protobuf_tests.o' failed
> make[1]: *** [stout_tests-protobuf_tests.o] Error 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8052) "protobuf_tests.pb.h" not found when running "make tests" directly in stout

2017-10-04 Thread Chun-Hung Hsiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun-Hung Hsiao updated MESOS-8052:
---
Description: 
+underlined text+If we run {{make tests}} without running {{make}} first, 
{{tests/protobuf_tests.proto}} would not be compiled, and thus the generated 
files would be missing:
{noformat}
g++ -DPACKAGE_NAME=\"stout\" -DPACKAGE_TARNAME=\"stout\" 
-DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"stout\ 0.1.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"stout\" 
-DVERSION=\"0.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
-DHAVE_LIBDL=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_CXX11=1 -I. -I..  -I../include -isystem 
3rdparty/boost-1.53.0 -I3rdparty/elfio-3.2 -I3rdparty/glog-0.3.3/src 
-I3rdparty/googletest-release-1.8.0/googlemock/include 
-I3rdparty/googletest-release-1.8.0/googletest/include -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I3rdparty/picojson-1.3.0 -I3rdparty/protobuf-3.3.0/src  
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall 
-Wsign-compare -Wformat-security -fstack-protector-strong -fPIC -fPIE -g1 -O0 
-Wno-unused-local-typedefs -std=c++11 -MT stout_tests-protobuf_tests.o -MD -MP 
-MF .deps/stout_tests-protobuf_tests.Tpo -c -o stout_tests-protobuf_tests.o 
`test -f 'tests/protobuf_tests.cpp' || echo '../'`tests/protobuf_tests.cpp
../tests/protobuf_tests.cpp:28:31: fatal error: protobuf_tests.pb.h: No such 
file or directory
compilation terminated.
Makefile:1278: recipe for target 'stout_tests-protobuf_tests.o' failed
make[1]: *** [stout_tests-protobuf_tests.o] Error 1
{noformat}

  was:
If we run {{make tests}} without running {{make}} first, 
{{tests/protobuf_tests.proto}} would not be compiled, and thus the generated 
files would be missing:
{noformat}
g++ -DPACKAGE_NAME=\"stout\" -DPACKAGE_TARNAME=\"stout\" 
-DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"stout\ 0.1.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"stout\" 
-DVERSION=\"0.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
-DHAVE_LIBDL=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_CXX11=1 -I. -I..  -I../include -isystem 
3rdparty/boost-1.53.0 -I3rdparty/elfio-3.2 -I3rdparty/glog-0.3.3/src 
-I3rdparty/googletest-release-1.8.0/googlemock/include 
-I3rdparty/googletest-release-1.8.0/googletest/include -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I3rdparty/picojson-1.3.0 -I3rdparty/protobuf-3.3.0/src  
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall 
-Wsign-compare -Wformat-security -fstack-protector-strong -fPIC -fPIE -g1 -O0 
-Wno-unused-local-typedefs -std=c++11 -MT stout_tests-protobuf_tests.o -MD -MP 
-MF .deps/stout_tests-protobuf_tests.Tpo -c -o stout_tests-protobuf_tests.o 
`test -f 'tests/protobuf_tests.cpp' || echo '../'`tests/protobuf_tests.cpp
../tests/protobuf_tests.cpp:28:31: fatal error: protobuf_tests.pb.h: No such 
file or directory
compilation terminated.
Makefile:1278: recipe for target 'stout_tests-protobuf_tests.o' failed
make[1]: *** [stout_tests-protobuf_tests.o] Error 1
{noformat}


> "protobuf_tests.pb.h" not found when running "make tests" directly in stout
> ---
>
> Key: MESOS-8052
> URL: https://issues.apache.org/jira/browse/MESOS-8052
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: compile-error
> Fix For: 1.4.1
>
>
> +underlined text+If we run {{make tests}} without running {{make}} first, 
> {{tests/protobuf_tests.proto}} would not be compiled, and thus the generated 
> files would be missing:
> {noformat}
> g++ -DPACKAGE_NAME=\"stout\" -DPACKAGE_TARNAME=\"stout\" 
> -DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"stout\ 0.1.0\" 
> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"stout\" 
> -DVERSION=\"0.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
> -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
> -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
> -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INH

[jira] [Commented] (MESOS-6240) Allow executor/agent communication over non-TCP/IP stream socket.

2017-10-04 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191759#comment-16191759
 ] 

Aaron Wood commented on MESOS-6240:
---

+1 to what [~zhitao] said!

> Allow executor/agent communication over non-TCP/IP stream socket.
> -
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
> Environment: Linux and Windows
>Reporter: Avinash Sridharan
>Assignee: Benjamin Hindman
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the executor-agent communication happens specifically over TCP 
> sockets. This works fine in most cases, but for the `MesosContainerizer`, 
> when containers are running on CNI networks, this mode of communication 
> starts imposing constraints on the CNI network, since there now has to be 
> connectivity between the CNI network (on which the executor is running) and 
> the agent. Introducing paths from a CNI network to the underlying agent, at 
> best, creates headaches for operators and, at worst, introduces serious 
> security holes in the network, since it breaks the isolation between the 
> container CNI network and the host network (on which the agent is running).
> In order to simplify and strengthen the deployment of Mesos containers on CNI 
> networks, we therefore need to move away from using TCP/IP sockets for 
> executor/agent communication. Since the executor and agent are guaranteed to 
> run on the same host, the above problems can be resolved if, for the 
> `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of 
> TCP/IP sockets for the executor/agent communication.
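
To make the proposal concrete, here is a minimal sketch of a listener on a UNIX domain socket using plain POSIX calls. The socket path and the use of raw sockets are illustrative assumptions only; this is not the actual Mesos/libprocess implementation, which would wrap the socket in its own abstractions.

{code}
// Illustrative only: a plain POSIX UNIX domain socket listener. The path
// passed in is an assumption, not an existing Mesos convention.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include <cstdio>
#include <cstring>

int listenOnUnixSocket(const char* path)
{
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);  // stream semantics, like TCP
  if (fd < 0) { perror("socket"); return -1; }

  sockaddr_un addr;
  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

  unlink(path);  // remove a stale socket file from a previous run

  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      listen(fd, 16) < 0) {
    perror("bind/listen");
    close(fd);
    return -1;
  }

  return fd;  // the agent would accept() executor connections on this fd
}
{code}

An executor on the same host would then connect() to the same filesystem path, so the traffic never leaves the host and never needs a route from the CNI network to the agent.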



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8052) "protobuf_tests.pb.h" not found when running "make tests" directly in stout

2017-10-04 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8052:
--

 Summary: "protobuf_tests.pb.h" not found when running "make tests" 
directly in stout
 Key: MESOS-8052
 URL: https://issues.apache.org/jira/browse/MESOS-8052
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao
 Fix For: 1.4.1


If we run {{make tests}} without running {{make}} first, 
{{tests/protobuf_tests.proto}} would not be compiled, and thus the generated 
files would be missing:
{noformat}
g++ -DPACKAGE_NAME=\"stout\" -DPACKAGE_TARNAME=\"stout\" 
-DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"stout\ 0.1.0\" 
-DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"stout\" 
-DVERSION=\"0.1.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
-DHAVE_LIBDL=1 -DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
-DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_SVN_VERSION_H=1 
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_CXX11=1 -I. -I..  -I../include -isystem 
3rdparty/boost-1.53.0 -I3rdparty/elfio-3.2 -I3rdparty/glog-0.3.3/src 
-I3rdparty/googletest-release-1.8.0/googlemock/include 
-I3rdparty/googletest-release-1.8.0/googletest/include -DPICOJSON_USE_INT64 
-D__STDC_FORMAT_MACROS -I3rdparty/picojson-1.3.0 -I3rdparty/protobuf-3.3.0/src  
-I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0  -Wall 
-Wsign-compare -Wformat-security -fstack-protector-strong -fPIC -fPIE -g1 -O0 
-Wno-unused-local-typedefs -std=c++11 -MT stout_tests-protobuf_tests.o -MD -MP 
-MF .deps/stout_tests-protobuf_tests.Tpo -c -o stout_tests-protobuf_tests.o 
`test -f 'tests/protobuf_tests.cpp' || echo '../'`tests/protobuf_tests.cpp
../tests/protobuf_tests.cpp:28:31: fatal error: protobuf_tests.pb.h: No such 
file or directory
compilation terminated.
Makefile:1278: recipe for target 'stout_tests-protobuf_tests.o' failed
make[1]: *** [stout_tests-protobuf_tests.o] Error 1
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-6240) Allow executor/agent communication over non-TCP/IP stream socket.

2017-10-04 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191610#comment-16191610
 ] 

Zhitao Li commented on MESOS-6240:
--

+1

Moving the executor-to-agent API from TCP to a domain socket will also reduce 
some potential security exposure of the agent.

Is there a design doc for this work?

> Allow executor/agent communication over non-TCP/IP stream socket.
> -
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
> Environment: Linux and Windows
>Reporter: Avinash Sridharan
>Assignee: Benjamin Hindman
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the executor-agent communication happens specifically over TCP 
> sockets. This works fine in most cases, but for the `MesosContainerizer`, 
> when containers are running on CNI networks, this mode of communication 
> starts imposing constraints on the CNI network, since there now has to be 
> connectivity between the CNI network (on which the executor is running) and 
> the agent. Introducing paths from a CNI network to the underlying agent, at 
> best, creates headaches for operators and, at worst, introduces serious 
> security holes in the network, since it breaks the isolation between the 
> container CNI network and the host network (on which the agent is running).
> In order to simplify and strengthen the deployment of Mesos containers on CNI 
> networks, we therefore need to move away from using TCP/IP sockets for 
> executor/agent communication. Since the executor and agent are guaranteed to 
> run on the same host, the above problems can be resolved if, for the 
> `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of 
> TCP/IP sockets for the executor/agent communication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-6240) Allow executor/agent communication over non-TCP/IP stream socket.

2017-10-04 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan reassigned MESOS-6240:


Assignee: Benjamin Hindman

> Allow executor/agent communication over non-TCP/IP stream socket.
> -
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
> Environment: Linux and Windows
>Reporter: Avinash Sridharan
>Assignee: Benjamin Hindman
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the executor-agent communication happens specifically over TCP 
> sockets. This works fine in most cases, but for the `MesosContainerizer`, 
> when containers are running on CNI networks, this mode of communication 
> starts imposing constraints on the CNI network, since there now has to be 
> connectivity between the CNI network (on which the executor is running) and 
> the agent. Introducing paths from a CNI network to the underlying agent, at 
> best, creates headaches for operators and, at worst, introduces serious 
> security holes in the network, since it breaks the isolation between the 
> container CNI network and the host network (on which the agent is running).
> In order to simplify and strengthen the deployment of Mesos containers on CNI 
> networks, we therefore need to move away from using TCP/IP sockets for 
> executor/agent communication. Since the executor and agent are guaranteed to 
> run on the same host, the above problems can be resolved if, for the 
> `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of 
> TCP/IP sockets for the executor/agent communication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-6240) Allow executor/agent communication over non-TCP/IP stream socket.

2017-10-04 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6240:
-
Target Version/s: 1.5.0

> Allow executor/agent communication over non-TCP/IP stream socket.
> -
>
> Key: MESOS-6240
> URL: https://issues.apache.org/jira/browse/MESOS-6240
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
> Environment: Linux and Windows
>Reporter: Avinash Sridharan
>Assignee: Benjamin Hindman
>Priority: Critical
>  Labels: mesosphere
>
> Currently, the executor-agent communication happens specifically over TCP 
> sockets. This works fine in most cases, but for the `MesosContainerizer`, 
> when containers are running on CNI networks, this mode of communication 
> starts imposing constraints on the CNI network, since there now has to be 
> connectivity between the CNI network (on which the executor is running) and 
> the agent. Introducing paths from a CNI network to the underlying agent, at 
> best, creates headaches for operators and, at worst, introduces serious 
> security holes in the network, since it breaks the isolation between the 
> container CNI network and the host network (on which the agent is running).
> In order to simplify and strengthen the deployment of Mesos containers on CNI 
> networks, we therefore need to move away from using TCP/IP sockets for 
> executor/agent communication. Since the executor and agent are guaranteed to 
> run on the same host, the above problems can be resolved if, for the 
> `MesosContainerizer`, we use UNIX domain sockets or named pipes instead of 
> TCP/IP sockets for the executor/agent communication.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7951) Extend the KillPolicy

2017-10-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7951:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64  (was: Mesosphere Sprint 
63, Mesosphere Sprint 64, Mesosphere Sprint 65)

> Extend the KillPolicy
> -
>
> Key: MESOS-7951
> URL: https://issues.apache.org/jira/browse/MESOS-7951
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, executor, HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> After introducing the {{KillPolicy}} in MESOS-4909, some interactions with 
> framework developers have led to the suggestion of a couple of possible 
> improvements to this interface, namely:
> * Allowing the framework to specify a command to be run to initiate 
> termination, rather than a signal to be sent, would allow some developers to 
> avoid wrapping their application in a signal handler. This is useful because 
> a signal handler wrapper modifies the application's process tree, which may 
> make introspection and debugging more difficult in the case of well-known 
> services with standard debugging procedures.
> * In the case of terminations which do begin with a signal, it would be 
> useful to allow the framework to specify the signal to be sent, rather than 
> assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, 
> each initiated with a [different 
> signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html].
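
A hedged sketch of the second suggestion above (a configurable signal). The signal field shown here is a hypothetical extension of the kill policy, not an existing Mesos API; the stand-in struct only illustrates how an executor could pick the signal to send instead of always assuming SIGTERM.

{code}
// Hypothetical sketch only: a stand-in for an extended kill policy that
// carries the signal to use. This is not an existing Mesos API.
#include <signal.h>
#include <sys/types.h>

struct KillPolicySketch
{
  bool hasSignal = false;  // assumed extension discussed above
  int signal = SIGTERM;
};

void initiateTermination(pid_t pid, const KillPolicySketch& policy)
{
  // Use the framework-provided signal if one was set; otherwise keep the
  // current default of SIGTERM. Escalation to SIGKILL after the grace
  // period would remain unchanged.
  int sig = policy.hasSignal ? policy.signal : SIGTERM;
  kill(pid, sig);  // e.g. SIGINT triggers PostgreSQL's "fast" shutdown
}
{code}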



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7951) Extend the KillPolicy

2017-10-04 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7951:
--
Sprint: Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 66  
(was: Mesosphere Sprint 63, Mesosphere Sprint 64)

> Extend the KillPolicy
> -
>
> Key: MESOS-7951
> URL: https://issues.apache.org/jira/browse/MESOS-7951
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, executor, HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> After introducing the {{KillPolicy}} in MESOS-4909, some interactions with 
> framework developers have led to the suggestion of a couple of possible 
> improvements to this interface, namely:
> * Allowing the framework to specify a command to be run to initiate 
> termination, rather than a signal to be sent, would allow some developers to 
> avoid wrapping their application in a signal handler. This is useful because 
> a signal handler wrapper modifies the application's process tree, which may 
> make introspection and debugging more difficult in the case of well-known 
> services with standard debugging procedures.
> * In the case of terminations which do begin with a signal, it would be 
> useful to allow the framework to specify the signal to be sent, rather than 
> assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, 
> each initiated with a [different 
> signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7130) port_mapping isolator: executor hangs when running on EC2

2017-10-04 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191565#comment-16191565
 ] 

Vinod Kone commented on MESOS-7130:
---

Story points?

> port_mapping isolator: executor hangs when running on EC2
> -
>
> Key: MESOS-7130
> URL: https://issues.apache.org/jira/browse/MESOS-7130
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Pierre Cheynier
>Assignee: Jie Yu
>
> Hi,
> I'm experiencing a weird issue: I'm using a CI to do testing on 
> infrastructure automation.
> I recently activated the {{network/port_mapping}} isolator.
> I'm able to make the changes work and pass the test for bare-metal servers 
> and virtualbox VMs using this configuration.
> But when I try on EC2 (on which my CI pipeline relies) it systematically fails 
> to run any container.
> It appears that the sandbox is created and the port_mapping isolator seems to 
> be OK according to the logs in stdout and stderr and the {{tc}} output:
> {noformat}
> + mount --make-rslave /run/netns
> + test -f /proc/sys/net/ipv6/conf/all/disable_ipv6
> + echo 1
> + ip link set lo address 02:44:20:bb:42:cf mtu 9001 up
> + ethtool -K eth0 rx off
> (...)
> + tc filter show dev eth0 parent :0
> + tc filter show dev lo parent :0
> I0215 16:01:13.941375 1 exec.cpp:161] Version: 1.0.2
> {noformat}
> Then the executor never comes back to the REGISTERED state and hangs indefinitely.
> {{GLOG_v=3}} doesn't help here.
> My skills in this area are limited, but after loading the symbols and attaching 
> gdb to the mesos-executor process, I'm able to print this stack:
> {noformat}
> #0  0x7feffc1386d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7feffbed69ec in 
> std::condition_variable::wait(std::unique_lock<std::mutex>&) () from 
> /usr/lib64/libstdc++.so.6
> #2  0x7ff0003dd8ec in void synchronized_wait<std::condition_variable, 
> std::mutex>(std::condition_variable*, std::mutex*) () from 
> /usr/lib64/libmesos-1.0.2.so
> #3  0x7ff0017d595d in Gate::arrive(long) () from 
> /usr/lib64/libmesos-1.0.2.so
> #4  0x7ff0017c00ed in process::ProcessManager::wait(process::UPID const&) 
> () from /usr/lib64/libmesos-1.0.2.so
> #5  0x7ff0017c5c05 in process::wait(process::UPID const&, Duration 
> const&) () from /usr/lib64/libmesos-1.0.2.so
> #6  0x004ab26f in process::wait(process::ProcessBase const*, Duration 
> const&) ()
> #7  0x004a3903 in main ()
> {noformat}
> I concluded that the underlying shell script launched by the isolator or the 
> task itself is simply blocked, but I don't understand why.
> Here is a process tree showing that I have no task running but the executor is:
> {noformat}
> root 28420  0.8  3.0 1061420 124940 ?  Ssl  17:56   0:25 
> /usr/sbin/mesos-slave --advertise_ip=127.0.0.1 
> --attributes=platform:centos;platform_major_version:7;type:base 
> --cgroups_enable_cfs --cgroups_hierarchy=/sys/fs/cgroup 
> --cgroups_net_cls_primary_handle=0xC370 
> --container_logger=org_apache_mesos_LogrotateContainerLogger 
> --containerizers=mesos,docker 
> --credential=file:///etc/mesos-chef/slave-credential 
> --default_container_info={"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"}]}
>  --default_role=default --docker_registry=/usr/share/mesos/users 
> --docker_store_dir=/var/opt/mesos/store/docker 
> --egress_unique_flow_per_container --enforce_container_disk_quota 
> --ephemeral_ports_per_container=128 
> --executor_environment_variables={"PATH":"/bin:/usr/bin:/usr/sbin","CRITEO_DC":"par","CRITEO_ENV":"prod"}
>  --image_providers=docker --image_provisioner_backend=copy 
> --isolation=cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,disk/du,filesystem/shared,filesystem/linux,docker/runtime,network/cni,network/port_mapping
>  --logging_level=INFO 
> --master=zk://mesos:test@localhost.localdomain:2181/mesos 
> --modules=file:///etc/mesos-chef/slave-modules.json --port=5051 
> --recover=reconnect 
> --resources=ports:[31000-32000];ephemeral_ports:[32768-57344] --strict 
> --work_dir=/var/opt/mesos
> root 28484  0.0  2.3 433676 95016 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> --log_filename=/var/opt/mesos/slaves/cdf94219-87b2-4af2-9f61-5697f0442915-S0/frameworks/366e8ed2-730e-4423-9324-086704d182b0-/executors/group_simplehttp.16f7c2ee-f3a8-11e6-be1c-0242b44d071f/runs/1d3e6b1c-cda8-47e5-92c4-a161429a7ac6/stdout
>  --logrotate_options=rotate 5 --logrotate_path=logrotate --max_size=10MB
> root 28485  0.0  2.3 499212 94724 ?Ssl  17:56   0:00  \_ 
> mesos-logrotate-logger --help=false 
> --log_filename=/var/opt/mesos/slaves/cdf94219-87b2-4af2-9f61-5697f0442915-S0/frameworks/366e8ed2-730e-4423-9324-086704d182b0-/executors/group_simplehttp.16f7c2ee-f3a8-11e6-be

[jira] [Commented] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-10-04 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191564#comment-16191564
 ] 

Vinod Kone commented on MESOS-7975:
---

[~qianzhang] Can you add story points for this?

> The command/default/docker executor can incorrectly send a TASK_FINISHED 
> update even when the task is killed
> 
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor 
> incorrectly sends a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunately missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8051) Killing TASK_GROUP fail to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Dukhovniy updated MESOS-8051:

Summary: Killing TASK_GROUP fail to kill some tasks  (was: Killing 
TASK_GROUP fails to kill some tasks)

> Killing TASK_GROUP fail to kill some tasks
> --
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting the following pod definition via Marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fails to kill all or 
> some of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after Marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210932  4753 mast

[jira] [Commented] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191542#comment-16191542
 ] 

A. Dukhovniy commented on MESOS-8051:
-

It also has nothing to do with the fact that the {{ct1}} container has a Docker 
image; in another test I removed it and the result was the same: one of the 
containers still fails to stop.

> Killing TASK_GROUP fails to kill some tasks
> ---
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting the following pod definition via Marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fails to kill all or 
> some of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after Marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@1

[jira] [Updated] (MESOS-8047) SubprocessTest.Status does not always receive a signal

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8047:
---
Labels: flaky-test  (was: )

> SubprocessTest.Status does not always receive a signal
> --
>
> Key: MESOS-8047
> URL: https://issues.apache.org/jira/browse/MESOS-8047
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>  Labels: flaky-test
>
> This one seems to be different from MESOS-1705 and MESOS-1738. It might be 
> that previous test runs leave a mesos process running in the background, but 
> I didn't investigate very deeply:
> {code}
> [ RUN  ] SubprocessTest.Status
> /home/bevers/src/mesos/worktrees/master/3rdparty/libprocess/src/tests/subprocess_tests.cpp:281:
>  Failure
> Expecting WIFSIGNALED(s.get().status()()->get()) but  
> WIFEXITED(s.get().status()()->get()) is true and 
> WEXITSTATUS(s.get().status()()->get()) is 0
> {code}
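
For context, a small self-contained illustration of the wait-status macros involved in the assertion above: the test expects the child to have been terminated by a signal ({{WIFSIGNALED}}), but in the flaky run the child apparently exited normally with status 0 ({{WIFEXITED}}/{{WEXITSTATUS}}). This is purely illustrative and is not the test code itself.

{code}
// Illustrative only: how the wait-status macros classify a child's exit.
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>

int main()
{
  pid_t pid = fork();
  if (pid == 0) {
    pause();           // child: wait until a signal arrives
    _exit(0);
  }

  kill(pid, SIGTERM);  // parent: terminate the child with a signal

  int status = 0;
  waitpid(pid, &status, 0);

  if (WIFSIGNALED(status)) {
    printf("killed by signal %d\n", WTERMSIG(status));      // what the test expects
  } else if (WIFEXITED(status)) {
    printf("exited with status %d\n", WEXITSTATUS(status)); // what the flake saw
  }

  return 0;
}
{code}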



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7589) CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled is flaky

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7589:
---
Labels: flaky-test mesosphere  (was: mesosphere)

> CommandExecutorCheckTest.CommandCheckDeliveredAndReconciled is flaky
> 
>
> Key: MESOS-7589
> URL: https://issues.apache.org/jira/browse/MESOS-7589
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: flaky-test, mesosphere
> Attachments: command_check_fail.txt
>
>
> See attached test log; observed on ASF CI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7971:
---
Labels: flaky-test mesosphere  (was: )

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
> -
>
> Key: MESOS-7971
> URL: https://issues.apache.org/jira/browse/MESOS-7971
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Vinod Kone
>  Labels: flaky-test, mesosphere
>
> Saw this when testing 1.4.0-rc5
> {code}
> [ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
> I0912 05:40:27.335222 30860 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0912 05:40:27.338429 30867 master.cpp:442] Master 
> 2bd1e8eb-e314-4181-9ed3-d397ec1dbede (6aa774430302) started on 
> 172.17.0.3:54639
> I0912 05:40:27.338472 30867 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="50ms" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/hH0YXe/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" --roles="role1" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/hH0YXe/master" 
> --zk_session_timeout="10secs"
> I0912 05:40:27.338778 30867 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0912 05:40:27.338788 30867 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0912 05:40:27.338793 30867 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0912 05:40:27.338799 30867 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/hH0YXe/credentials'
> I0912 05:40:27.353009 30867 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0912 05:40:27.353183 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0912 05:40:27.353364 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0912 05:40:27.353482 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0912 05:40:27.353588 30867 master.cpp:646] Authorization enabled
> W0912 05:40:27.353605 30867 master.cpp:709] The '--roles' flag is deprecated. 
> This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
> more information
> I0912 05:40:27.353742 30868 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0912 05:40:27.353775 30872 whitelist_watcher.cpp:77] No whitelist given
> I0912 05:40:27.356655 30873 master.cpp:2163] Elected as the leading master!
> I0912 05:40:27.356675 30873 master.cpp:1702] Recovering from registrar
> I0912 05:40:27.356868 30874 registrar.cpp:347] Recovering registrar
> I0912 05:40:27.357390 30874 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 494080ns
> I0912 05:40:27.357483 30874 registrar.cpp:495] Applied 1 operations in 
> 31911ns; attempting to update the registry
> I0912 05:40:27.357919 30874 registrar.cpp:552] Successfully updated the 
> registry in 391936ns
> I0912 05:40:27.358018 30874 registrar.cpp:424] Successfully recovered 
> registrar
> I0912 05:40:27.358413 30868 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0912 05:40:27.358482 30867 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> W0912 05:40:27.364050 30860 process.cpp:3196] Attempted to spawn already 
> running process files@172.17.0.3:54639
> I0912 05:40:27.365372 30860 containerizer.cpp:246] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
> W0912 05:40:27.365909 30860 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBacke

[jira] [Assigned] (MESOS-7739) RegisterSlaveValidationTest.DropInvalidReregistration is flaky

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-7739:
--

Assignee: (was: Neil Conway)

> RegisterSlaveValidationTest.DropInvalidReregistration is flaky
> --
>
> Key: MESOS-7739
> URL: https://issues.apache.org/jira/browse/MESOS-7739
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>  Labels: flaky-test, mesosphere-oncall
>
> Observed this on ASF CI.
> Seems a bit different from MESOS-7441.
> {code}
> [ RUN  ] RegisterSlaveValidationTest.DropInvalidReregistration
> I0629 05:23:17.367363  2252 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0629 05:23:17.370198  2276 master.cpp:436] Master 
> 25091bef-3845-4bb6-ae23-e18ac0f4d174 (b3c104d65da7) started on 
> 172.17.0.3:42034
> I0629 05:23:17.370234  2276 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" --credentials="/tmp/V0UvSM/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" 
> --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-1.3.1/_inst/share/mesos/webui" 
> --work_dir="/tmp/V0UvSM/master" --zk_session_timeout="10secs"
> I0629 05:23:17.370513  2276 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0629 05:23:17.370525  2276 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0629 05:23:17.370534  2276 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0629 05:23:17.370543  2276 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/V0UvSM/credentials'
> I0629 05:23:17.370806  2276 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0629 05:23:17.370929  2276 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0629 05:23:17.371073  2276 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0629 05:23:17.371193  2276 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0629 05:23:17.371318  2276 master.cpp:640] Authorization enabled
> I0629 05:23:17.371455  2272 hierarchical.cpp:158] Initialized hierarchical 
> allocator process
> I0629 05:23:17.371477  2290 whitelist_watcher.cpp:77] No whitelist given
> I0629 05:23:17.373731  2277 master.cpp:2161] Elected as the leading master!
> I0629 05:23:17.373760  2277 master.cpp:1700] Recovering from registrar
> I0629 05:23:17.373891  2280 registrar.cpp:345] Recovering registrar
> I0629 05:23:17.374527  2280 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 593152ns
> I0629 05:23:17.374625  2280 registrar.cpp:493] Applied 1 operations in 
> 19216ns; attempting to update the registry
> I0629 05:23:17.375228  2280 registrar.cpp:550] Successfully updated the 
> registry in 555008ns
> I0629 05:23:17.375336  2280 registrar.cpp:422] Successfully recovered 
> registrar
> I0629 05:23:17.375826  2282 hierarchical.cpp:185] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0629 05:23:17.375850  2290 master.cpp:1799] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0629 05:23:17.380674  2252 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W0629 05:23:17.381237  2252 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend requires root privileges
> W0629 05:23:17.381350  2252 backend.cpp:76] Failed to create 'bind' backend: 
> BindBackend requires root privileges
> I0629 05:23:17.381384  2252 provisioner.cpp:249] Using default backend 'copy'
> I0629 05:23:17.383884  2252 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0629 05:23:17.385763  2281 slave.cpp:231] Mesos

[jira] [Updated] (MESOS-7739) RegisterSlaveValidationTest.DropInvalidReregistration is flaky.

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7739:
---
Summary: RegisterSlaveValidationTest.DropInvalidReregistration is flaky.  
(was: RegisterSlaveValidationTest.DropInvalidReregistration is flaky)

> RegisterSlaveValidationTest.DropInvalidReregistration is flaky.
> ---
>
> Key: MESOS-7739
> URL: https://issues.apache.org/jira/browse/MESOS-7739
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>  Labels: flaky-test, mesosphere-oncall
>
> Observed this on ASF CI.
> Seems a bit different from MESOS-7441.
> {code}
> [ RUN  ] RegisterSlaveValidationTest.DropInvalidReregistration
> I0629 05:23:17.367363  2252 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0629 05:23:17.370198  2276 master.cpp:436] Master 
> 25091bef-3845-4bb6-ae23-e18ac0f4d174 (b3c104d65da7) started on 
> 172.17.0.3:42034
> I0629 05:23:17.370234  2276 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" --credentials="/tmp/V0UvSM/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" 
> --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-1.3.1/_inst/share/mesos/webui" 
> --work_dir="/tmp/V0UvSM/master" --zk_session_timeout="10secs"
> I0629 05:23:17.370513  2276 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0629 05:23:17.370525  2276 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0629 05:23:17.370534  2276 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0629 05:23:17.370543  2276 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/V0UvSM/credentials'
> I0629 05:23:17.370806  2276 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0629 05:23:17.370929  2276 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0629 05:23:17.371073  2276 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0629 05:23:17.371193  2276 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0629 05:23:17.371318  2276 master.cpp:640] Authorization enabled
> I0629 05:23:17.371455  2272 hierarchical.cpp:158] Initialized hierarchical 
> allocator process
> I0629 05:23:17.371477  2290 whitelist_watcher.cpp:77] No whitelist given
> I0629 05:23:17.373731  2277 master.cpp:2161] Elected as the leading master!
> I0629 05:23:17.373760  2277 master.cpp:1700] Recovering from registrar
> I0629 05:23:17.373891  2280 registrar.cpp:345] Recovering registrar
> I0629 05:23:17.374527  2280 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 593152ns
> I0629 05:23:17.374625  2280 registrar.cpp:493] Applied 1 operations in 
> 19216ns; attempting to update the registry
> I0629 05:23:17.375228  2280 registrar.cpp:550] Successfully updated the 
> registry in 555008ns
> I0629 05:23:17.375336  2280 registrar.cpp:422] Successfully recovered 
> registrar
> I0629 05:23:17.375826  2282 hierarchical.cpp:185] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0629 05:23:17.375850  2290 master.cpp:1799] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0629 05:23:17.380674  2252 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W0629 05:23:17.381237  2252 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend requires root privileges
> W0629 05:23:17.381350  2252 backend.cpp:76] Failed to create 'bind' backend: 
> BindBackend requires root privileges
> I0629 05:23:17.381384  2252 provisioner.cpp:249] Using default backend 'copy'
> I0629 05:23:17.383884  2252

[jira] [Updated] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Dukhovniy updated MESOS-8051:

Description: 
When starting the following pod definition via marathon:

{code:java}
{
  "id": "/simple-pod",
  "scaling": {
"kind": "fixed",
"instances": 3
  },
  "environment": {
"PING": "PONG"
  },
  "containers": [
{
  "name": "ct1",
  "resources": {
"cpus": 0.1,
"mem": 32
  },
  "image": {
"kind": "MESOS",
"id": "busybox"
  },
  "exec": {
"command": {
  "shell": "while true; do echo the current time is $(date) > 
./test-v1/clock; sleep 1; done"
}
  },
  "volumeMounts": [
{
  "name": "v1",
  "mountPath": "test-v1"
}
  ]
},
{
  "name": "ct2",
  "resources": {
"cpus": 0.1,
"mem": 32
  },
  "exec": {
"command": {
  "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; 
done"
}
  },
  "volumeMounts": [
{
  "name": "v1",
  "mountPath": "etc"
},
{
  "name": "v2",
  "mountPath": "docker"
}
  ]
}
  ],
  "networks": [
{
  "mode": "host"
}
  ],
  "volumes": [
{
  "name": "v1"
},
{
  "name": "v2",
  "host": "/var/lib/docker"
}
  ]
}
{code}

Mesos will successfully kill all {{ct2}} containers but fail to kill some or all 
of the {{ct1}} containers. I've attached both master and agent logs. The 
interesting part starts after marathon issues 6 kills:

{code:java}
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.209966  4746 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210033  4746 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210471  4748 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210518  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210602  4748 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210639  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210932  4753 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210968  4753 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.211210  4747 master.cpp:5297] Processing 

[jira] [Updated] (MESOS-7082) ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is flaky.

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7082:
---
Summary: 
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is 
flaky.  (was: 
ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is 
flaky)

> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is 
> flaky.
> -
>
> Key: MESOS-7082
> URL: https://issues.apache.org/jira/browse/MESOS-7082
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: ubuntu 16.04 with/without SSL
> Fedora 23
>Reporter: Anand Mazumdar
>Priority: Critical
>  Labels: flaky, flaky-test, mesosphere
>
> Showed up on our internal CI
> {noformat}
> 07:00:17 [ RUN  ] 
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
> 07:00:17 I0207 07:00:17.775459  2952 cluster.cpp:160] Creating default 
> 'local' authorizer
> 07:00:17 I0207 07:00:17.776511  2970 master.cpp:383] Master 
> fa1554c4-572a-4b89-8994-a89460f588d3 (ip-10-153-254-29.ec2.internal) started 
> on 10.153.254.29:38570
> 07:00:17 I0207 07:00:17.776538  2970 master.cpp:385] Flags at startup: 
> --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/ZROfJk/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ZROfJk/master" 
> --zk_session_timeout="10secs"
> 07:00:17 I0207 07:00:17.776674  2970 master.cpp:435] Master only allowing 
> authenticated frameworks to register
> 07:00:17 I0207 07:00:17.776687  2970 master.cpp:449] Master only allowing 
> authenticated agents to register
> 07:00:17 I0207 07:00:17.776695  2970 master.cpp:462] Master only allowing 
> authenticated HTTP frameworks to register
> 07:00:17 I0207 07:00:17.776703  2970 credentials.hpp:37] Loading credentials 
> for authentication from '/tmp/ZROfJk/credentials'
> 07:00:17 I0207 07:00:17.776779  2970 master.cpp:507] Using default 'crammd5' 
> authenticator
> 07:00:17 I0207 07:00:17.776841  2970 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> 07:00:17 I0207 07:00:17.776919  2970 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> 07:00:17 I0207 07:00:17.776970  2970 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> 07:00:17 I0207 07:00:17.777009  2970 master.cpp:587] Authorization enabled
> 07:00:17 I0207 07:00:17.777122  2975 hierarchical.cpp:161] Initialized 
> hierarchical allocator process
> 07:00:17 I0207 07:00:17.777138  2974 whitelist_watcher.cpp:77] No whitelist 
> given
> 07:00:17 I0207 07:00:17.04  2976 master.cpp:2123] Elected as the leading 
> master!
> 07:00:17 I0207 07:00:17.26  2976 master.cpp:1645] Recovering from 
> registrar
> 07:00:17 I0207 07:00:17.84  2975 registrar.cpp:329] Recovering registrar
> 07:00:17 I0207 07:00:17.777989  2973 registrar.cpp:362] Successfully fetched 
> the registry (0B) in 176384ns
> 07:00:17 I0207 07:00:17.778023  2973 registrar.cpp:461] Applied 1 operations 
> in 7573ns; attempting to update the registry
> 07:00:17 I0207 07:00:17.778249  2976 registrar.cpp:506] Successfully updated 
> the registry in 210944ns
> 07:00:17 I0207 07:00:17.778290  2976 registrar.cpp:392] Successfully 
> recovered registrar
> 07:00:17 I0207 07:00:17.778373  2976 master.cpp:1761] Recovered 0 agents from 
> the registry (172B); allowing 10mins for agents to re-register
> 07:00:17 I0207 07:00:17.778394  2974 hierarchical.cpp:188] Skipping recovery 
> of hierarchical allocator: nothing to recover
> 07:00:17 I0207 07:00:17.869381  2952 containerizer.cpp:220] Using isol

[jira] [Commented] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191502#comment-16191502
 ] 

A. Dukhovniy commented on MESOS-8051:
-

Here are the logs for one of the failing tasks from the master:

{code:java}
40268:Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1' of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
40269:Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 
(10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
40287:Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:25.331063  4747 master.cpp:6841] Status update 
TASK_KILLING (UUID: 23c6e28b-4370-4da3-981c-13a121b145c0) for task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 from agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207)
40288:Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:25.331110  4747 master.cpp:6903] Forwarding 
status update TASK_KILLING (UUID: 23c6e28b-4370-4da3-981c-13a121b145c0) for 
task simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001
40289:Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:25.331193  4747 master.cpp:8928] Updating the 
state of task simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of 
framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (latest state: 
TASK_KILLING, status update state: TASK_KILLING)
40297:Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:25.341003  4750 master.cpp:5479] Processing 
ACKNOWLEDGE call 23c6e28b-4370-4da3-981c-13a121b145c0 for task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101 on agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1
40337:Oct 04 14:58:35 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:35.229382  4746 master.cpp:5297] Processing 
KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1' of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
40338:Oct 04 14:58:35 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:35.229418  4746 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 
(10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
40372:Oct 04 14:58:55 ip-10-0-5-229.eu-central-1.compute.internal 
mesos-master[4708]: I1004 14:58:55.168781  4752 master.cpp:6841] Status update 
TASK_FAILED (UUID: 57b5c03e-517c-4dc2-8592-c24e5c875fde) for task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 from agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207)
{code}

It takes ~30s; marathon issues 2 kills in the meantime, and eventually 
{{TASK_FAILED}} is received. 
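
A quick way to pull the complete master-side lifecycle of that instance out of 
the attached log (illustrative grep only; adjust the task id as needed):

{code}
# Show every KILL call and status-update line the master logged for the failing ct1 task.
zcat dcos-mesos-master.log.gz \
  | grep 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1' \
  | grep -E 'KILL call|Status update|Updating the state'
{code}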

> Killing TASK_GROUP fails to kill some tasks
> ---
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting the following pod definition via marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "whi

[jira] [Updated] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Dukhovniy updated MESOS-8051:

Attachment: screenshot-1.png

> Killing TASK_GROUP fails to kill some tasks
> ---
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz, 
> screenshot-1.png
>
>
> When starting the following pod definition via marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fail to kill some or all 
> of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210932  4753 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca

[jira] [Updated] (MESOS-6086) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove is flaky.

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6086:
---
Labels: flaky-test tech-debt  (was: tech-debt)

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove is flaky.
> -
>
> Key: MESOS-6086
> URL: https://issues.apache.org/jira/browse/MESOS-6086
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Mahler
>Assignee: Neil Conway
>  Labels: flaky-test, tech-debt
>
> Observed this when running on a CentOS 7 machine.
> Good Run:
> {noformat}
> [ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
> I0824 14:24:15.585021 19320 cluster.cpp:157] Creating default 'local' 
> authorizer
> I0824 14:24:15.590765 19320 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0824 14:24:15.593570 19370 recover.cpp:451] Starting replica recovery
> I0824 14:24:15.594476 19370 recover.cpp:477] Replica is in EMPTY status
> I0824 14:24:15.597961 19352 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from __req_res__(1)@10.0.49.2:38017
> I0824 14:24:15.599189 19351 recover.cpp:197] Received a recover response from 
> a replica in EMPTY status
> I0824 14:24:15.600607 19364 recover.cpp:568] Updating replica status to 
> STARTING
> I0824 14:24:15.601824 19336 replica.cpp:320] Persisted replica status to 
> STARTING
> I0824 14:24:15.602224 19351 recover.cpp:477] Replica is in STARTING status
> I0824 14:24:15.603526 19373 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from __req_res__(2)@10.0.49.2:38017
> I0824 14:24:15.603824 19375 recover.cpp:197] Received a recover response from 
> a replica in STARTING status
> I0824 14:24:15.604395 19380 recover.cpp:568] Updating replica status to VOTING
> I0824 14:24:15.605470 19334 replica.cpp:320] Persisted replica status to 
> VOTING
> I0824 14:24:15.605612 19375 recover.cpp:582] Successfully joined the Paxos 
> group
> I0824 14:24:15.607223 19367 master.cpp:379] Master 
> dff6317e-46bf-4bf1-8a56-3fcdfb3df5e5 (core-dev) started on 10.0.49.2:38017
> I0824 14:24:15.607286 19367 master.cpp:381] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="50ms" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/DZsoQK/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" --roles="role1" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/DZsoQK/master" 
> --zk_session_timeout="10secs"
> I0824 14:24:15.609459 19367 master.cpp:431] Master only allowing 
> authenticated frameworks to register
> I0824 14:24:15.609486 19367 master.cpp:445] Master only allowing 
> authenticated agents to register
> I0824 14:24:15.609566 19367 master.cpp:458] Master only allowing 
> authenticated HTTP frameworks to register
> I0824 14:24:15.609591 19367 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/DZsoQK/credentials'
> I0824 14:24:15.610335 19367 master.cpp:503] Using default 'crammd5' 
> authenticator
> I0824 14:24:15.610589 19367 authenticator.cpp:519] Initializing server SASL
> I0824 14:24:15.611868 19367 http.cpp:883] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0824 14:24:15.612370 19367 http.cpp:883] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0824 14:24:15.612555 19367 http.cpp:883] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0824 14:24:15.612905 19367 master.cpp:583] Authorization enabled
> W0824 14:24:15.612949 19367 master.cpp:646] The '--roles' flag is deprecated. 
> This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
> more information
> I0824 14:24:15.624155 19356 master.cpp:1855] Elected as the leading master!
> I0824 14:24:15.624238 19356 master.cpp:1551] Recovering from registrar
> I0824 14:24:15.626255 19336 log.cpp:553] Attempting

[jira] [Updated] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Dukhovniy updated MESOS-8051:

Attachment: dcos-mesos-master.log.gz
dcos-mesos-slave.log.gz

Master and agent logs

> Killing TASK_GROUP fails to kill some tasks
> ---
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, executor
>Affects Versions: 1.4.0
>Reporter: A. Dukhovniy
>Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz
>
>
> When starting the following pod definition via marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
> "kind": "fixed",
> "instances": 3
>   },
>   "environment": {
> "PING": "PONG"
>   },
>   "containers": [
> {
>   "name": "ct1",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "image": {
> "kind": "MESOS",
> "id": "busybox"
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo the current time is $(date) > 
> ./test-v1/clock; sleep 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "test-v1"
> }
>   ]
> },
> {
>   "name": "ct2",
>   "resources": {
> "cpus": 0.1,
> "mem": 32
>   },
>   "exec": {
> "command": {
>   "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 
> 1; done"
> }
>   },
>   "volumeMounts": [
> {
>   "name": "v1",
>   "mountPath": "etc"
> },
> {
>   "name": "v2",
>   "mountPath": "docker"
> }
>   ]
> }
>   ],
>   "networks": [
> {
>   "mode": "host"
> }
>   ],
>   "volumes": [
> {
>   "name": "v1"
> },
> {
>   "name": "v2",
>   "host": "/var/lib/docker"
> }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fail to kill some or all 
> of the {{ct1}} containers. I've attached both master and agent logs. The 
> interesting part starts after marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing 
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) 
> at scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling 
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
> 10.0.1.207) to kill task 
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
> scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal 
> mesos-master[4708]: I1004 14:58:25.210932  4753 master.cpp:5297] Processing 

[jira] [Updated] (MESOS-8005) Mesos.SlaveTest.ShutdownUnregisteredExecutor is flaky

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8005:
---
Labels: flaky-test mesosphere  (was: )

> Mesos.SlaveTest.ShutdownUnregisteredExecutor is flaky
> -
>
> Key: MESOS-8005
> URL: https://issues.apache.org/jira/browse/MESOS-8005
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>  Labels: flaky-test, mesosphere
> Attachments: jenkins.log.gz
>
>
> Executed on Ubuntu 17.04 w/ SSL enabled:
> {code}
> ../../src/tests/cluster.cpp:580
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { 86d690bc-4248-4d26-bdc7-28901d8cf2ab }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Dukhovniy updated MESOS-8051:

Description: 
When starting the following pod definition via marathon:

{code:java}
{
  "id": "/simple-pod",
  "scaling": {
"kind": "fixed",
"instances": 3
  },
  "environment": {
"PING": "PONG"
  },
  "containers": [
{
  "name": "ct1",
  "resources": {
"cpus": 0.1,
"mem": 32
  },
  "image": {
"kind": "MESOS",
"id": "busybox"
  },
  "exec": {
"command": {
  "shell": "while true; do echo the current time is $(date) > 
./test-v1/clock; sleep 1; done"
}
  },
  "volumeMounts": [
{
  "name": "v1",
  "mountPath": "test-v1"
}
  ]
},
{
  "name": "ct2",
  "resources": {
"cpus": 0.1,
"mem": 32
  },
  "exec": {
"command": {
  "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; 
done"
}
  },
  "volumeMounts": [
{
  "name": "v1",
  "mountPath": "etc"
},
{
  "name": "v2",
  "mountPath": "docker"
}
  ]
}
  ],
  "networks": [
{
  "mode": "host"
}
  ],
  "volumes": [
{
  "name": "v1"
},
{
  "name": "v2",
  "host": "/var/lib/docker"
}
  ]
}
{code}

Mesos will successfully kill all {{ct2}} containers but fail to kill some or all 
of the {{ct1}} containers. I've attached both master and agent logs. The 
interesting part starts after marathon issues 6 kills:

{code:java}
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.209966  4746 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210033  4746 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210471  4748 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210518  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210602  4748 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210639  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210932  4753 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210968  4753 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.211210  4747 master.cpp:5297] Processing 

[jira] [Created] (MESOS-8051) Killing TASK_GROUP fails to kill some tasks

2017-10-04 Thread A. Dukhovniy (JIRA)
A. Dukhovniy created MESOS-8051:
---

 Summary: Killing TASK_GROUP fails to kill some tasks
 Key: MESOS-8051
 URL: https://issues.apache.org/jira/browse/MESOS-8051
 Project: Mesos
  Issue Type: Bug
  Components: agent, executor
Affects Versions: 1.4.0
Reporter: A. Dukhovniy
Priority: Critical


When starting the following pod definition via marathon:

{code:java}
{
  "id": "/simple-pod",
  "scaling": {
"kind": "fixed",
"instances": 3
  },
  "environment": {
"PING": "PONG"
  },
  "containers": [
{
  "name": "ct1",
  "resources": {
"cpus": 0.1,
"mem": 32
  },
  "image": {
"kind": "MESOS",
"id": "busybox"
  },
  "exec": {
"command": {
  "shell": "while true; do echo the current time is $(date) > 
./test-v1/clock; sleep 1; done"
}
  },
  "volumeMounts": [
{
  "name": "v1",
  "mountPath": "test-v1"
}
  ]
},
{
  "name": "ct2",
  "resources": {
"cpus": 0.1,
"mem": 32
  },
  "exec": {
"command": {
  "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; 
done"
}
  },
  "volumeMounts": [
{
  "name": "v1",
  "mountPath": "etc"
},
{
  "name": "v2",
  "mountPath": "docker"
}
  ]
}
  ],
  "networks": [
{
  "mode": "host"
}
  ],
  "volumes": [
{
  "name": "v1"
},
{
  "name": "v2",
  "host": "/var/lib/docker"
}
  ]
}
{code}

Mesos will successfully kill all {{ct2}} containers but fail to kill some or all 
of the {{ct1}} containers. I've attached both master and agent logs. The 
interesting part starts after marathon issues 6 kills:

{code:java}
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.209966  4746 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210033  4746 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210471  4748 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210518  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210602  4748 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210639  4748 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5
.229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210932  4753 master.cpp:5297] Processing KILL call for task 
'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at 
scheduler-c61c493c-728f-4bd9-be60-7373574749af@10.0.5.229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: 
I1004 14:58:25.210968  4753 master.cpp:5371] Telling agent 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (
10.0.1.207) to kill task 
simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2 of framework 
bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (m

[jira] [Updated] (MESOS-7986) ExecutorHttpApiTest.ValidJsonButInvalidProtobuf and ExecutorHttpApiTest.NoContentType fail in parallel test execution

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7986:
---
Labels: flaky-test mesosphere  (was: mesosphere)

> ExecutorHttpApiTest.ValidJsonButInvalidProtobuf and 
> ExecutorHttpApiTest.NoContentType fail in parallel test execution
> -
>
> Key: MESOS-7986
> URL: https://issues.apache.org/jira/browse/MESOS-7986
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>  Labels: flaky-test, mesosphere
> Attachments: test.log
>
>
> When running the cmake-built Mesos tests in parallel, I reliably encounter 
> failures in {{ExecutorHttpApiTest.ValidJsonButInvalidProtobuf}} or 
> {{ExecutorHttpApiTest.NoContentType}}:
> {noformat}
> $ ../support/mesos-gtest-runner.py ./src/mesos-tests -j10
> [ RUN  ] ExecutorHttpApiTest.ValidJsonButInvalidProtobuf
> ../src/tests/executor_http_api_tests.cpp:197: Failure
> Value of: (response).get().status
>   Actual: "401 Unauthorized"
> Expected: BadRequest().status
> Which is: "400 Bad Request"
> [  FAILED  ] ExecutorHttpApiTest.ValidJsonButInvalidProtobuf (17 ms)
> {noformat}
> {noformat}
> [ RUN  ] ExecutorHttpApiTest.NoContentType
> ../src/tests/executor_http_api_tests.cpp:158: Failure
> Value of: (response).get().status
>   Actual: "401 Unauthorized"
> Expected: BadRequest().status
> Which is: "400 Bad Request"
> [  FAILED  ] ExecutorHttpApiTest.NoContentType (20 ms)
> {noformat}
> The machine has 16 physical cores.
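> A minimal way to check whether the failures are tied to parallel execution is 
> to run just the two affected cases on their own (standard gtest flags; the 
> binary path is the one from the runner invocation above):
> {noformat}
> # Run only the two affected cases, serially, several times in a row.
> ./src/mesos-tests \
>   --gtest_filter='ExecutorHttpApiTest.NoContentType:ExecutorHttpApiTest.ValidJsonButInvalidProtobuf' \
>   --gtest_repeat=20
> {noformat}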



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2017-10-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3160:
---
Labels: cgroups flaky-test mesosphere  (was: cgroups mesosphere)

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
>Reporter: Paul Brett
>  Labels: cgroups, flaky-test, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7504) Parent's mount namespace cannot be determined when launching a nested container.

2017-10-04 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191456#comment-16191456
 ] 

Andrei Budnik commented on MESOS-7504:
--

{{(launch).failure(): Cannot get target mount namespace from process 10991: 
Cannot get 'mnt' namespace for 2nd-level child process '11001': Failed to stat 
mnt namespace handle for pid 11001: No such file or directory}}
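
For context, on Linux a process's mount namespace is normally identified through its 
{{/proc/<pid>/ns/mnt}} handle. Below is a minimal, hypothetical C++ sketch of that 
lookup (not the Mesos code path): if the target process exits before the handle is 
inspected, {{stat()}} fails with {{ENOENT}}, which matches the "No such file or 
directory" error quoted above. The pid 11001 is taken from the log purely for 
illustration.

{code}
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>

#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical sketch: identify the mount namespace of a process via its
// /proc/<pid>/ns/mnt handle. This is not the Mesos implementation, only an
// illustration of why a racing process exit surfaces as ENOENT.
int main(int argc, char** argv)
{
  // Pid 11001 is taken from the error message above, for illustration only.
  pid_t pid = (argc > 1) ? static_cast<pid_t>(std::stoi(argv[1])) : 11001;

  const std::string handle = "/proc/" + std::to_string(pid) + "/ns/mnt";

  struct stat s;
  if (::stat(handle.c_str(), &s) < 0) {
    // If the process has already exited, errno is ENOENT:
    // "No such file or directory", as in the failure quoted above.
    std::cerr << "Failed to stat mnt namespace handle for pid " << pid
              << ": " << ::strerror(errno) << std::endl;
    return 1;
  }

  // The inode number identifies the mount namespace; two processes share a
  // mount namespace when their ns/mnt handles resolve to the same inode
  // (and device).
  std::cout << "mnt namespace inode: " << s.st_ino << std::endl;
  return 0;
}
{code}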

> Parent's mount namespace cannot be determined when launching a nested 
> container.
> 
>
> Key: MESOS-7504
> URL: https://issues.apache.org/jira/browse/MESOS-7504
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.3.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
>
> I've observed this failure twice in different Linux environments. Here is an 
> example of such a failure:
> {noformat}
> [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_DestroyDebugContainerOnRecover
> I0509 21:53:25.471657 17167 containerizer.cpp:221] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> I0509 21:53:25.475124 17167 linux_launcher.cpp:150] Using 
> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> I0509 21:53:25.475407 17167 provisioner.cpp:249] Using default backend 
> 'overlay'
> I0509 21:53:25.481232 17186 containerizer.cpp:608] Recovering containerizer
> I0509 21:53:25.482295 17186 provisioner.cpp:410] Provisioner recovery complete
> I0509 21:53:25.482587 17187 containerizer.cpp:1001] Starting container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d for executor 'executor' of framework 
> I0509 21:53:25.482918 17189 cgroups.cpp:410] Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_d989f526-efe0-4553-bf79-936ad66c3753/21bc372c-0f2c-49f5-b8ab-8d32c232b95d'
>  for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484103 17190 cpu.cpp:101] Updated 'cpu.shares' to 1024 (cpus 
> 1) for container 21bc372c-0f2c-49f5-b8ab-8d32c232b95d
> I0509 21:53:25.484808 17186 containerizer.cpp:1524] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"clone_namespaces":[131072,536870912],"command":{"shell":true,"value":"sleep
>  
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}]},"pre_exec_commands":[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/home\/ubuntu\/workspace\/mesos\/Mesos_CI-build\/FLAG\/SSL\/label\/mesos-ec2-ubuntu-16.04\/mesos\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o 
> nosuid,noexec,nodev"}],"working_directory":"\/tmp\/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr"}"
>  --pipe_read="29" --pipe_write="32" 
> --runtime_directory="/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_sKhtj7/containers/21bc372c-0f2c-49f5-b8ab-8d32c232b95d"
>  --unshare_namespace_mnt="false"'
> I0509 21:53:25.484978 17189 linux_launcher.cpp:429] Launching container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> I0509 21:53:25.513890 17186 containerizer.cpp:1623] Checkpointing container's 
> forked pid 1873 to 
> '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_Rdjw6M/meta/slaves/frameworks/executors/executor/runs/21bc372c-0f2c-49f5-b8ab-8d32c232b95d/pids/forked.pid'
> I0509 21:53:25.515878 17190 fetcher.cpp:353] Starting to fetch URIs for 
> container: 21bc372c-0f2c-49f5-b8ab-8d32c232b95d, directory: 
> /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_DestroyDebugContainerOnRecover_zlywyr
> I0509 21:53:25.517715 17193 containerizer.cpp:1791] Starting nested container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.518569 17193 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b"
>  --stderr_from_fd="36" --stderr_to_fd="2" --stdin_to_fd="32" 
> --stdout_from_fd="33" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522-622b15142e35
> I0509 21:53:25.521229 17193 switchboard.cpp:575] Created I/O switchboard 
> server (pid: 1881) listening on socket file 
> '/tmp/mesos-io-switchboard-ca463cf2-70ba-4121-a5c6-1a170ae40c1b' for 
> container 
> 21bc372c-0f2c-49f5-b8ab-8d32c232b95d.ea991d38-e1a5-44fe-a522

[jira] [Commented] (MESOS-4812) Mesos fails to escape command health checks

2017-10-04 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191117#comment-16191117
 ] 

Andrei Budnik commented on MESOS-4812:
--

I have closed [/r/62381|https://reviews.apache.org/r/62381/]; for details, see 
the comment in the discard reason.

> Mesos fails to escape command health checks
> ---
>
> Key: MESOS-4812
> URL: https://issues.apache.org/jira/browse/MESOS-4812
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>Assignee: Andrei Budnik
>  Labels: health-check, mesosphere, tech-debt
> Attachments: health_task.gif
>
>
> As described in https://github.com/mesosphere/marathon/issues/
> I would like to run a command health check
> {noformat}
> /bin/bash -c " {noformat}
> The health check fails because Mesos, while running the command inside the 
> double quotes of an sh -c "" wrapper, doesn't escape the double quotes 
> contained in the command.
> If I escape the double quotes myself, the command health check succeeds. But 
> this would mean that the user needs intimate knowledge of how Mesos executes 
> their commands, which can't be right.
> I was told this is a Mesos issue rather than a Marathon one, so I am opening 
> this JIRA. I don't know whether this only affects the command health check.
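
To make the quoting problem concrete, here is a small hypothetical C++ sketch (not the 
actual Mesos health-check code) showing why embedding an unescaped command into an 
sh -c "..." wrapper by string concatenation mangles any double quotes the command 
contains, and why passing the command as its own argv element sidesteps re-quoting 
entirely.

{code}
#include <iostream>
#include <string>

// Hypothetical illustration of the quoting problem described above; this is
// not the Mesos health-check implementation.
int main()
{
  // A health-check command that itself contains double quotes.
  const std::string command = "/bin/bash -c \"echo \\\"healthy\\\"\"";

  // Naive wrapping by string concatenation: the inner quotes are not
  // escaped, so the shell parses a different command than the user wrote.
  const std::string naive = "sh -c \"" + command + "\"";
  std::cout << naive << std::endl;
  // Prints: sh -c "/bin/bash -c "echo \"healthy\"""

  // Passing the command as its own argv element avoids re-quoting, e.g.:
  //   execlp("sh", "sh", "-c", command.c_str(), (char*) nullptr);
  // The shell then receives the command verbatim, quotes and all.
  return 0;
}
{code}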



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)