[jira] [Updated] (MESOS-8264) Publish custom images used in Mesos unit tests to the official Mesos Docker hub account

2017-11-23 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-8264:
--
Summary: Publish custom images used in Mesos unit tests to the official 
Mesos Docker hub account  (was: Publish custom images used in Mesos unit tests 
to https://hub.docker.com/u/mesos/)

> Publish custom images used in Mesos unit tests to the official Mesos Docker 
> hub account
> ---
>
> Key: MESOS-8264
> URL: https://issues.apache.org/jira/browse/MESOS-8264
> Project: Mesos
> Issue Type: Improvement
> Components: test
> Reporter: Qian Zhang
>
> Currently, several custom Docker images used in Mesos unit tests (see 
> below) are not in the official Mesos Docker Hub account 
> (https://hub.docker.com/u/mesos/):
> * mesosphere/inky
> * mesosphere/alpine-expect
> * mesosphere/test-executor
> * haosdent/https-server
> * zhq527725/whiteout
> * zhq527725/https-server
> * chhsiao/overwrite
> We should publish them to the official Mesos Docker Hub account and also 
> commit their sources (e.g., Dockerfiles and other necessary source files) to 
> the Mesos code repo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8264) Publish custom images used in Mesos unit tests to https://hub.docker.com/u/mesos/

2017-11-23 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-8264:
-

 Summary: Publish custom images used in Mesos unit tests to 
https://hub.docker.com/u/mesos/
 Key: MESOS-8264
 URL: https://issues.apache.org/jira/browse/MESOS-8264
 Project: Mesos
  Issue Type: Improvement
  Components: test
Reporter: Qian Zhang


Currently, several custom Docker images used in Mesos unit tests (see 
below) are not in the official Mesos Docker Hub account 
(https://hub.docker.com/u/mesos/):
* mesosphere/inky
* mesosphere/alpine-expect
* mesosphere/test-executor
* haosdent/https-server
* zhq527725/whiteout
* zhq527725/https-server
* chhsiao/overwrite

We should publish them to the official Mesos Docker Hub account and also 
commit their sources (e.g., Dockerfiles and other necessary source files) to 
the Mesos code repo.
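
The republishing step could be scripted roughly as follows. This is only a sketch: the target naming scheme (`mesos/<name>`) and the helpers `target_names` and `publish_commands` are assumptions, not something the ticket specifies. Two of the listed images share the name `https-server`, so a plain `mesos/<name>` mapping would collide; the sketch prefixes the original owner in that case.

```python
from collections import Counter

# Images listed in the ticket.
SOURCE_IMAGES = [
    "mesosphere/inky",
    "mesosphere/alpine-expect",
    "mesosphere/test-executor",
    "haosdent/https-server",
    "zhq527725/whiteout",
    "zhq527725/https-server",
    "chhsiao/overwrite",
]


def target_names(images, account="mesos"):
    """Map each source image to a repository under `account`.

    Hypothetical naming scheme: keep the image name, and prefix the
    original owner only when two sources share the same name.
    """
    counts = Counter(image.split("/", 1)[1] for image in images)
    mapping = {}
    for image in images:
        owner, name = image.split("/", 1)
        if counts[name] > 1:
            name = f"{owner}-{name}"
        mapping[image] = f"{account}/{name}"
    return mapping


def publish_commands(images, account="mesos"):
    """Return the docker CLI commands that would republish each image."""
    commands = []
    for source, target in target_names(images, account).items():
        commands.append(f"docker pull {source}")
        commands.append(f"docker tag {source} {target}")
        commands.append(f"docker push {target}")
    return commands


if __name__ == "__main__":
    print("\n".join(publish_commands(SOURCE_IMAGES)))
```

The script only prints the commands, so the mapping can be reviewed before anything is pushed.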





[jira] [Commented] (MESOS-6003) Add logging module for logging to an external program

2017-11-23 Thread Will Rouesnel (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264839#comment-16264839
 ] 

Will Rouesnel commented on MESOS-6003:
--

Are there any further changes needed to advance this toward being merged?

> Add logging module for logging to an external program
> -
>
> Key: MESOS-6003
> URL: https://issues.apache.org/jira/browse/MESOS-6003
> Project: Mesos
> Issue Type: Improvement
> Components: modules
> Reporter: Will Rouesnel
> Assignee: Will Rouesnel
> Priority: Minor
>
> In the vein of the logrotate module for logging, there should be a similar 
> module which provides support for logging to an arbitrary log handling 
> program, with suitable task metadata provided by environment variables or 
> command line arguments.
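
A rough illustration of the idea, written in Python rather than as a Mesos module: task output is piped to an arbitrary handler program, and task metadata is passed through environment variables. The variable name `TASK_ID`, the function `log_to_program`, and the stand-in `HANDLER` are hypothetical and are not part of any Mesos interface.

```python
import os
import subprocess
import sys


def log_to_program(handler_argv, lines, metadata):
    """Pipe `lines` to the stdin of an external handler program.

    `metadata` is exported to the handler as environment variables.
    Returns whatever the handler wrote to its stdout.
    """
    env = dict(os.environ)
    env.update(metadata)
    result = subprocess.run(
        handler_argv,
        input="".join(line + "\n" for line in lines),
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Stand-in handler: prefixes each log line with the task ID taken from
# its environment. A real deployment would name any log-handling program.
HANDLER = [
    sys.executable,
    "-c",
    "import os, sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('[' + os.environ['TASK_ID'] + '] ' + line)\n",
]

if __name__ == "__main__":
    sys.stdout.write(
        log_to_program(HANDLER, ["started", "done"], {"TASK_ID": "task-1"})
    )
```

Passing metadata through the environment keeps the handler's command line fixed; command line arguments, the other option the ticket mentions, would work equally well.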





[jira] [Updated] (MESOS-8240) Add an option to build the new CLI and run unit tests.

2017-11-23 Thread Armand Grillet (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand Grillet updated MESOS-8240:
--
Sprint: Mesosphere Sprint 69  (was: Mesosphere Sprint 68)

> Add an option to build the new CLI and run unit tests.
> --
>
> Key: MESOS-8240
> URL: https://issues.apache.org/jira/browse/MESOS-8240
> Project: Mesos
> Issue Type: Improvement
> Reporter: Armand Grillet
> Assignee: Armand Grillet
>
> An update of the discarded https://reviews.apache.org/r/52543/





[jira] [Assigned] (MESOS-8094) Leverage helper functions to reduce boilerplate code related to v1 API.

2017-11-23 Thread Armand Grillet (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armand Grillet reassigned MESOS-8094:
-

Assignee: (was: Armand Grillet)

> Leverage helper functions to reduce boilerplate code related to v1 API.
> ---
>
> Key: MESOS-8094
> URL: https://issues.apache.org/jira/browse/MESOS-8094
> Project: Mesos
> Issue Type: Improvement
> Components: test
> Reporter: Alexander Rukletsov
> Labels: mesosphere, newbie
>
> https://reviews.apache.org/r/61982/ provides an example of how test code 
> related to the scheduler v1 API can be simplified with appropriate use of 
> helper functions. For example, instead of crafting a subscribe call manually 
> as in
> {noformat}
>   {
>     v1::scheduler::Call call;
>     call.set_type(v1::scheduler::Call::SUBSCRIBE);
>     v1::scheduler::Call::Subscribe* subscribe = call.mutable_subscribe();
>     subscribe->mutable_framework_info()->CopyFrom(v1::DEFAULT_FRAMEWORK_INFO);
>     mesos.send(call);
>   }
> {noformat}
> the helper function {{v1::scheduler::SendSubscribe()}} should be invoked.
> To find all occurrences that should be fixed, one can grep the test codebase 
> for {{call.set_type}}. At the moment I see the following files:
> {noformat}
> api_tests.cpp
> check_tests.cpp
> http_fault_tolerant_tests.cpp
> master_maintenance_tests.cpp
> master_tests.cpp
> scheduler_tests.cpp
> slave_authorization_tests.cpp
> slave_recovery_tests.cpp
> slave_tests.cpp
> {noformat}
> The same applies to sending status update acknowledgements: the 
> {{v1::scheduler::SendAcknowledge()}} action should be used instead of manually 
> crafting acks.
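
The grep step could also be scripted, for instance to re-check the list after each cleanup patch. The sketch below is a plain-Python equivalent of grepping for `call.set_type`; the function name `files_with_manual_calls` is made up for illustration, and the `*.cpp` filter is an assumption based on the file list in the ticket.

```python
import tempfile
from pathlib import Path


def files_with_manual_calls(root, needle="call.set_type"):
    """Return sorted names of *.cpp files under `root` that contain `needle`."""
    matches = []
    for path in Path(root).rglob("*.cpp"):
        if needle in path.read_text(errors="replace"):
            matches.append(path.name)
    return sorted(matches)


if __name__ == "__main__":
    # Small self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "api_tests.cpp").write_text("call.set_type(X);\n")
        Path(tmp, "other_tests.cpp").write_text("SendSubscribe();\n")
        print(files_with_manual_calls(tmp))
```

Pointing the function at the directory that holds the test sources should produce a list comparable to the one in the ticket.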





[jira] [Updated] (MESOS-8263) ResourceProviderManagerHttpApiTest.ConvertResources is flaky

2017-11-23 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-8263:

  Sprint: Mesosphere Sprint 68
Story Points: 2
  Labels: mesosphere test  (was: test)

> ResourceProviderManagerHttpApiTest.ConvertResources is flaky
> 
>
> Key: MESOS-8263
> URL: https://issues.apache.org/jira/browse/MESOS-8263
> Project: Mesos
> Issue Type: Bug
> Components: flaky
> Reporter: Jan Schlicht
> Assignee: Jan Schlicht
> Labels: mesosphere, test
>
> From an ASF CI run:
> {noformat}
> 3: [   OK ] 
> ContentType/ResourceProviderManagerHttpApiTest.ConvertResources/0 (1048 ms)
> 3: [ RUN  ] 
> ContentType/ResourceProviderManagerHttpApiTest.ConvertResources/1
> 3: I1123 08:06:04.233137 20036 cluster.cpp:162] Creating default 'local' 
> authorizer
> 3: I1123 08:06:04.237293 20060 master.cpp:448] Master 
> 7c9d8e8c-3fb3-44c5-8505-488ada3e848e (dce3e4c418cb) started on 
> 172.17.0.2:35090
> 3: I1123 08:06:04.237325 20060 master.cpp:450] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/EpiTO7/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/EpiTO7/master" 
> --zk_session_timeout="10secs"
> 3: I1123 08:06:04.237727 20060 master.cpp:499] Master only allowing 
> authenticated frameworks to register
> 3: I1123 08:06:04.237743 20060 master.cpp:505] Master only allowing 
> authenticated agents to register
> 3: I1123 08:06:04.237753 20060 master.cpp:511] Master only allowing 
> authenticated HTTP frameworks to register
> 3: I1123 08:06:04.237764 20060 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/EpiTO7/credentials'
> 3: I1123 08:06:04.238149 20060 master.cpp:555] Using default 'crammd5' 
> authenticator
> 3: I1123 08:06:04.238358 20060 http.cpp:1045] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> 3: I1123 08:06:04.238575 20060 http.cpp:1045] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> 3: I1123 08:06:04.238764 20060 http.cpp:1045] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> 3: I1123 08:06:04.238939 20060 master.cpp:634] Authorization enabled
> 3: I1123 08:06:04.239159 20043 whitelist_watcher.cpp:77] No whitelist given
> 3: I1123 08:06:04.239187 20045 hierarchical.cpp:173] Initialized hierarchical 
> allocator process
> 3: I1123 08:06:04.242822 20041 master.cpp:2215] Elected as the leading master!
> 3: I1123 08:06:04.242857 20041 master.cpp:1695] Recovering from registrar
> 3: I1123 08:06:04.243067 20052 registrar.cpp:347] Recovering registrar
> 3: I1123 08:06:04.243808 20052 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 690944ns
> 3: I1123 08:06:04.243953 20052 registrar.cpp:495] Applied 1 operations in 
> 37370ns; attempting to update the registry
> 3: I1123 08:06:04.244638 20052 registrar.cpp:552] Successfully updated the 
> registry in 620032ns
> 3: I1123 08:06:04.244798 20052 registrar.cpp:424] Successfully recovered 
> registrar
> 3: I1123 08:06:04.245352 20058 hierarchical.cpp:211] Skipping recovery of 
> hierarchical allocator: nothing to recover
> 3: I1123 08:06:04.245358 20057 master.cpp:1808] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> 3: W1123 08:06:04.251852 20036 process.cpp:2756] Attempted to spawn already 
> running process files@172.17.0.2:35090
> 3: I1123 08:06:04.253250 20036 containerizer.cpp:301] Using isolation { 
> environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
> 3: W1123 08:06:04.253965 20036 backend.cpp:76] 

[jira] [Created] (MESOS-8263) ResourceProviderManagerHttpApiTest.ConvertResources is flaky

2017-11-23 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8263:
---

 Summary: ResourceProviderManagerHttpApiTest.ConvertResources is 
flaky
 Key: MESOS-8263
 URL: https://issues.apache.org/jira/browse/MESOS-8263
 Project: Mesos
  Issue Type: Bug
  Components: flaky
Reporter: Jan Schlicht
Assignee: Jan Schlicht


From an ASF CI run:

{noformat}
3: [   OK ] 
ContentType/ResourceProviderManagerHttpApiTest.ConvertResources/0 (1048 ms)
3: [ RUN  ] 
ContentType/ResourceProviderManagerHttpApiTest.ConvertResources/1
3: I1123 08:06:04.233137 20036 cluster.cpp:162] Creating default 'local' 
authorizer
3: I1123 08:06:04.237293 20060 master.cpp:448] Master 
7c9d8e8c-3fb3-44c5-8505-488ada3e848e (dce3e4c418cb) started on 172.17.0.2:35090
3: I1123 08:06:04.237325 20060 master.cpp:450] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/EpiTO7/credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/EpiTO7/master" 
--zk_session_timeout="10secs"
3: I1123 08:06:04.237727 20060 master.cpp:499] Master only allowing 
authenticated frameworks to register
3: I1123 08:06:04.237743 20060 master.cpp:505] Master only allowing 
authenticated agents to register
3: I1123 08:06:04.237753 20060 master.cpp:511] Master only allowing 
authenticated HTTP frameworks to register
3: I1123 08:06:04.237764 20060 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/EpiTO7/credentials'
3: I1123 08:06:04.238149 20060 master.cpp:555] Using default 'crammd5' 
authenticator
3: I1123 08:06:04.238358 20060 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
3: I1123 08:06:04.238575 20060 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
3: I1123 08:06:04.238764 20060 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
3: I1123 08:06:04.238939 20060 master.cpp:634] Authorization enabled
3: I1123 08:06:04.239159 20043 whitelist_watcher.cpp:77] No whitelist given
3: I1123 08:06:04.239187 20045 hierarchical.cpp:173] Initialized hierarchical 
allocator process
3: I1123 08:06:04.242822 20041 master.cpp:2215] Elected as the leading master!
3: I1123 08:06:04.242857 20041 master.cpp:1695] Recovering from registrar
3: I1123 08:06:04.243067 20052 registrar.cpp:347] Recovering registrar
3: I1123 08:06:04.243808 20052 registrar.cpp:391] Successfully fetched the 
registry (0B) in 690944ns
3: I1123 08:06:04.243953 20052 registrar.cpp:495] Applied 1 operations in 
37370ns; attempting to update the registry
3: I1123 08:06:04.244638 20052 registrar.cpp:552] Successfully updated the 
registry in 620032ns
3: I1123 08:06:04.244798 20052 registrar.cpp:424] Successfully recovered 
registrar
3: I1123 08:06:04.245352 20058 hierarchical.cpp:211] Skipping recovery of 
hierarchical allocator: nothing to recover
3: I1123 08:06:04.245358 20057 master.cpp:1808] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
3: W1123 08:06:04.251852 20036 process.cpp:2756] Attempted to spawn already 
running process files@172.17.0.2:35090
3: I1123 08:06:04.253250 20036 containerizer.cpp:301] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
3: W1123 08:06:04.253965 20036 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
3: W1123 08:06:04.254109 20036 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
3: I1123 08:06:04.254148 20036 provisioner.cpp:259] Using default backend 'copy'
3: I1123 08:06:04.256542 20036 cluster.cpp:448] Creating default 'local' 
authorizer
3: I1123 08:06:04.260066 20057 slave.cpp:262] Mesos agent started on 
(784)@172.17.0.2:35090
3: I1123 

[jira] [Created] (MESOS-8262) CMake build with java enabled fails during linking step.

2017-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8262:
--

 Summary: CMake build with java enabled fails during linking step.
 Key: MESOS-8262
 URL: https://issues.apache.org/jira/browse/MESOS-8262
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.5.0
 Environment: Mac OS 10.11.6
Reporter: Alexander Rukletsov


I enabled Java in the CMake build and ran the complete build via 
{{ninja check}}. The build failed with the following output:
{noformat}
[312/689] Linking CXX shared library src/libmesos-java.dylib
FAILED: src/libmesos-java.dylib 
: && /Library/Developer/CommandLineTools/usr/bin/c++ -std=c++11 
-Wformat-security -fstack-protector-strong  -dynamiclib 
-Wl,-headerpad_max_install_names  -o src/libmesos-java.dylib -install_name 
@rpath/libmesos-java.dylib src/CMakeFiles/mesos-java.dir/java/jni/convert.cpp.o 
src/CMakeFiles/mesos-java.dir/java/jni/construct.cpp.o 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_Log.cpp.o 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_MesosExecutorDriver.cpp.o
 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_MesosNativeLibrary.cpp.o
 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp.o
 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_AbstractState.cpp.o
 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_LevelDBState.cpp.o
 src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_LogState.cpp.o 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_Variable.cpp.o 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_ZooKeeperState.cpp.o
 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp.o
 
src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_v1_scheduler_V0Mesos.cpp.o
 src/CMakeFiles/mesos-java.dir/jvm/jvm.cpp.o 
src/CMakeFiles/mesos-java.dir/jvm/org/apache/log4j.cpp.o 
src/CMakeFiles/mesos-java.dir/jvm/org/apache/zookeeper.cpp.o  
-Wl,-rpath,/Users/alex/Projects/mesos.build/src 
-Wl,-rpath,/Users/alex/Projects/mesos.build/3rdparty/libprocess/src 
src/libmesos-protobufs.dylib 3rdparty/libprocess/src/libprocess.dylib 
3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8-build/libzookeeper.a -framework 
JavaVM -framework JavaVM 
3rdparty/protobuf-3.5.0/src/protobuf-3.5.0-build/libprotobuf.dylib 
/usr/local/opt/apr/libexec/lib/libapr-1.dylib /usr/lib/libcurl.dylib 
3rdparty/glog-0.3.3/src/glog-0.3.3-build/lib/libglog.dylib /usr/lib/libz.dylib 
/usr/local/opt/subversion/lib/libsvn_delta-1.dylib 
/usr/local/opt/subversion/lib/libsvn_diff-1.dylib 
/usr/local/opt/subversion/lib/libsvn_subr-1.dylib 
3rdparty/http_parser-2.6.2/src/http_parser-2.6.2-build/libhttp_parser.a 
3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8-build/libhashtable.a && :
Undefined symbols for architecture x86_64:
  "mesos::MesosExecutorDriver::MesosExecutorDriver(mesos::Executor*)", 
referenced from:
  _Java_org_apache_mesos_MesosExecutorDriver_initialize in 
org_apache_mesos_MesosExecutorDriver.cpp.o
<...>
{noformat}





[jira] [Updated] (MESOS-8262) CMake build with java enabled fails during linking step.

2017-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8262:
---
Labels: build cmake  (was: build)

> CMake build with java enabled fails during linking step.
> 
>
> Key: MESOS-8262
> URL: https://issues.apache.org/jira/browse/MESOS-8262
> Project: Mesos
> Issue Type: Bug
> Components: build
> Affects Versions: 1.5.0
> Environment: Mac OS 10.11.6
> Reporter: Alexander Rukletsov
> Labels: build, cmake
>
> I enabled Java in the CMake build and ran the complete build via 
> {{ninja check}}. The build failed with the following output:
> {noformat}
> [312/689] Linking CXX shared library src/libmesos-java.dylib
> FAILED: src/libmesos-java.dylib 
> : && /Library/Developer/CommandLineTools/usr/bin/c++ -std=c++11 
> -Wformat-security -fstack-protector-strong  -dynamiclib 
> -Wl,-headerpad_max_install_names  -o src/libmesos-java.dylib -install_name 
> @rpath/libmesos-java.dylib 
> src/CMakeFiles/mesos-java.dir/java/jni/convert.cpp.o 
> src/CMakeFiles/mesos-java.dir/java/jni/construct.cpp.o 
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_Log.cpp.o 
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_MesosExecutorDriver.cpp.o
>  
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_MesosNativeLibrary.cpp.o
>  
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp.o
>  
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_AbstractState.cpp.o
>  
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_LevelDBState.cpp.o
>  src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_LogState.cpp.o 
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_Variable.cpp.o 
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_state_ZooKeeperState.cpp.o
>  
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_v1_scheduler_V1Mesos.cpp.o
>  
> src/CMakeFiles/mesos-java.dir/java/jni/org_apache_mesos_v1_scheduler_V0Mesos.cpp.o
>  src/CMakeFiles/mesos-java.dir/jvm/jvm.cpp.o 
> src/CMakeFiles/mesos-java.dir/jvm/org/apache/log4j.cpp.o 
> src/CMakeFiles/mesos-java.dir/jvm/org/apache/zookeeper.cpp.o  
> -Wl,-rpath,/Users/alex/Projects/mesos.build/src 
> -Wl,-rpath,/Users/alex/Projects/mesos.build/3rdparty/libprocess/src 
> src/libmesos-protobufs.dylib 3rdparty/libprocess/src/libprocess.dylib 
> 3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8-build/libzookeeper.a -framework 
> JavaVM -framework JavaVM 
> 3rdparty/protobuf-3.5.0/src/protobuf-3.5.0-build/libprotobuf.dylib 
> /usr/local/opt/apr/libexec/lib/libapr-1.dylib /usr/lib/libcurl.dylib 
> 3rdparty/glog-0.3.3/src/glog-0.3.3-build/lib/libglog.dylib 
> /usr/lib/libz.dylib /usr/local/opt/subversion/lib/libsvn_delta-1.dylib 
> /usr/local/opt/subversion/lib/libsvn_diff-1.dylib 
> /usr/local/opt/subversion/lib/libsvn_subr-1.dylib 
> 3rdparty/http_parser-2.6.2/src/http_parser-2.6.2-build/libhttp_parser.a 
> 3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8-build/libhashtable.a && :
> Undefined symbols for architecture x86_64:
>   "mesos::MesosExecutorDriver::MesosExecutorDriver(mesos::Executor*)", 
> referenced from:
>   _Java_org_apache_mesos_MesosExecutorDriver_initialize in 
> org_apache_mesos_MesosExecutorDriver.cpp.o
> <...>
> {noformat}





[jira] [Assigned] (MESOS-8261) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-8261:
--

Assignee: Alexander Rukletsov

> PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.
> --
>
> Key: MESOS-8261
> URL: https://issues.apache.org/jira/browse/MESOS-8261
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.5.0
> Environment: Ubuntu 17.04
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
> Labels: flaky-test
> Attachments: ReserveAndSlaveRemoval-badrun.txt
>
>
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/persistent_volume_endpoints_tests.cpp:1886
> Actual function call count doesn't match EXPECT_CALL(sched, offerRescinded(_, 
> _))...
>  Expected: to be called once
>    Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Commented] (MESOS-8261) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264255#comment-16264255
 ] 

Alexander Rukletsov commented on MESOS-8261:


This is likely the cause:
{noformat}
I1122 19:37:14.881408  8650 sched.cpp:2009] Asked to stop the driver
I1122 19:37:14.881644  8657 sched.cpp:927] Ignoring rescind offer message 
because the driver is not running!
{noformat}
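
The suspected ordering can be shown with a toy model (plain Python, not the Mesos scheduler driver; the class `ToyDriver` and the function `run` are invented for illustration). If the driver is stopped before the rescind message is processed, the callback never fires, which is what the failed expectation reports.

```python
class ToyDriver:
    """Minimal stand-in for a scheduler driver that drops messages once stopped."""

    def __init__(self):
        self.running = True
        self.rescinded = []  # offers for which the callback fired

    def stop(self):
        self.running = False

    def on_rescind_offer(self, offer_id):
        if not self.running:
            # Corresponds to "Ignoring rescind offer message because the
            # driver is not running!" in the log excerpt.
            return
        self.rescinded.append(offer_id)


def run(events):
    """Apply events in order and return how often the callback fired."""
    driver = ToyDriver()
    for event in events:
        if event == "stop":
            driver.stop()
        else:
            driver.on_rescind_offer(event)
    return len(driver.rescinded)
```

Here `run(["offer-1", "stop"])` counts one callback, while `run(["stop", "offer-1"])` counts none. The model only captures the ordering; it says nothing about why the stop request wins the race in the real test.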


> PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.
> --
>
> Key: MESOS-8261
> URL: https://issues.apache.org/jira/browse/MESOS-8261
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.5.0
> Environment: Ubuntu 17.04
> Reporter: Alexander Rukletsov
> Labels: flaky-test
> Attachments: ReserveAndSlaveRemoval-badrun.txt
>
>
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/persistent_volume_endpoints_tests.cpp:1886
> Actual function call count doesn't match EXPECT_CALL(sched, offerRescinded(_, 
> _))...
>  Expected: to be called once
>    Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8261) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8261:
--

 Summary: PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is 
flaky.
 Key: MESOS-8261
 URL: https://issues.apache.org/jira/browse/MESOS-8261
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Ubuntu 17.04
Reporter: Alexander Rukletsov
 Attachments: ReserveAndSlaveRemoval-badrun.txt

{noformat}
/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/persistent_volume_endpoints_tests.cpp:1886
Actual function call count doesn't match EXPECT_CALL(sched, offerRescinded(_, 
_))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
{noformat}
Full log attached.





[jira] [Updated] (MESOS-8261) PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8261:
---
Attachment: ReserveAndSlaveRemoval-badrun.txt

> PersistentVolumeEndpointsTest.ReserveAndSlaveRemoval is flaky.
> --
>
> Key: MESOS-8261
> URL: https://issues.apache.org/jira/browse/MESOS-8261
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.5.0
> Environment: Ubuntu 17.04
> Reporter: Alexander Rukletsov
> Labels: flaky-test
> Attachments: ReserveAndSlaveRemoval-badrun.txt
>
>
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/persistent_volume_endpoints_tests.cpp:1886
> Actual function call count doesn't match EXPECT_CALL(sched, offerRescinded(_, 
> _))...
>  Expected: to be called once
>    Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Updated] (MESOS-8260) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8260:
---
Attachment: ContenderDetectorShutdownNetwork-badrun.txt

> ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is 
> flaky.
> ---
>
> Key: MESOS-8260
> URL: https://issues.apache.org/jira/browse/MESOS-8260
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 1.5.0
> Environment: CentOS 6
> Reporter: Alexander Rukletsov
> Labels: flaky-test
> Attachments: ContenderDetectorShutdownNetwork-badrun.txt
>
>
> {noformat}
> ../../src/tests/master_contender_detector_tests.cpp:524
> Failed to wait 15secs for contended
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8260) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8260:
--

 Summary: 
ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky.
 Key: MESOS-8260
 URL: https://issues.apache.org/jira/browse/MESOS-8260
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: CentOS 6
Reporter: Alexander Rukletsov
 Attachments: ContenderDetectorShutdownNetwork-badrun.txt

{noformat}
../../src/tests/master_contender_detector_tests.cpp:524
Failed to wait 15secs for contended
{noformat}
Full log attached.





[jira] [Created] (MESOS-8259) IOSwitchboardTest.ContainerAttach is flaky

2017-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8259:
--

 Summary: IOSwitchboardTest.ContainerAttach is flaky
 Key: MESOS-8259
 URL: https://issues.apache.org/jira/browse/MESOS-8259
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: Ubuntu 16.04
Reporter: Alexander Rukletsov


Observed it today in our CI:
{noformat}
[ RUN  ] IOSwitchboardTest.ContainerAttach
I1123 01:28:29.068318  4542 containerizer.cpp:301] Using isolation { 
environment_secret, network/cni, filesystem/posix, posix/cpu }
I1123 01:28:29.069078  4542 provisioner.cpp:259] Using default backend 'overlay'
I1123 01:28:29.070669 17203 containerizer.cpp:668] Recovering containerizer
I1123 01:28:29.071708 17203 provisioner.cpp:455] Provisioner recovery complete
I1123 01:28:29.072427 17207 containerizer.cpp:1195] Starting container 
203c6c53-7ca0-46d7-b4b7-28948bcfe44f
I1123 01:28:29.072916 17207 containerizer.cpp:1367] Checkpointed 
ContainerConfig at 
'/tmp/IOSwitchboardTest_ContainerAttach_oVtobZ/containers/203c6c53-7ca0-46d7-b4b7-28948bcfe44f/config'
I1123 01:28:29.072935 17207 containerizer.cpp:2919] Transitioning the state of 
container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from PROVISIONING to PREPARING
I1123 01:28:29.074801 17207 switchboard.cpp:416] Allocated pseudo terminal 
'/dev/pts/0' for container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f
I1123 01:28:29.075064 17207 switchboard.cpp:545] Launching 
'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" --help="false" 
--socket_address="/tmp/mesos-io-switchboard-ba0fb5bc-4942-4cb4-bd20-6235a3bcf0f8"
 --stderr_from_fd="7" --stderr_to_fd="2" --stdin_to_fd="7" --stdout_from_fd="7" 
--stdout_to_fd="1" --tty="true" --wait_for_connection="false"' for container 
203c6c53-7ca0-46d7-b4b7-28948bcfe44f
I1123 01:28:29.078411 17207 switchboard.cpp:575] Created I/O switchboard server 
(pid: 17758) listening on socket file 
'/tmp/mesos-io-switchboard-ba0fb5bc-4942-4cb4-bd20-6235a3bcf0f8' for container 
203c6c53-7ca0-46d7-b4b7-28948bcfe44f
I1123 01:28:29.080448 17208 containerizer.cpp:1836] Launching 
'mesos-containerizer' with flags '--help="false" 
--launch_info="{"command":{"shell":true,"value":"sleep 
1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/IOSwitchboardTest_ContainerAttach_CDGOU4"}]},"task_environment":{},"tty_slave_path":"\/dev\/pts\/0","working_directory":"\/tmp\/IOSwitchboardTest_ContainerAttach_CDGOU4"}"
 --pipe_read="7" --pipe_write="9" 
--runtime_directory="/tmp/IOSwitchboardTest_ContainerAttach_oVtobZ/containers/203c6c53-7ca0-46d7-b4b7-28948bcfe44f"
 --unshare_namespace_mnt="false"'
I1123 01:28:29.083534 17208 launcher.cpp:140] Forked child with pid '17759' for 
container '203c6c53-7ca0-46d7-b4b7-28948bcfe44f'
I1123 01:28:29.084763 17208 containerizer.cpp:2919] Transitioning the state of 
container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from PREPARING to ISOLATING
I1123 01:28:29.092840 17204 containerizer.cpp:2919] Transitioning the state of 
container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from ISOLATING to FETCHING
I1123 01:28:29.093013 17204 fetcher.cpp:379] Starting to fetch URIs for 
container: 203c6c53-7ca0-46d7-b4b7-28948bcfe44f, directory: 
/tmp/IOSwitchboardTest_ContainerAttach_CDGOU4
I1123 01:28:29.093564 17204 containerizer.cpp:2919] Transitioning the state of 
container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from FETCHING to RUNNING
/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/containerizer/io_switchboard_tests.cpp:755:
 Failure
(connection).failure(): Failed to connect to 
/tmp/mesos-io-switchboard-ba0fb5bc-4942-4cb4-bd20-6235a3bcf0f8: Connection 
refused
[  FAILED  ] IOSwitchboardTest.ContainerAttach (194 ms)
{noformat}






[jira] [Updated] (MESOS-8259) IOSwitchboardTest.ContainerAttach is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8259:
---
Summary: IOSwitchboardTest.ContainerAttach is flaky.  (was: 
IOSwitchboardTest.ContainerAttach is flaky)

> IOSwitchboardTest.ContainerAttach is flaky.
> ---
>
> Key: MESOS-8259
> URL: https://issues.apache.org/jira/browse/MESOS-8259
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
>
> Observed it today in our CI:
> {noformat}
> [ RUN  ] IOSwitchboardTest.ContainerAttach
> I1123 01:28:29.068318  4542 containerizer.cpp:301] Using isolation { 
> environment_secret, network/cni, filesystem/posix, posix/cpu }
> I1123 01:28:29.069078  4542 provisioner.cpp:259] Using default backend 
> 'overlay'
> I1123 01:28:29.070669 17203 containerizer.cpp:668] Recovering containerizer
> I1123 01:28:29.071708 17203 provisioner.cpp:455] Provisioner recovery complete
> I1123 01:28:29.072427 17207 containerizer.cpp:1195] Starting container 
> 203c6c53-7ca0-46d7-b4b7-28948bcfe44f
> I1123 01:28:29.072916 17207 containerizer.cpp:1367] Checkpointed 
> ContainerConfig at 
> '/tmp/IOSwitchboardTest_ContainerAttach_oVtobZ/containers/203c6c53-7ca0-46d7-b4b7-28948bcfe44f/config'
> I1123 01:28:29.072935 17207 containerizer.cpp:2919] Transitioning the state 
> of container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from PROVISIONING to 
> PREPARING
> I1123 01:28:29.074801 17207 switchboard.cpp:416] Allocated pseudo terminal 
> '/dev/pts/0' for container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f
> I1123 01:28:29.075064 17207 switchboard.cpp:545] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-ba0fb5bc-4942-4cb4-bd20-6235a3bcf0f8"
>  --stderr_from_fd="7" --stderr_to_fd="2" --stdin_to_fd="7" 
> --stdout_from_fd="7" --stdout_to_fd="1" --tty="true" 
> --wait_for_connection="false"' for container 
> 203c6c53-7ca0-46d7-b4b7-28948bcfe44f
> I1123 01:28:29.078411 17207 switchboard.cpp:575] Created I/O switchboard 
> server (pid: 17758) listening on socket file 
> '/tmp/mesos-io-switchboard-ba0fb5bc-4942-4cb4-bd20-6235a3bcf0f8' for 
> container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f
> I1123 01:28:29.080448 17208 containerizer.cpp:1836] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"sleep 
> 1000"},"environment":{"variables":[{"name":"MESOS_SANDBOX","type":"VALUE","value":"\/tmp\/IOSwitchboardTest_ContainerAttach_CDGOU4"}]},"task_environment":{},"tty_slave_path":"\/dev\/pts\/0","working_directory":"\/tmp\/IOSwitchboardTest_ContainerAttach_CDGOU4"}"
>  --pipe_read="7" --pipe_write="9" 
> --runtime_directory="/tmp/IOSwitchboardTest_ContainerAttach_oVtobZ/containers/203c6c53-7ca0-46d7-b4b7-28948bcfe44f"
>  --unshare_namespace_mnt="false"'
> I1123 01:28:29.083534 17208 launcher.cpp:140] Forked child with pid '17759' 
> for container '203c6c53-7ca0-46d7-b4b7-28948bcfe44f'
> I1123 01:28:29.084763 17208 containerizer.cpp:2919] Transitioning the state 
> of container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from PREPARING to ISOLATING
> I1123 01:28:29.092840 17204 containerizer.cpp:2919] Transitioning the state 
> of container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from ISOLATING to FETCHING
> I1123 01:28:29.093013 17204 fetcher.cpp:379] Starting to fetch URIs for 
> container: 203c6c53-7ca0-46d7-b4b7-28948bcfe44f, directory: 
> /tmp/IOSwitchboardTest_ContainerAttach_CDGOU4
> I1123 01:28:29.093564 17204 containerizer.cpp:2919] Transitioning the state 
> of container 203c6c53-7ca0-46d7-b4b7-28948bcfe44f from FETCHING to RUNNING
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/containerizer/io_switchboard_tests.cpp:755:
>  Failure
> (connection).failure(): Failed to connect to 
> /tmp/mesos-io-switchboard-ba0fb5bc-4942-4cb4-bd20-6235a3bcf0f8: Connection 
> refused
> [  FAILED  ] IOSwitchboardTest.ContainerAttach (194 ms)
> {noformat}





[jira] [Updated] (MESOS-8258) Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8258:
---
Attachment: ROOT_DOCKER_SlaveRecoveryTaskContainer-badrun.txt

> Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.
> --
>
> Key: MESOS-8258
> URL: https://issues.apache.org/jira/browse/MESOS-8258
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 17.04
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: ROOT_DOCKER_SlaveRecoveryTaskContainer-badrun.txt
>
>
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/containerizer/docker_containerizer_tests.cpp:2772
>   Expected: 1
> To be equal to: reregister.updates_size()
>   Which is: 2
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8258) Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.

2017-11-23 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8258:
--

 Summary: 
Mesos.DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer is flaky.
 Key: MESOS-8258
 URL: https://issues.apache.org/jira/browse/MESOS-8258
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: Ubuntu 17.04
Reporter: Alexander Rukletsov
 Attachments: ROOT_DOCKER_SlaveRecoveryTaskContainer-badrun.txt

{noformat}
/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-17.04/mesos/src/tests/containerizer/docker_containerizer_tests.cpp:2772
  Expected: 1
To be equal to: reregister.updates_size()
  Which is: 2
{noformat}
Full log attached.





[jira] [Comment Edited] (MESOS-8247) Executor registered message is lost

2017-11-23 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262862#comment-16262862
 ] 

Alexander Rukletsov edited comment on MESOS-8247 at 11/23/17 10:41 AM:
---

There are two problems here.

1. The Docker executor does not receive the registration confirmation from the 
agent even though the agent sends it out. In other words, 
{{ExecutorRegisteredMessage}} is lost. I do not yet know why the message was 
lost. All messages except task status updates have an "at-most-once" delivery 
policy, so this is theoretically possible. I will continue the investigation 
after fixing the problem described below.

2. If the Docker executor receives a kill task request and the task has never 
been launched, the request is ignored. We now know the chain of events: the 
executor never received the registration confirmation, hence it ignored the 
launch task request, hence the task never started. This is how the executor 
ends up idle, waiting for registration and ignoring kill task requests.

These patches ensure that the driver-based executors react to kill task 
requests even if the task has not been launched:
https://reviews.apache.org/r/64032/
https://reviews.apache.org/r/64033/
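
The behavior described in point 2 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `SketchExecutor`, its methods, and the choice to stop the executor when a kill arrives before any launch are hypothetical. It is neither the Mesos driver API nor a description of what the linked patches actually do.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "TASK_RUNNING"
    KILLED = "TASK_KILLED"


class SketchExecutor:
    """Hypothetical executor model; not the Mesos driver API."""

    def __init__(self):
        self.tasks = {}       # task_id -> TaskState, launched tasks only
        self.updates = []     # status updates that would be sent to the agent
        self.stopped = False  # True once the executor decides to shut down

    def launch_task(self, task_id):
        self.tasks[task_id] = TaskState.RUNNING
        self.updates.append((task_id, TaskState.RUNNING))

    def kill_task(self, task_id):
        if task_id in self.tasks:
            # Normal path: the task was launched, so kill it and report it.
            self.tasks[task_id] = TaskState.KILLED
            self.updates.append((task_id, TaskState.KILLED))
            return
        # The task was never launched. Ignoring the request here is what
        # leaves the executor idle forever; as an assumed reaction, this
        # sketch stops the executor instead.
        self.stopped = True


never_launched = SketchExecutor()
never_launched.kill_task("t1")

launched = SketchExecutor()
launched.launch_task("t1")
launched.kill_task("t1")
```

In this sketch `never_launched.stopped` becomes true without any status update being produced, while `launched` keeps running and records a `TASK_KILLED` update for `t1`.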


was (Author: alexr):
These patches ensure that the driver-based executors react to kill task 
requests even if the task has not been launched:
https://reviews.apache.org/r/64032/
https://reviews.apache.org/r/64033/

> Executor registered message is lost
> ---
>
> Key: MESOS-8247
> URL: https://issues.apache.org/jira/browse/MESOS-8247
> Project: Mesos
>  Issue Type: Bug
>Reporter: Andrei Budnik
>
> h3. Brief description of successful agent-executor communication.
> The executor sends a `RegisterExecutorMessage` to the agent during its 
> initialization step. The agent responds with an `ExecutorRegisteredMessage` 
> in its `registerExecutor()` method. Whenever the executor receives 
> `ExecutorRegisteredMessage`, it prints `Executor registered on agent...` to 
> its stderr log.
> h3. Problem description.
> The agent launches the built-in Docker executor, which gets stuck in the 
> `STAGING` state.
> stderr logs of the Docker executor:
> {code}
> I1114 23:03:17.919090 14322 exec.cpp:162] Version: 1.2.3
> {code}
> The log doesn't contain a message like `Executor registered on agent...`. At 
> the same time, the agent received `RegisterExecutorMessage` and sent a 
> `runTask` message to the executor.
> The stdout log consists of the same repeated message:
> {code}
> Received killTask for task ...
> {code}
> Also, the Docker executor process has no child processes.
> Currently, the executor [doesn't 
> attempt|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L320]
>  to launch a task if it is not registered with the agent, while [task 
> killing|https://github.com/apache/mesos/blob/2a253093ecdc7d743c9c0874d6e01b68f6a813e4/src/exec/exec.cpp#L343]
>  has no such check.
> It looks like `ExecutorRegisteredMessage` has been lost.
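
The failure mode described above can be reproduced in a toy model. This is only a sketch under stated assumptions: `SketchExecutor`, `deliver`, and the message names `registered`, `run_task`, and `kill_task` are hypothetical stand-ins for the real messages, and message loss is simulated by simply dropping one message kind. It does not use or mirror the Mesos code.

```python
from dataclasses import dataclass, field


@dataclass
class SketchExecutor:
    """Toy model of the executor side of the handshake (hypothetical)."""
    registered: bool = False
    launched: list = field(default_factory=list)
    stdout: list = field(default_factory=list)
    stderr: list = field(default_factory=list)

    def on_registered(self):
        # Stands in for receiving `ExecutorRegisteredMessage`.
        self.registered = True
        self.stderr.append("Executor registered on agent...")

    def on_run_task(self, task_id):
        # The launch is skipped while the executor is not registered.
        if not self.registered:
            return
        self.launched.append(task_id)

    def on_kill_task(self, task_id):
        # No registration check here: the request is logged, and since no
        # task was ever launched, nothing else happens.
        self.stdout.append(f"Received killTask for task {task_id}")


def deliver(executor, messages, dropped=frozenset()):
    """At-most-once delivery: a dropped message kind is never retried."""
    for kind, arg in messages:
        if kind in dropped:
            continue
        if kind == "registered":
            executor.on_registered()
        elif kind == "run_task":
            executor.on_run_task(arg)
        elif kind == "kill_task":
            executor.on_kill_task(arg)


messages = [("registered", None), ("run_task", "t1"),
            ("kill_task", "t1"), ("kill_task", "t1")]

healthy = SketchExecutor()
deliver(healthy, messages)

stuck = SketchExecutor()
deliver(stuck, messages, dropped={"registered"})
```

With the `registered` message dropped, `stuck` never launches `t1`, its stderr stays empty, and its stdout contains only repeated `Received killTask for task t1` lines, which matches the symptoms reported above.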


