[jira] [Updated] (MESOS-7428) Report exit code of tasks from default and command executors

2017-05-16 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-7428:
-
Component/s: containerization

> Report exit code of tasks from default and command executors
> 
>
> Key: MESOS-7428
> URL: https://issues.apache.org/jira/browse/MESOS-7428
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> Use case: some tasks should only be retried if the exit code matches certain 
> user requirements.
> According to [~gilbert], the containerizer already checkpoints the exit code, 
> so we need to clarify how to report the exit code for executor containers 
> vs. nested containers, and do this consistently for the command and default 
> executors.
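To illustrate the use case, here is a minimal sketch of a retry decision keyed on exit codes. It assumes the reported value is a raw POSIX wait status (as filled in by `waitpid`); `shouldRetry` and the set of retryable codes are hypothetical and not part of any Mesos API.

```cpp
#include <cassert>
#include <set>
#include <sys/wait.h>

// Hypothetical retry policy: retry only if the task exited normally with one
// of the user-specified exit codes. A task killed by a signal has no exit
// code (WIFEXITED is false), so this sketch never retries it.
bool shouldRetry(int waitStatus, const std::set<int>& retryableExitCodes)
{
  if (!WIFEXITED(waitStatus)) {
    return false;
  }
  return retryableExitCodes.count(WEXITSTATUS(waitStatus)) > 0;
}
```

Distinguishing "exited with code N" from "terminated by a signal" is exactly the information that would be lost if only a single integer were reported without saying which kind it is.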



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7428) Report exit code of tasks from default and command executors

2017-05-16 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-7428:
-
Target Version/s: 1.4.0

> Report exit code of tasks from default and command executors
> 
>
> Key: MESOS-7428
> URL: https://issues.apache.org/jira/browse/MESOS-7428
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> Use case: some tasks should only be retried if the exit code matches certain 
> user requirements.
> According to [~gilbert], the containerizer already checkpoints the exit code, 
> so we need to clarify how to report the exit code for executor containers 
> vs. nested containers, and do this consistently for the command and default 
> executors.





[jira] [Updated] (MESOS-7507) Add a metric for the network size of replicas for the registry.

2017-05-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-7507:
--
Description: Maintaining quorum is an important aspect of the high 
availability of Mesos master but right now the best metric one can use is the 
number of master instances that are up. This is inadequate because this is not 
the same as the number of peers the replicas in the network can actually see. 
The proposed name for the new metric is {{registrar/log/ensemble_size}}.  (was: 
Maintaining quorum is an important aspect of the high availability of Mesos 
master but right now the best metric one can use is the number of master 
instances that are up. This is inadequate because this is not the same as the 
number of peers the replicas in the network can actually see. The proposed name 
for the new metric is {{registrar/log/network_size}}.)

> Add a metric for the network size of replicas for the registry.
> ---
>
> Key: MESOS-7507
> URL: https://issues.apache.org/jira/browse/MESOS-7507
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Yan Xu
>Assignee: Yan Xu
> Fix For: 1.4.0
>
>
> Maintaining quorum is an important aspect of the high availability of the 
> Mesos master, but right now the best metric one can use is the number of master 
> instances that are up. This is inadequate because it is not the same as the 
> number of peers the replicas in the network can actually see. The proposed 
> name for the new metric is {{registrar/log/ensemble_size}}.
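Why the visible ensemble size matters more than the count of running masters can be shown with majority-quorum arithmetic. A minimal sketch, assuming `ensembleSize` is the value reported by the proposed {{registrar/log/ensemble_size}} metric and `quorum` is the configured master quorum; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Smallest majority for n replicas: 2 of 3, 3 of 5, and so on.
std::size_t majorityQuorum(std::size_t replicas)
{
  return replicas / 2 + 1;
}

// How many more replicas can drop out of view before the log loses quorum.
// Zero means no slack is left; a negative value means quorum is already lost,
// even if every master process is still "up".
long spareReplicas(std::size_t ensembleSize, std::size_t quorum)
{
  return static_cast<long>(ensembleSize) - static_cast<long>(quorum);
}
```

For example, three masters that are all up but partitioned so that a replica sees only itself give `spareReplicas(1, majorityQuorum(3))`, which is negative. Counting running masters alone would not reveal this.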





[jira] [Commented] (MESOS-7115) Agent should prefer LOG(FATAL) over EXIT().

2017-05-16 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013315#comment-16013315
 ] 

Yan Xu commented on MESOS-7115:
---

{noformat:title=}
commit 342aab64c60d8118468d17974ee0b863d341b1cb
Author: James Peach 
Date:   Fri May 12 14:09:25 2017 -0700

Use the EXIT() macro more consistently in agent startup.

Review: https://reviews.apache.org/r/56680/

commit fff43d202f23a26fd2c73b83989f3c03223f128b
Author: James Peach 
Date:   Tue May 16 17:36:24 2017 -0700

Use glog to log EXIT() messages.

Review: https://reviews.apache.org/r/56681/
{noformat}

> Agent should prefer LOG(FATAL) over EXIT().
> ---
>
> Key: MESOS-7115
> URL: https://issues.apache.org/jira/browse/MESOS-7115
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> I saw the agent exit with an auth failure:
> {noformat}
> I0210 14:16:49.731459  9503 authenticatee.cpp:259] Received SASL 
> authentication step
> Master master@17.174.144.199:5050 refused authentication
> {noformat}
> Note the lack of log metadata on the exit message. This message (from 
> {{slave.cpp}}) and a number of others in the same file should all use 
> {{LOG(FATAL)}} so that log aggregation can pick up the timestamp, error 
> severity, etc.
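The missing metadata is the prefix that glog adds to each line, as in `I0210 14:16:49.731459  9503 authenticatee.cpp:259]` above: severity, month/day, time with microseconds, thread ID, and source location. The sketch below reproduces that layout, inferred from the log lines quoted in this thread. `glogPrefix` is a hypothetical helper for illustration, not a glog API; real code should simply use `LOG(FATAL)`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical: renders a glog-style prefix
// "<severity><MMDD> <hh:mm:ss>.<usec> <tid> <file>:<line>] ".
// The thread ID is right-aligned in 5 columns, matching the
// "  9503" vs " 28941" spacing seen in the quoted logs.
std::string glogPrefix(char severity, int month, int day,
                       int hour, int minute, int second, int usec,
                       int threadId, const std::string& file, int line)
{
  char buffer[256];
  std::snprintf(buffer, sizeof(buffer),
                "%c%02d%02d %02d:%02d:%02d.%06d %5d %s:%d] ",
                severity, month, day, hour, minute, second, usec,
                threadId, file.c_str(), line);
  return buffer;
}
```

Log aggregators typically parse the leading severity letter and timestamp, so a bare message printed by `EXIT()` gives them nothing to key on.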





[jira] [Commented] (MESOS-7517) HealthCheckTest.ConsecutiveFailures is flaky

2017-05-16 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013304#comment-16013304
 ] 

Neil Conway commented on MESOS-7517:


cc [~bmahler]

> HealthCheckTest.ConsecutiveFailures is flaky
> 
>
> Key: MESOS-7517
> URL: https://issues.apache.org/jira/browse/MESOS-7517
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: mesosphere
>
> {noformat}
> [ RUN  ] HealthCheckTest.ConsecutiveFailures
> I0516 17:12:44.380421 28941 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0516 17:12:44.389566 28996 master.cpp:436] Master 
> 2b745611-28cc-491b-80ea-2b6e94a9cab8 (core-dev) started on 10.0.49.2:37598
> I0516 17:12:44.389619 28996 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/kYELQI/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/kYELQI/master" 
> --zk_session_timeout="10secs"
> I0516 17:12:44.389943 28996 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0516 17:12:44.389971 28996 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0516 17:12:44.389988 28996 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0516 17:12:44.390012 28996 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/kYELQI/credentials'
> I0516 17:12:44.390353 28996 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0516 17:12:44.390504 28996 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0516 17:12:44.390661 28996 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0516 17:12:44.390993 28996 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0516 17:12:44.391158 28996 master.cpp:640] Authorization enabled
> I0516 17:12:44.393784 28958 master.cpp:2161] Elected as the leading master!
> I0516 17:12:44.393831 28958 master.cpp:1700] Recovering from registrar
> I0516 17:12:44.394521 28969 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 536064ns
> I0516 17:12:44.394621 28969 registrar.cpp:493] Applied 1 operations in 
> 16653ns; attempting to update the registry
> I0516 17:12:44.395346 28969 registrar.cpp:550] Successfully updated the 
> registry in 664832ns
> I0516 17:12:44.395448 28969 registrar.cpp:422] Successfully recovered 
> registrar
> I0516 17:12:44.395992 28958 master.cpp:1799] Recovered 0 agents from the 
> registry (119B); allowing 10mins for agents to re-register
> I0516 17:12:44.404881 28941 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W0516 17:12:44.405333 28941 backend.cpp:76] Failed to create 'overlay' 
> backend: OverlayBackend requires root privileges
> W0516 17:12:44.405426 28941 backend.cpp:76] Failed to create 'bind' backend: 
> BindBackend requires root privileges
> I0516 17:12:44.405462 28941 provisioner.cpp:249] Using default backend 'copy'
> I0516 17:12:44.406657 28941 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0516 17:12:44.407929 28989 slave.cpp:225] Mesos agent started on 
> (203)@10.0.49.2:37598
> I0516 17:12:44.407973 28989 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 

[jira] [Created] (MESOS-7517) HealthCheckTest.ConsecutiveFailures is flaky

2017-05-16 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7517:
--

 Summary: HealthCheckTest.ConsecutiveFailures is flaky
 Key: MESOS-7517
 URL: https://issues.apache.org/jira/browse/MESOS-7517
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


{noformat}
[ RUN  ] HealthCheckTest.ConsecutiveFailures
I0516 17:12:44.380421 28941 cluster.cpp:162] Creating default 'local' authorizer
I0516 17:12:44.389566 28996 master.cpp:436] Master 
2b745611-28cc-491b-80ea-2b6e94a9cab8 (core-dev) started on 10.0.49.2:37598
I0516 17:12:44.389619 28996 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/kYELQI/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/kYELQI/master" 
--zk_session_timeout="10secs"
I0516 17:12:44.389943 28996 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0516 17:12:44.389971 28996 master.cpp:502] Master only allowing authenticated 
agents to register
I0516 17:12:44.389988 28996 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0516 17:12:44.390012 28996 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/kYELQI/credentials'
I0516 17:12:44.390353 28996 master.cpp:560] Using default 'crammd5' 
authenticator
I0516 17:12:44.390504 28996 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0516 17:12:44.390661 28996 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0516 17:12:44.390993 28996 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0516 17:12:44.391158 28996 master.cpp:640] Authorization enabled
I0516 17:12:44.393784 28958 master.cpp:2161] Elected as the leading master!
I0516 17:12:44.393831 28958 master.cpp:1700] Recovering from registrar
I0516 17:12:44.394521 28969 registrar.cpp:389] Successfully fetched the 
registry (0B) in 536064ns
I0516 17:12:44.394621 28969 registrar.cpp:493] Applied 1 operations in 16653ns; 
attempting to update the registry
I0516 17:12:44.395346 28969 registrar.cpp:550] Successfully updated the 
registry in 664832ns
I0516 17:12:44.395448 28969 registrar.cpp:422] Successfully recovered registrar
I0516 17:12:44.395992 28958 master.cpp:1799] Recovered 0 agents from the 
registry (119B); allowing 10mins for agents to re-register
I0516 17:12:44.404881 28941 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0516 17:12:44.405333 28941 backend.cpp:76] Failed to create 'overlay' backend: 
OverlayBackend requires root privileges
W0516 17:12:44.405426 28941 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0516 17:12:44.405462 28941 provisioner.cpp:249] Using default backend 'copy'
I0516 17:12:44.406657 28941 cluster.cpp:448] Creating default 'local' authorizer
I0516 17:12:44.407929 28989 slave.cpp:225] Mesos agent started on 
(203)@10.0.49.2:37598
I0516 17:12:44.407973 28989 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential" 
--default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 

[jira] [Commented] (MESOS-5994) Add Windows support for modules

2017-05-16 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013272#comment-16013272
 ] 

Andrew Schwartzmeyer commented on MESOS-5994:
-

I _think_ we might be able to switch from static libraries to shared libraries 
now that export-all support is available; this may turn out to be fairly simple.

> Add Windows support for modules 
> 
>
> Key: MESOS-5994
> URL: https://issues.apache.org/jira/browse/MESOS-5994
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
> Environment: Windows
>Reporter: Joseph Wu
>Assignee: Jeff Coffler
>Priority: Minor
>  Labels: agent, master, mesos, mesosphere, windows
>
> Modules are currently not supported on Windows due to a couple of limitations:
> * GCC and Clang export all symbols to shared libraries by default.  MSVC has 
> the opposite behavior and does not export any symbols by default.  To 
> properly create a shared library on Windows, one must 
> {{__declspec(dllexport)}} every single exposed function/class.
> * CMake 3.4+ has utilities for auto-generating exports, but upgrading the 
> CMake requirement has other version incompatibilities.
> * We can't load a statically linked module due to a runtime check in the 
> protobuf library.
> For now, module-related code is not compiled on Windows.
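The first limitation is usually handled with an export macro that expands differently per toolchain. A minimal sketch of that common pattern; `EXAMPLE_MODULE_API`, `EXAMPLE_MODULE_BUILD`, and `exampleModuleVersion` are hypothetical names, not Mesos code.

```cpp
#include <cassert>

// Hypothetical: defined only when compiling the module library itself
// (for example through the build system's compile definitions).
#define EXAMPLE_MODULE_BUILD

// MSVC exports nothing by default, so exported symbols need dllexport while
// building the DLL and dllimport when consuming it. GCC and Clang export
// everything by default; the visibility attribute only matters if the
// library is compiled with -fvisibility=hidden.
#if defined(_WIN32)
#  if defined(EXAMPLE_MODULE_BUILD)
#    define EXAMPLE_MODULE_API __declspec(dllexport)
#  else
#    define EXAMPLE_MODULE_API __declspec(dllimport)
#  endif
#else
#  define EXAMPLE_MODULE_API __attribute__((visibility("default")))
#endif

// Hypothetical exported entry point with C linkage (no name mangling).
extern "C" EXAMPLE_MODULE_API int exampleModuleVersion()
{
  return 1;
}
```

CMake's `CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS` (CMake 3.4+) avoids annotating every function by generating the export list automatically. Global data symbols, however, still need explicit `__declspec(dllimport)` on the consumer side.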





[jira] [Commented] (MESOS-5994) Add Windows support for modules

2017-05-16 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013270#comment-16013270
 ] 

Andrew Schwartzmeyer commented on MESOS-5994:
-

On Windows, VS 2017 is forcing us to raise the minimum required CMake version 
to 3.8, so this is now available:

set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)

> Add Windows support for modules 
> 
>
> Key: MESOS-5994
> URL: https://issues.apache.org/jira/browse/MESOS-5994
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
> Environment: Windows
>Reporter: Joseph Wu
>Assignee: Jeff Coffler
>Priority: Minor
>  Labels: agent, master, mesos, mesosphere, windows
>
> Modules are currently not supported on Windows due to a couple of limitations:
> * GCC and Clang export all symbols to shared libraries by default.  MSVC has 
> the opposite behavior and does not export any symbols by default.  To 
> properly create a shared library on Windows, one must 
> {{__declspec(dllexport)}} every single exposed function/class.
> * CMake 3.4+ has utilities for auto-generating exports, but upgrading the 
> CMake requirement has other version incompatibilities.
> * We can't load a statically linked module due to a runtime check in the 
> protobuf library.
> For now, module-related code is not compiled on Windows.





[jira] [Commented] (MESOS-5994) Add Windows support for modules

2017-05-16 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013268#comment-16013268
 ] 

Andrew Schwartzmeyer commented on MESOS-5994:
-

I think the description was referencing this: 
http://stackoverflow.com/questions/225432/export-all-symbols-when-creating-a-dll

> Add Windows support for modules 
> 
>
> Key: MESOS-5994
> URL: https://issues.apache.org/jira/browse/MESOS-5994
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
> Environment: Windows
>Reporter: Joseph Wu
>Assignee: Jeff Coffler
>Priority: Minor
>  Labels: agent, master, mesos, mesosphere, windows
>
> Modules are currently not supported on Windows due to a couple limitations:
> * GCC and Clang export all symbols to shared libraries by default.  MSVC has 
> the opposite behavior and does not export any symbols by default.  To 
> properly create a shared library on Windows, one must 
> {{__declspec(dllexport)}} every single exposed function/class.
> * CMake 3.4+ has utilities for auto-generating exports, but upgrading the 
> CMake requirement has other version incompatibilities.
> * We can't load a statically linked module due to a runtime check in the 
> protobuf library.
> For now, module-related code is not compiled on Windows.





[jira] [Updated] (MESOS-7488) Add `--ip6` and `--ip6_discovery_command` flag to Mesos agent

2017-05-16 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-7488:
-
Labels: mesosphere  (was: )

> Add `--ip6` and `--ip6_discovery_command` flag to Mesos agent
> -
>
> Key: MESOS-7488
> URL: https://issues.apache.org/jira/browse/MESOS-7488
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 1.4.0
>
>
> As a first step toward supporting IPv6 containers on Mesos, we need to provide 
> `--ip6` and `--ip6_discovery_command` flags to the agent so that the operator 
> can specify an IPv6 address for the `libprocess` actor on the agent. This 
> ticket does not aim to add IPv6 communication support to Mesos; it only aims 
> to use the operator-provided IPv6 address to fill in the v6 address for any 
> containers running on the host network in a dual-stack environment.
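Before the agent can use an operator-supplied `--ip6` value, it has to check that the value really is an IPv6 address. A minimal sketch using the POSIX `inet_pton` call; `isValidIPv6` is a hypothetical helper, and this is not how the flag is actually validated in Mesos.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>

// Hypothetical: accept the value only if inet_pton() parses it as IPv6.
// inet_pton() returns 1 on success and 0 if the string is not a valid
// address for the given family.
bool isValidIPv6(const std::string& address)
{
  in6_addr parsed{};
  return inet_pton(AF_INET6, address.c_str(), &parsed) == 1;
}
```

A plain IPv4 address such as `10.0.49.2` is rejected, which helps catch an operator passing the `--ip` value to `--ip6` by mistake.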





[jira] [Updated] (MESOS-7488) Add `--ip6` and `--ip6_discovery_command` flag to Mesos agent

2017-05-16 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-7488:
-
  Sprint: Mesosphere Sprint 57
Story Points: 5

> Add `--ip6` and `--ip6_discovery_command` flag to Mesos agent
> -
>
> Key: MESOS-7488
> URL: https://issues.apache.org/jira/browse/MESOS-7488
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 1.4.0
>
>
> As a first step toward supporting IPv6 containers on Mesos, we need to provide 
> `--ip6` and `--ip6_discovery_command` flags to the agent so that the operator 
> can specify an IPv6 address for the `libprocess` actor on the agent. This 
> ticket does not aim to add IPv6 communication support to Mesos; it only aims 
> to use the operator-provided IPv6 address to fill in the v6 address for any 
> containers running on the host network in a dual-stack environment.





[jira] [Updated] (MESOS-7488) Add `--ip6` and `--ip6_discovery_command` flag to Mesos agent

2017-05-16 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-7488:
-
Fix Version/s: 1.4.0

> Add `--ip6` and `--ip6_discovery_command` flag to Mesos agent
> -
>
> Key: MESOS-7488
> URL: https://issues.apache.org/jira/browse/MESOS-7488
> Project: Mesos
>  Issue Type: Task
>  Components: agent
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
> Fix For: 1.4.0
>
>
> As a first step toward supporting IPv6 containers on Mesos, we need to provide 
> `--ip6` and `--ip6_discovery_command` flags to the agent so that the operator 
> can specify an IPv6 address for the `libprocess` actor on the agent. This 
> ticket does not aim to add IPv6 communication support to Mesos; it only aims 
> to use the operator-provided IPv6 address to fill in the v6 address for any 
> containers running on the host network in a dual-stack environment.





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-05-16 Thread Megha Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013251#comment-16013251
 ] 

Megha Sharma commented on MESOS-6223:
-

Review Request
https://reviews.apache.org/r/56895/

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> The agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, agents are now 
> allowed to re-register after they have been marked Unreachable. Since the 
> executors are terminated on the agent when it reboots anyway, there is no harm 
> in letting the agent keep its SlaveID, re-register with the master, and 
> reconcile the lost executors. This is a prerequisite for supporting 
> persistent/restartable tasks in Mesos (MESOS-3545).
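Handling this case requires the agent to tell a host reboot apart from a plain agent restart. One way to do that on Linux is to checkpoint the kernel's boot ID (`/proc/sys/kernel/random/boot_id`) and compare it with the current value on startup. A minimal sketch assuming that approach; `readBootId` and `hostRebooted` are hypothetical helpers.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical: reads the first whitespace-delimited token from a boot ID
// file (on Linux, /proc/sys/kernel/random/boot_id). Returns an empty string
// if the file cannot be read.
std::string readBootId(const std::string& path)
{
  std::ifstream in(path);
  std::string id;
  if (!(in >> id)) {
    return "";
  }
  return id;
}

// The host has rebooted since the checkpoint if the boot IDs differ.
bool hostRebooted(const std::string& checkpointed, const std::string& current)
{
  return checkpointed != current;
}
```

With reboot detection in place, the agent could keep its SlaveID and treat all checkpointed executors as terminated, rather than discarding its identity.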





[jira] [Commented] (MESOS-6223) Allow agents to re-register post a host reboot

2017-05-16 Thread Megha Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013053#comment-16013053
 ] 

Megha Sharma commented on MESOS-6223:
-

[~xds2000] Sorry, I was on vacation last week. I am going to address Vinod's 
comments as a priority and try to close this ASAP.

> Allow agents to re-register post a host reboot
> --
>
> Key: MESOS-6223
> URL: https://issues.apache.org/jira/browse/MESOS-6223
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Megha Sharma
>Assignee: Megha Sharma
>
> The agent doesn't recover its state after a host reboot; it registers with the 
> master and gets a new SlaveID. With partition awareness, agents are now 
> allowed to re-register after they have been marked Unreachable. Since the 
> executors are terminated on the agent when it reboots anyway, there is no harm 
> in letting the agent keep its SlaveID, re-register with the master, and 
> reconcile the lost executors. This is a prerequisite for supporting 
> persistent/restartable tasks in Mesos (MESOS-3545).





[jira] [Commented] (MESOS-7407) Windows 10 Creators Update broke opt-in NTFS long path support

2017-05-16 Thread John Kordich (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013018#comment-16013018
 ] 

John Kordich commented on MESOS-7407:
-

Confirming that I have encountered this as well, with similar error messages:

E0516 11:45:23.131611 19864 default_executor.cpp:468] Received '500 Internal 
Server Error' (Failed to create nested sandbox directory 
'C:\Users\JOHNKO~1.RED\AppData\Local\Temp\jEiLUX\slaves\53282974-b770-413f-b324-781b9efaf341-S0\frameworks\53282974-b770-413f-b324-781b9efaf341-\executors\default\runs\c8a2c2ca-5c09-4732-9c82-ed54c8d47d32\containers\a7845cfc-9690-41d7-a80e-85c91ca6b7f8':
 No such file or directory) while launching child container

> Windows 10 Creators Update broke opt-in NTFS long path support
> --
>
> Key: MESOS-7407
> URL: https://issues.apache.org/jira/browse/MESOS-7407
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
> Environment: Windows 10 with "Creators Update" (Build 15063)
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>  Labels: windows
>
> The Creators Update _seems_ to have broken opting into NTFS long path 
> support (i.e., by setting a registry key or group policy). I've verified this 
> on both my work and home desktops: after they received the Creators Update, 
> the DefaultExecutorTests started to fail with "path is too long".
> I would appreciate independent verification.
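Without the opt-in, Win32 path APIs are limited to `MAX_PATH` (260 characters including the terminating NUL). A long-standing workaround that does not depend on the registry setting is to prefix absolute paths with `\\?\` and call the wide-character (`W`) APIs. A minimal sketch of both checks; the helper names are hypothetical, and the prefix applies only to absolute paths that use backslashes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Legacy Win32 limit: MAX_PATH is 260 characters including the terminating
// NUL, so at most 259 usable characters.
constexpr std::size_t kLegacyMaxPath = 260;

// Hypothetical: true if the path is too long for APIs without long path support.
bool exceedsLegacyMaxPath(const std::string& path)
{
  return path.size() >= kLegacyMaxPath;
}

// Hypothetical: adds the extended-length prefix unless it is already present.
std::string withExtendedPrefix(const std::string& absolutePath)
{
  static const std::string prefix = R"(\\?\)";
  if (absolutePath.compare(0, prefix.size(), prefix) == 0) {
    return absolutePath;
  }
  return prefix + absolutePath;
}
```

Sandbox paths such as the nested-container path in the error above (`...\executors\default\runs\<uuid>\containers\<uuid>`) grow quickly because each level adds a 36-character UUID, which is why the DefaultExecutorTests are among the first to hit the limit.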





[jira] [Commented] (MESOS-6350) Raise minimum required cmake version

2017-05-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012990#comment-16012990
 ] 

Benjamin Bannier commented on MESOS-6350:
-

[~kaysoky] What else is missing here to justify taking action?

There are a number of issues caused by the currently required CMake version, 
which lead us to add CMake constructs that are no longer recommended, or block 
certain features outright. The longer we put off modernizing our CMake 
requirements, the more cruft will accumulate.

> Raise minimum required cmake version
> 
>
> Key: MESOS-6350
> URL: https://issues.apache.org/jira/browse/MESOS-6350
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Benjamin Bannier
>  Labels: mesosphere, microsoft, tech-debt
>
> We currently require at least cmake-2.8, which had its first point release in 
> 2010 and its last update in 2013. Meanwhile, upstream is preparing the release 
> of 3.7.0. While CMake support in Mesos is still experimental, we should 
> evaluate how far we can raise the minimum required version so we are not 
> locked into an old version lacking desirable features.





[jira] [Commented] (MESOS-7516) HookTest.VerifySlaveResourcesAndAttributesDecorator is flaky

2017-05-16 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012893#comment-16012893
 ] 

Neil Conway commented on MESOS-7516:


cc [~flx42]

> HookTest.VerifySlaveResourcesAndAttributesDecorator is flaky
> 
>
> Key: MESOS-7516
> URL: https://issues.apache.org/jira/browse/MESOS-7516
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: mesosphere
>
> Takes a few hundred iterations to repro, but does repro consistently:
> {noformat}
> [ RUN  ] HookTest.VerifySlaveResourcesAndAttributesDecorator
> I0516 11:32:43.248517 27528 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0516 11:32:43.263743 27551 master.cpp:436] Master 
> e6c479e5-b7e6-439e-a7ad-018faf297fad (core-dev) started on 10.0.49.2:33039
> I0516 11:32:43.263772 27551 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/McnBom/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/McnBom/master" 
> --zk_session_timeout="10secs"
> I0516 11:32:43.263958 27551 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0516 11:32:43.263996 27551 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0516 11:32:43.264010 27551 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0516 11:32:43.264025 27551 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/McnBom/credentials'
> I0516 11:32:43.264264 27551 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0516 11:32:43.264365 27551 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0516 11:32:43.264456 27551 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0516 11:32:43.264750 27551 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0516 11:32:43.264885 27551 master.cpp:640] Authorization enabled
> I0516 11:32:43.267530 27581 master.cpp:2161] Elected as the leading master!
> I0516 11:32:43.267578 27581 master.cpp:1700] Recovering from registrar
> I0516 11:32:43.268239 27570 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 477952ns
> I0516 11:32:43.268348 27570 registrar.cpp:493] Applied 1 operations in 
> 15690ns; attempting to update the registry
> I0516 11:32:43.268817 27570 registrar.cpp:550] Successfully updated the 
> registry in 409344ns
> I0516 11:32:43.268924 27570 registrar.cpp:422] Successfully recovered 
> registrar
> I0516 11:32:43.269623 27568 master.cpp:1799] Recovered 0 agents from the 
> registry (119B); allowing 10mins for agents to re-register
> I0516 11:32:43.288718 27528 cluster.cpp:448] Creating default 'local' 
> authorizer
> I0516 11:32:43.289685 27572 slave.cpp:225] Mesos agent started on 
> (123)@10.0.49.2:33039
> I0516 11:32:43.289724 27572 slave.cpp:226] Flags at startup: --acls="" 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" 
> --credential="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/credential"
>  --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" 

[jira] [Created] (MESOS-7516) HookTest.VerifySlaveResourcesAndAttributesDecorator is flaky

2017-05-16 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7516:
--

 Summary: HookTest.VerifySlaveResourcesAndAttributesDecorator is 
flaky
 Key: MESOS-7516
 URL: https://issues.apache.org/jira/browse/MESOS-7516
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


Takes a few hundred iterations to repro, but does repro consistently:

{noformat}
[ RUN  ] HookTest.VerifySlaveResourcesAndAttributesDecorator
I0516 11:32:43.248517 27528 cluster.cpp:162] Creating default 'local' authorizer
I0516 11:32:43.263743 27551 master.cpp:436] Master 
e6c479e5-b7e6-439e-a7ad-018faf297fad (core-dev) started on 10.0.49.2:33039
I0516 11:32:43.263772 27551 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/McnBom/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/McnBom/master" 
--zk_session_timeout="10secs"
I0516 11:32:43.263958 27551 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0516 11:32:43.263996 27551 master.cpp:502] Master only allowing authenticated 
agents to register
I0516 11:32:43.264010 27551 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0516 11:32:43.264025 27551 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/McnBom/credentials'
I0516 11:32:43.264264 27551 master.cpp:560] Using default 'crammd5' 
authenticator
I0516 11:32:43.264365 27551 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0516 11:32:43.264456 27551 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0516 11:32:43.264750 27551 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0516 11:32:43.264885 27551 master.cpp:640] Authorization enabled
I0516 11:32:43.267530 27581 master.cpp:2161] Elected as the leading master!
I0516 11:32:43.267578 27581 master.cpp:1700] Recovering from registrar
I0516 11:32:43.268239 27570 registrar.cpp:389] Successfully fetched the 
registry (0B) in 477952ns
I0516 11:32:43.268348 27570 registrar.cpp:493] Applied 1 operations in 15690ns; 
attempting to update the registry
I0516 11:32:43.268817 27570 registrar.cpp:550] Successfully updated the 
registry in 409344ns
I0516 11:32:43.268924 27570 registrar.cpp:422] Successfully recovered registrar
I0516 11:32:43.269623 27568 master.cpp:1799] Recovered 0 agents from the 
registry (119B); allowing 10mins for agents to re-register
I0516 11:32:43.288718 27528 cluster.cpp:448] Creating default 'local' authorizer
I0516 11:32:43.289685 27572 slave.cpp:225] Mesos agent started on 
(123)@10.0.49.2:33039
I0516 11:32:43.289724 27572 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/HookTest_VerifySlaveResourcesAndAttributesDecorator_0N95tZ/credential"
 --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" 

[jira] [Created] (MESOS-7515) MasterAllocatorTest/0.ResourcesUnused is flaky

2017-05-16 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7515:
--

 Summary: MasterAllocatorTest/0.ResourcesUnused is flaky
 Key: MESOS-7515
 URL: https://issues.apache.org/jira/browse/MESOS-7515
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


{noformat}
[ RUN  ] MasterAllocatorTest/0.ResourcesUnused
I0516 11:23:52.681485 27347 cluster.cpp:162] Creating default 'local' authorizer
I0516 11:23:52.689667 27389 master.cpp:436] Master 
0596a957-df3e-4b44-94d6-d99478d0bb6e (core-dev) started on 10.0.49.2:42110
I0516 11:23:52.689745 27389 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/5Pnjkv/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/5Pnjkv/master" 
--zk_session_timeout="10secs"
I0516 11:23:52.690110 27389 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0516 11:23:52.690142 27389 master.cpp:502] Master only allowing authenticated 
agents to register
I0516 11:23:52.690166 27389 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0516 11:23:52.690218 27389 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/5Pnjkv/credentials'
I0516 11:23:52.690475 27389 master.cpp:560] Using default 'crammd5' 
authenticator
I0516 11:23:52.690603 27389 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0516 11:23:52.690723 27389 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0516 11:23:52.690870 27389 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0516 11:23:52.691264 27389 master.cpp:640] Authorization enabled
I0516 11:23:52.694108 27394 master.cpp:2161] Elected as the leading master!
I0516 11:23:52.694157 27394 master.cpp:1700] Recovering from registrar
I0516 11:23:52.695142 27362 registrar.cpp:389] Successfully fetched the 
registry (0B) in 756992ns
I0516 11:23:52.695263 27362 registrar.cpp:493] Applied 1 operations in 14433ns; 
attempting to update the registry
I0516 11:23:52.695825 27362 registrar.cpp:550] Successfully updated the 
registry in 457984ns
I0516 11:23:52.695955 27362 registrar.cpp:422] Successfully recovered registrar
I0516 11:23:52.697041 27381 master.cpp:1799] Recovered 0 agents from the 
registry (119B); allowing 10mins for agents to re-register
I0516 11:23:52.712441 27347 cluster.cpp:448] Creating default 'local' authorizer
I0516 11:23:52.713631 27375 slave.cpp:225] Mesos agent started on 
(79)@10.0.49.2:42110
I0516 11:23:52.713680 27375 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/MasterAllocatorTest_0_ResourcesUnused_KNgb71/credential" 
--default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/tmp/MasterAllocatorTest_0_ResourcesUnused_KNgb71/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" 

[jira] [Commented] (MESOS-4095) Flaky test: MasterAllocatorTest/1.FrameworkExited

2017-05-16 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012855#comment-16012855
 ] 

Neil Conway commented on MESOS-4095:


Another verbose test log, with current code and on a different box:

{noformat}
[ RUN  ] MasterAllocatorTest/1.FrameworkExited
I0516 11:20:45.899612 27146 cluster.cpp:162] Creating default 'local' authorizer
I0516 11:20:45.906642 27166 master.cpp:436] Master 
0526e771-e1a3-4bd0-830e-6084ff4bc552 (core-dev) started on 10.0.49.2:40918
I0516 11:20:45.906697 27166 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="50ms" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/CnBelc/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/CnBelc/master" 
--zk_session_timeout="10secs"
I0516 11:20:45.906927 27166 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0516 11:20:45.906965 27166 master.cpp:502] Master only allowing authenticated 
agents to register
I0516 11:20:45.906991 27166 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0516 11:20:45.907013 27166 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/CnBelc/credentials'
I0516 11:20:45.907254 27166 master.cpp:560] Using default 'crammd5' 
authenticator
I0516 11:20:45.907379 27166 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0516 11:20:45.907744 27166 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0516 11:20:45.908084 27166 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0516 11:20:45.908445 27166 master.cpp:640] Authorization enabled
I0516 11:20:45.910624 27187 master.cpp:2161] Elected as the leading master!
I0516 11:20:45.910661 27187 master.cpp:1700] Recovering from registrar
I0516 11:20:45.913770 27179 registrar.cpp:389] Successfully fetched the 
registry (0B) in 2.981888ms
I0516 11:20:45.913894 27179 registrar.cpp:493] Applied 1 operations in 16047ns; 
attempting to update the registry
I0516 11:20:45.914415 27179 registrar.cpp:550] Successfully updated the 
registry in 454912ns
I0516 11:20:45.914546 27179 registrar.cpp:422] Successfully recovered registrar
I0516 11:20:45.915236 27207 master.cpp:1799] Recovered 0 agents from the 
registry (119B); allowing 10mins for agents to re-register
I0516 11:20:45.923874 27146 cluster.cpp:448] Creating default 'local' authorizer
I0516 11:20:45.924935 27182 slave.cpp:225] Mesos agent started on 
(3)@10.0.49.2:40918
I0516 11:20:45.924971 27182 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/MasterAllocatorTest_1_FrameworkExited_cMdy3f/credential" 
--default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/tmp/MasterAllocatorTest_1_FrameworkExited_cMdy3f/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" 

[jira] [Created] (MESOS-7514) ReservationTest.PreventUnreservingAlienResources is flaky

2017-05-16 Thread Neil Conway (JIRA)
Neil Conway created MESOS-7514:
--

 Summary: ReservationTest.PreventUnreservingAlienResources is flaky
 Key: MESOS-7514
 URL: https://issues.apache.org/jira/browse/MESOS-7514
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway
Assignee: Gastón Kleiman


This repros consistently for me on a many-core box:

{noformat}
[ RUN  ] ReservationTest.PreventUnreservingAlienResources
I0516 11:15:51.562351 26940 cluster.cpp:162] Creating default 'local' authorizer
I0516 11:15:51.656114 26996 master.cpp:436] Master 
0495c63b-8319-43fb-809c-6a18c7005548 (core-dev) started on 10.0.49.2:33500
I0516 11:15:51.656159 26996 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="5ms" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/PGp9GX/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PGp9GX/master" 
--zk_session_timeout="10secs"
I0516 11:15:51.658516 26996 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0516 11:15:51.658540 26996 master.cpp:502] Master only allowing authenticated 
agents to register
I0516 11:15:51.658597 26996 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0516 11:15:51.658615 26996 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/PGp9GX/credentials'
I0516 11:15:51.659363 26996 master.cpp:560] Using default 'crammd5' 
authenticator
I0516 11:15:51.659560 26996 authenticator.cpp:519] Initializing server SASL
I0516 11:15:51.660568 26996 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0516 11:15:51.660923 26996 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0516 11:15:51.661057 26996 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0516 11:15:51.661219 26996 master.cpp:640] Authorization enabled
I0516 11:15:51.672745 26976 master.cpp:2161] Elected as the leading master!
I0516 11:15:51.672835 26976 master.cpp:1700] Recovering from registrar
I0516 11:15:51.676582 26987 registrar.cpp:389] Successfully fetched the 
registry (0B) in 3.4112ms
I0516 11:15:51.676875 26987 registrar.cpp:493] Applied 1 operations in 56756ns; 
attempting to update the registry
I0516 11:15:51.680197 26987 registrar.cpp:550] Successfully updated the 
registry in 3.24608ms
I0516 11:15:51.680672 26987 registrar.cpp:422] Successfully recovered registrar
I0516 11:15:51.681761 26964 master.cpp:1799] Recovered 0 agents from the 
registry (119B); allowing 10mins for agents to re-register
I0516 11:15:51.779008 26940 containerizer.cpp:221] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
W0516 11:15:51.779985 26940 backend.cpp:76] Failed to create 'overlay' backend: 
OverlayBackend requires root privileges
W0516 11:15:51.780287 26940 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0516 11:15:51.780387 26940 provisioner.cpp:249] Using default backend 'copy'
I0516 11:15:51.783296 26940 cluster.cpp:448] Creating default 'local' authorizer
I0516 11:15:51.785317 26971 slave.cpp:225] Mesos agent started on 
(1)@10.0.49.2:33500
I0516 11:15:51.785373 26971 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 
--credential="/tmp/ReservationTest_PreventUnreservingAlienResources_dpHyhe/credential"
 --default_role="*" 

[jira] [Updated] (MESOS-7513) Consider introducing an API call to get the sandbox of a running container.

2017-05-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-7513:
--
Labels: mesosphere  (was: )

> Consider introducing an API call to get the sandbox of a running container.
> ---
>
> Key: MESOS-7513
> URL: https://issues.apache.org/jira/browse/MESOS-7513
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, there is no public API for getting the path to the sandbox of a 
> running container. This leads to folks reverse-engineering the Mesos logic 
> for generating the path and then using it in their scripts; the Mesos Web UI 
> and the DC/OS CLI already do this. Such scripts are error-prone because they 
> can break if the Mesos path logic changes in a future version.
> We should introduce new calls on the v1 Agent API: 
> {{GET_CONTAINER_SANDBOX_PATH}} to get the sandbox path of a running container 
> (which can be nested), and {{GET_EXECUTOR_SANDBOX_PATH}} to get the path to 
> the executor sandbox.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7513) Consider introducing an API call to get the sandbox of a running container.

2017-05-16 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-7513:
-

 Summary: Consider introducing an API call to get the sandbox of a 
running container.
 Key: MESOS-7513
 URL: https://issues.apache.org/jira/browse/MESOS-7513
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Currently, there is no public API for getting the path to the sandbox of a 
running container. This leads to folks reverse-engineering the Mesos logic for 
generating the path and then using it in their scripts; the Mesos Web UI and the 
DC/OS CLI already do this. Such scripts are error-prone because they can break 
if the Mesos path logic changes in a future version.

We should introduce new calls on the v1 Agent API: 
{{GET_CONTAINER_SANDBOX_PATH}} to get the sandbox path of a running container 
(which can be nested), and {{GET_EXECUTOR_SANDBOX_PATH}} to get the path to the 
executor sandbox.
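For illustration, a minimal Python sketch of the kind of path reconstruction described above. It assumes the agent layout <work_dir>/slaves/<agent_id>/frameworks/<framework_id>/executors/<executor_id>/runs/<container_id>, with each nested container living under a further containers/<container_id> directory. The helper names are hypothetical, and the layout is exactly the internal detail that could change between releases.

```python
from pathlib import PurePosixPath
from typing import Sequence


def executor_sandbox_path(work_dir: str, agent_id: str, framework_id: str,
                          executor_id: str, container_id: str) -> str:
    """Hypothetical helper: rebuild an executor sandbox path by hand.

    Assumes the (internal, unversioned) on-disk layout
    <work_dir>/slaves/<agent>/frameworks/<framework>/executors/<executor>/runs/<container>.
    """
    return str(PurePosixPath(work_dir, "slaves", agent_id,
                             "frameworks", framework_id,
                             "executors", executor_id,
                             "runs", container_id))


def nested_sandbox_path(executor_sandbox: str, nested_ids: Sequence[str]) -> str:
    """Hypothetical helper: assumes each nesting level adds containers/<id>."""
    path = PurePosixPath(executor_sandbox)
    for container_id in nested_ids:
        path = path / "containers" / container_id
    return str(path)


sandbox = executor_sandbox_path("/var/lib/mesos", "S0", "F0", "E0", "C0")
print(sandbox)                               # /var/lib/mesos/slaves/S0/frameworks/F0/executors/E0/runs/C0
print(nested_sandbox_path(sandbox, ["N1"]))  # .../runs/C0/containers/N1
```

Any change to that layout would break helpers like these, which is the argument for an agent API call that returns the path directly.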





[jira] [Updated] (MESOS-7512) ContentType/AgentAPITest.LaunchNestedContainerSessionDisconnected/0 is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7512:
---
Attachment: LaunchNestedContainerSessionDisconnected_failure_log_macos.txt

> ContentType/AgentAPITest.LaunchNestedContainerSessionDisconnected/0 is flaky.
> -
>
> Key: MESOS-7512
> URL: https://issues.apache.org/jira/browse/MESOS-7512
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Mac OS with SSL enabled
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: 
> LaunchNestedContainerSessionDisconnected_failure_log_macos.txt
>
>






[jira] [Created] (MESOS-7512) ContentType/AgentAPITest.LaunchNestedContainerSessionDisconnected/0 is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7512:
--

 Summary: 
ContentType/AgentAPITest.LaunchNestedContainerSessionDisconnected/0 is flaky.
 Key: MESOS-7512
 URL: https://issues.apache.org/jira/browse/MESOS-7512
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: Mac OS with SSL enabled
Reporter: Alexander Rukletsov
 Attachments: 
LaunchNestedContainerSessionDisconnected_failure_log_macos.txt







[jira] [Commented] (MESOS-7273) HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage fails on some Linux machines.

2017-05-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012628#comment-16012628
 ] 

Alexander Rukletsov commented on MESOS-7273:


Observed a failure of a *non-TLS* version of this test on Ubuntu 16.04. Failure 
log attached.

> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage 
> fails on some Linux machines.
> --
>
> Key: MESOS-7273
> URL: https://issues.apache.org/jira/browse/MESOS-7273
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: flaky-test, health-check, test, test-fail
> Attachments: 
> ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithContainerImage_failure_ubuntu16.04.txt
>
>
> Log of a bad run: http://pastebin.com/ENa5Sd62
> Brief investigation hints that the task executable failed to start, which 
> may or may not be related to the environment variable setup:
> {noformat}
> Overwriting environment variable 'PATH', original: 
> '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', new: 
> '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
> Failed to execute command: No such file or directory
> {noformat}
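Comparing the two PATH values in the log, the new one appears to be the original with /usr/local/bin prepended, so that entry now occurs twice. A small Python sketch (hypothetical helper names) that checks this reading of the log. A duplicated PATH entry should be harmless by itself, since lookup stops at the first match, so this alone does not explain the "No such file or directory" failure.

```python
from collections import Counter
from typing import List, Optional

# Values copied from the log above.
ORIGINAL = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
NEW = "/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def prepended_entries(original: str, new: str) -> Optional[List[str]]:
    """Return the entries added in front of `original`, or None if `new`
    does not simply end with the original value."""
    suffix = ":" + original
    if not new.endswith(suffix):
        return None
    return new[: -len(suffix)].split(":")


def duplicate_entries(path_value: str) -> List[str]:
    """Return PATH entries that occur more than once, in first-seen order."""
    counts = Counter(entry for entry in path_value.split(":") if entry)
    return [entry for entry, n in counts.items() if n > 1]


print(prepended_entries(ORIGINAL, NEW))  # ['/usr/local/bin']
print(duplicate_entries(NEW))            # ['/usr/local/bin']
```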





[jira] [Updated] (MESOS-7273) HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage fails on some Linux machines.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7273:
---
Attachment: 
ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithContainerImage_failure_ubuntu16.04.txt

> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage 
> fails on some Linux machines.
> --
>
> Key: MESOS-7273
> URL: https://issues.apache.org/jira/browse/MESOS-7273
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: flaky-test, health-check, test, test-fail
> Attachments: 
> ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithContainerImage_failure_ubuntu16.04.txt
>
>
> Log of a bad run: http://pastebin.com/ENa5Sd62
> Brief investigation hints that the task executable failed to start, which 
> may or may not be related to the environment variable setup:
> {noformat}
> Overwriting environment variable 'PATH', original: 
> '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', new: 
> '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
> Failed to execute command: No such file or directory
> {noformat}





[jira] [Updated] (MESOS-7273) HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage fails on some Linux machines.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7273:
---
Environment: Ubuntu 16.04
 Labels: flaky-test health-check test test-fail  (was: health-check 
test test-fail)

> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage 
> fails on some Linux machines.
> --
>
> Key: MESOS-7273
> URL: https://issues.apache.org/jira/browse/MESOS-7273
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: flaky-test, health-check, test, test-fail
>
> Log of a bad run: http://pastebin.com/ENa5Sd62
> Brief investigation hints that the task executable failed to start, which 
> may or may not be related to the environment variable setup:
> {noformat}
> Overwriting environment variable 'PATH', original: 
> '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', new: 
> '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
> Failed to execute command: No such file or directory
> {noformat}





[jira] [Commented] (MESOS-7337) DefaultExecutorCheckTest.CommandCheckTimeout becomes flaky under load

2017-05-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012604#comment-16012604
 ] 

Alexander Rukletsov commented on MESOS-7337:


[~anandmazumdar], nope, it's not expected. We will investigate.

> DefaultExecutorCheckTest.CommandCheckTimeout becomes flaky under load
> -
>
> Key: MESOS-7337
> URL: https://issues.apache.org/jira/browse/MESOS-7337
> Project: Mesos
>  Issue Type: Bug
>  Components: flaky, test
> Environment: Mac OS 10.12.4 (16E195), SSL debug build w/o 
> optimizations, clang version 5.0.0 (http://llvm.org/git/clang 
> c511a96ffe744933459ef64bf963629538057a90) (http://llvm.org/git/llvm 
> 0cd81d8a1055f167e0f588dd1b476863b00da3d5)
>Reporter: Benjamin Bannier
>  Labels: flaky-test, mesosphere
> Attachments: DefaultExecutorCheckTest.CommandCheckTimeout.log
>
>
> The test {{DefaultExecutorCheckTest.CommandCheckTimeout}} randomly fails for 
> me when executing tests in parallel, e.g.,
> {code}
> [ RUN  ] DefaultExecutorCheckTest.CommandCheckTimeout
> ../../src/tests/check_tests.cpp:1374: Failure
> Failed to wait 15secs for updateCheckResultTimeout
> ../../src/tests/check_tests.cpp:1334: Failure
> Actual function call count doesn't match EXPECT_CALL(*scheduler, update(_, 
> _))...
>  Expected: to be called at least 3 times
>Actual: called twice - unsatisfied and active
> [  FAILED  ] DefaultExecutorCheckTest.CommandCheckTimeout (25351 ms)
> {code}





[jira] [Commented] (MESOS-7106) Test ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/1 segfaults

2017-05-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012599#comment-16012599
 ] 

Alexander Rukletsov commented on MESOS-7106:


Observed this today on Mac OS with SSL enabled:
{noformat}
[ RUN  ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/3
E0515 16:17:46.129858 4820992 process.cpp:950] Failed to accept socket: future 
discarded
*** Aborted at 1494890266 (unix time) try "date -d @1494890266" if you are 
using GNU date ***
I0515 16:17:46.132652 1918930944 process.cpp:1266] libprocess is initialized on 
10.0.49.4:65201 with 8 worker threads
I0515 16:17:46.134258 1918930944 cluster.cpp:162] Creating default 'local' 
authorizer
I0515 16:17:46.136342 1601536 master.cpp:436] Master 
71e45821-b660-4e0a-af9e-6626c61ec9bd (10.0.49.4) started on 10.0.49.4:65201
I0515 16:17:46.136370 1601536 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/2wsu4k/credentials"
 --framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/2wsu4k/master"
 --zk_session_timeout="10secs"
I0515 16:17:46.136538 1601536 master.cpp:488] Master only allowing 
authenticated frameworks to register
I0515 16:17:46.136549 1601536 master.cpp:502] Master only allowing 
authenticated agents to register
I0515 16:17:46.136560 1601536 master.cpp:515] Master only allowing 
authenticated HTTP frameworks to register
I0515 16:17:46.136580 1601536 credentials.hpp:37] Loading credentials for 
authentication from 
'/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/2wsu4k/credentials'
I0515 16:17:46.136785 1601536 master.cpp:560] Using default 'crammd5' 
authenticator
I0515 16:17:46.136847 1601536 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0515 16:17:46.137094 1601536 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0515 16:17:46.137239 1601536 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0515 16:17:46.137408 1601536 master.cpp:640] Authorization enabled
I0515 16:17:46.137517 2138112 whitelist_watcher.cpp:77] No whitelist given
I0515 16:17:46.137531 3211264 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0515 16:17:46.139263 528384 master.cpp:2161] Elected as the leading master!
I0515 16:17:46.139281 528384 master.cpp:1700] Recovering from registrar
I0515 16:17:46.139355 4284416 registrar.cpp:345] Recovering registrar
I0515 16:17:46.145408 1601536 registrar.cpp:389] Successfully fetched the 
registry (0B) in 6.03008ms
I0515 16:17:46.145514 1601536 registrar.cpp:493] Applied 1 operations in 29us; 
attempting to update the registry
I0515 16:17:46.152104 1601536 registrar.cpp:550] Successfully updated the 
registry in 6.557952ms
I0515 16:17:46.152190 1601536 registrar.cpp:422] Successfully recovered 
registrar
I0515 16:17:46.152536 2138112 hierarchical.cpp:185] Skipping recovery of 
hierarchical allocator: nothing to recover
I0515 16:17:46.152530 3211264 master.cpp:1799] Recovered 0 agents from the 
registry (121B); allowing 10mins for agents to re-register
I0515 16:17:46.156253 1918930944 cluster.cpp:448] Creating default 'local' 
authorizer
I0515 16:17:46.157025 1601536 slave.cpp:225] Mesos agent started on 
(771)@10.0.49.4:65201
I0515 16:17:46.157053 1601536 slave.cpp:226] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://; 
--appc_store_dir="/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/mesos/store/appc"
 --authenticate_http_executors="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 

[jira] [Commented] (MESOS-7028) NetSocketTest.EOFBeforeRecv is flaky

2017-05-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012596#comment-16012596
 ] 

Alexander Rukletsov commented on MESOS-7028:


Observed this on Mac OS:
{noformat}
[ RUN  ] Encryption/NetSocketTest.EOFBeforeRecv/0
I0515 16:10:02.243257 1918930944 openssl.cpp:419] CA file path is unspecified! NOTE: Set CA file path with LIBPROCESS_SSL_CA_FILE=
I0515 16:10:02.243273 1918930944 openssl.cpp:424] CA directory path unspecified! NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
I0515 16:10:02.243288 1918930944 openssl.cpp:429] Will not verify peer certificate!
NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
I0515 16:10:02.243374 1918930944 openssl.cpp:435] Will only verify peer certificate if presented!
NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
E0515 16:10:02.243955 4820992 process.cpp:950] Failed to accept socket: future discarded
I0515 16:10:02.247562 1918930944 process.cpp:1266] libprocess is initialized on 10.0.49.4:62888 with 8 worker threads
../../../3rdparty/libprocess/src/tests/socket_tests.cpp:196: Failure
Failed to wait 15secs for client->recv()
[  FAILED  ] Encryption/NetSocketTest.EOFBeforeRecv/0, where GetParam() = "SSL" (15159 ms)
{noformat}

FWIW, the test run crashed because of MESOS-7106; I'm not sure whether these two
issues are related.

> NetSocketTest.EOFBeforeRecv is flaky
> 
>
> Key: MESOS-7028
> URL: https://issues.apache.org/jira/browse/MESOS-7028
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
> Environment: ASF CI, autotools, gcc, CentOS 7, libevent/SSL enabled
> Mac OS with SSL enabled
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: flaky, flaky-test, libprocess, mesosphere, socket, ssl
>
> This was observed on ASF CI:
> {code}
> [ RUN  ] Encryption/NetSocketTest.EOFBeforeRecv/0
> I0128 03:48:51.444228 27745 openssl.cpp:419] CA file path is unspecified! NOTE: Set CA file path with LIBPROCESS_SSL_CA_FILE=
> I0128 03:48:51.444252 27745 openssl.cpp:424] CA directory path unspecified! NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
> I0128 03:48:51.444257 27745 openssl.cpp:429] Will not verify peer certificate!
> NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
> I0128 03:48:51.444262 27745 openssl.cpp:435] Will only verify peer certificate if presented!
> NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
> I0128 03:48:51.447341 27745 process.cpp:1246] libprocess is initialized on 172.17.0.2:45515 with 16 worker threads
> ../../../3rdparty/libprocess/src/tests/socket_tests.cpp:196: Failure
> Failed to wait 15secs for client->recv()
> [  FAILED  ] Encryption/NetSocketTest.EOFBeforeRecv/0, where GetParam() = "SSL" (15269 ms)
> {code}





[jira] [Updated] (MESOS-7028) NetSocketTest.EOFBeforeRecv is flaky

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7028:
---
Environment: 
ASF CI, autotools, gcc, CentOS 7, libevent/SSL enabled
Mac OS with SSL enabled

  was:ASF CI, autotools, gcc, CentOS 7, libevent/SSL enabled

 Labels: flaky flaky-test libprocess mesosphere socket ssl  (was: flaky 
libprocess socket ssl)






[jira] [Commented] (MESOS-7508) Support scaling egress bandwidth limit with CPU

2017-05-16 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012586#comment-16012586
 ] 

Jie Yu commented on MESOS-7508:
---

[~idownes] I am asking because we don't have a test bed for validating the 
patch. I'll take a look at the patch.

> Support scaling egress bandwidth limit with CPU
> ---
>
> Key: MESOS-7508
> URL: https://issues.apache.org/jira/browse/MESOS-7508
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Ian Downes
>Assignee: Ian Downes
>Priority: Minor
>
> The {{network/port_mapping}} isolator supports egress rate limits for
> containers, but only as a fixed per-container limit. This can lead to
> underutilization of network bandwidth for large containers, or to overcommit
> of network bandwidth if there are many smaller containers.
> Improve the {{network/port_mapping}} isolator to support egress bandwidth
> scaling with containers' resources, specifically CPU.
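A minimal sketch of what CPU-proportional scaling could look like, assuming a simple linear policy: each container's egress limit is the link rate multiplied by the container's share of the agent's CPUs, clamped below by a floor (so tiny containers keep some bandwidth) and above by the link rate. The function `scaledEgressRateLimit`, its parameters, and the example link speed are hypothetical; they are not part of the `network/port_mapping` isolator, its flags, or the patch discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// HYPOTHETICAL helper, not the Mesos isolator API.
// Assumed policy: limit = linkRate * (containerCpus / agentCpus),
// clamped to [minBytesPerSecond, linkBytesPerSecond].
std::uint64_t scaledEgressRateLimit(
    double containerCpus,
    double agentCpus,
    std::uint64_t linkBytesPerSecond,
    std::uint64_t minBytesPerSecond)
{
  // Degenerate inputs fall back to the floor.
  if (agentCpus <= 0.0 || containerCpus <= 0.0) {
    return minBytesPerSecond;
  }

  // A container can never claim more than the whole link.
  const double share = std::min(containerCpus / agentCpus, 1.0);

  const std::uint64_t scaled = static_cast<std::uint64_t>(
      static_cast<double>(linkBytesPerSecond) * share);

  // Clamp without std::clamp so that a floor above the link rate
  // is still well-defined (the floor wins).
  return std::max(minBytesPerSecond, std::min(scaled, linkBytesPerSecond));
}

// Worked example: a 4-CPU container on a 32-CPU agent with an assumed
// 10 Gbit/s link (1,250,000,000 bytes/s) gets 1/8 of the link.
void exampleUsage()
{
  assert(scaledEgressRateLimit(4.0, 32.0, 1250000000ULL, 1000000ULL) ==
         156250000ULL);
}
```

Linear scaling is only one possible choice; a real isolator might instead reserve headroom or scale by another resource, which this sketch does not attempt.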





[jira] [Updated] (MESOS-7511) CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7511:
---
Attachment: ROOT_DynamicAddDelofCniConfig_failure_log_centos6.txt

> CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.
> ---
>
> Key: MESOS-7511
> URL: https://issues.apache.org/jira/browse/MESOS-7511
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS 6 with SSL
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, isolation, mesosphere
> Attachments: ROOT_DynamicAddDelofCniConfig_failure_log_centos6.txt
>
>






[jira] [Created] (MESOS-7511) CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7511:
--

 Summary: CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.
 Key: MESOS-7511
 URL: https://issues.apache.org/jira/browse/MESOS-7511
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: CentOS 6 with SSL
Reporter: Alexander Rukletsov








[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7434:
---
Attachment: 
RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt

> SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
> -
>
> Key: MESOS-7434
> URL: https://issues.apache.org/jira/browse/MESOS-7434
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 8
> CentOS 6
> other Linux distros
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: flaky, flaky-test, mesosphere
> Attachments: 
> RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, 
> RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt
>
>
> This test failure has been observed on an internal CI system. It occurs on a 
> variety of Linux distributions. It seems that using {{cat}} as the task 
> command may be problematic; see attached log file 
> {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}.





[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7434:
---
Attachment: RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt

> SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
> -
>
> Key: MESOS-7434
> URL: https://issues.apache.org/jira/browse/MESOS-7434
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 8
> CentOS 6
> other Linux distros
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: flaky, flaky-test, mesosphere
> Attachments: RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt
>
>
> This test failure has been observed on an internal CI system. It occurs on a 
> variety of Linux distributions. It seems that using {{cat}} as the task 
> command may be problematic; see attached log file 
> {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}.





[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7434:
---
Environment: 
Debian 8
CentOS 6
other Linux distros
Summary: SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.  
(was: SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky)

> SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
> -
>
> Key: MESOS-7434
> URL: https://issues.apache.org/jira/browse/MESOS-7434
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 8
> CentOS 6
> other Linux distros
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: flaky, flaky-test, mesosphere
>
> This test failure has been observed on an internal CI system. It occurs on a 
> variety of Linux distributions. It seems that using {{cat}} as the task 
> command may be problematic; see attached log file 
> {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}.





[jira] [Updated] (MESOS-7082) ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is flaky

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7082:
---
Environment: 
ubuntu 16.04 with/without SSL
Fedora 23

  was:ubuntu 16.04 with/without SSL


> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0 is 
> flaky
> 
>
> Key: MESOS-7082
> URL: https://issues.apache.org/jira/browse/MESOS-7082
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: ubuntu 16.04 with/without SSL
> Fedora 23
>Reporter: Anand Mazumdar
>Priority: Critical
>  Labels: flaky, flaky-test, mesosphere
>
> Showed up on our internal CI
> {noformat}
> 07:00:17 [ RUN  ] 
> ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.KillTask/0
> 07:00:17 I0207 07:00:17.775459  2952 cluster.cpp:160] Creating default 
> 'local' authorizer
> 07:00:17 I0207 07:00:17.776511  2970 master.cpp:383] Master 
> fa1554c4-572a-4b89-8994-a89460f588d3 (ip-10-153-254-29.ec2.internal) started 
> on 10.153.254.29:38570
> 07:00:17 I0207 07:00:17.776538  2970 master.cpp:385] Flags at startup: 
> --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/ZROfJk/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ZROfJk/master" 
> --zk_session_timeout="10secs"
> 07:00:17 I0207 07:00:17.776674  2970 master.cpp:435] Master only allowing 
> authenticated frameworks to register
> 07:00:17 I0207 07:00:17.776687  2970 master.cpp:449] Master only allowing 
> authenticated agents to register
> 07:00:17 I0207 07:00:17.776695  2970 master.cpp:462] Master only allowing 
> authenticated HTTP frameworks to register
> 07:00:17 I0207 07:00:17.776703  2970 credentials.hpp:37] Loading credentials 
> for authentication from '/tmp/ZROfJk/credentials'
> 07:00:17 I0207 07:00:17.776779  2970 master.cpp:507] Using default 'crammd5' 
> authenticator
> 07:00:17 I0207 07:00:17.776841  2970 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> 07:00:17 I0207 07:00:17.776919  2970 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> 07:00:17 I0207 07:00:17.776970  2970 http.cpp:919] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> 07:00:17 I0207 07:00:17.777009  2970 master.cpp:587] Authorization enabled
> 07:00:17 I0207 07:00:17.777122  2975 hierarchical.cpp:161] Initialized 
> hierarchical allocator process
> 07:00:17 I0207 07:00:17.777138  2974 whitelist_watcher.cpp:77] No whitelist 
> given
> 07:00:17 I0207 07:00:17.04  2976 master.cpp:2123] Elected as the leading 
> master!
> 07:00:17 I0207 07:00:17.26  2976 master.cpp:1645] Recovering from 
> registrar
> 07:00:17 I0207 07:00:17.84  2975 registrar.cpp:329] Recovering registrar
> 07:00:17 I0207 07:00:17.777989  2973 registrar.cpp:362] Successfully fetched 
> the registry (0B) in 176384ns
> 07:00:17 I0207 07:00:17.778023  2973 registrar.cpp:461] Applied 1 operations 
> in 7573ns; attempting to update the registry
> 07:00:17 I0207 07:00:17.778249  2976 registrar.cpp:506] Successfully updated 
> the registry in 210944ns
> 07:00:17 I0207 07:00:17.778290  2976 registrar.cpp:392] Successfully 
> recovered registrar
> 07:00:17 I0207 07:00:17.778373  2976 master.cpp:1761] Recovered 0 agents from 
> the registry (172B); allowing 10mins for agents to re-register
> 07:00:17 I0207 07:00:17.778394  2974 hierarchical.cpp:188] Skipping recovery 
> of hierarchical allocator: nothing to recover
> 07:00:17 I0207 07:00:17.869381  2952 containerizer.cpp:220] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> 07:00:17 I0207 07:00:17.872557  

[jira] [Updated] (MESOS-7510) DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on CentOS 6.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7510:
---
Attachment: ROOT_DOCKER_LaunchWithPersistentVolumes_failure_log_centos6.txt

> DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on 
> CentOS 6.
> --
>
> Key: MESOS-7510
> URL: https://issues.apache.org/jira/browse/MESOS-7510
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS 6
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, mesosphere, persistent-volumes
> Attachments: 
> ROOT_DOCKER_LaunchWithPersistentVolumes_failure_log_centos6.txt
>
>
> I see this test failing consistently on CentOS 6 with or without SSL enabled.





[jira] [Created] (MESOS-7510) DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on CentOS 6.

2017-05-16 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7510:
--

 Summary: DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on CentOS 6.
 Key: MESOS-7510
 URL: https://issues.apache.org/jira/browse/MESOS-7510
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: CentOS 6
Reporter: Alexander Rukletsov


I see this test failing consistently on CentOS 6 with or without SSL enabled.





[jira] [Updated] (MESOS-6792) MasterSlaveReconciliationTest.ReconcileLostTask test is flaky

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6792:
---
Labels: flaky-test mesosphere  (was: mesosphere)

> MasterSlaveReconciliationTest.ReconcileLostTask test is flaky
> -
>
> Key: MESOS-6792
> URL: https://issues.apache.org/jira/browse/MESOS-6792
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
> Environment: Fedora 25, clang, w/ optimizations, SSL build
>Reporter: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> The test {{MasterSlaveReconciliationTest.ReconcileLostTask}} is flaky for me 
> as of {{e99ea9ce8b1de01dd8b3cac6675337edb6320f38}},
> {code}
> Repeating all tests (iteration 912) . . .
> Note: Google Test filter = 
> 

[jira] [Updated] (MESOS-7509) CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7509:
---
Attachment: ROOT_INTERNET_CURL_PortMapper_failure_log_2_centos6.txt

> CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux 
> distros.
> 
>
> Key: MESOS-7509
> URL: https://issues.apache.org/jira/browse/MESOS-7509
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS 6, Ubuntu 12.04
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, isolation, mesosphere
> Attachments: ROOT_INTERNET_CURL_PortMapper_failure_log_1.txt, 
> ROOT_INTERNET_CURL_PortMapper_failure_log_2_centos6.txt
>
>
> I see this test failing consistently on CentOS 6 and Ubuntu 12.04.





[jira] [Updated] (MESOS-7509) CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7509:
---
Attachment: ROOT_INTERNET_CURL_PortMapper_failure_log_1.txt

> CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux 
> distros.
> 
>
> Key: MESOS-7509
> URL: https://issues.apache.org/jira/browse/MESOS-7509
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS 6, Ubuntu 12.04
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, isolation, mesosphere
> Attachments: ROOT_INTERNET_CURL_PortMapper_failure_log_1.txt
>
>
> I see this test failing consistently on CentOS 6 and Ubuntu 12.04.





[jira] [Updated] (MESOS-7509) CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros.

2017-05-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7509:
---
Summary: CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros.
(was: CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros)

> CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux 
> distros.
> 
>
> Key: MESOS-7509
> URL: https://issues.apache.org/jira/browse/MESOS-7509
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS 6, Ubuntu 12.04
>Reporter: Alexander Rukletsov
>  Labels: containerizer, flaky-test, isolation, mesosphere
>
> I see this test failing consistently on CentOS 6 and Ubuntu 12.04.





[jira] [Created] (MESOS-7509) CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros

2017-05-16 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7509:
--

 Summary: CniIsolatorPortMapperTest.ROOT_INTERNET_CURL_PortMapper fails on some Linux distros
 Key: MESOS-7509
 URL: https://issues.apache.org/jira/browse/MESOS-7509
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: CentOS 6, Ubuntu 12.04
Reporter: Alexander Rukletsov


I see this test failing consistently on CentOS 6 and Ubuntu 12.04.





[jira] [Commented] (MESOS-7494) Issues if trying to install python2.7 after installing mesos

2017-05-16 Thread Mihai Turcu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011837#comment-16011837
 ] 

Mihai Turcu commented on MESOS-7494:


Any news? :)

> Issues if trying to install python2.7 after installing mesos
> 
>
> Key: MESOS-7494
> URL: https://issues.apache.org/jira/browse/MESOS-7494
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Mihai Turcu
>Priority: Minor
>
> {noformat}
> root@prod-mesos-slave-02:~# apt-get install python2.7-minimal
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> The following packages were automatically installed and are no longer required:
>   libpython-stdlib linux-headers-4.4.0-71 linux-headers-4.4.0-71-generic linux-image-4.4.0-71-generic linux-image-extra-4.4.0-71-generic
> Use 'apt autoremove' to remove them.
> Suggested packages:
>   binfmt-support
> The following NEW packages will be installed:
>   python2.7-minimal
> 0 upgraded, 1 newly installed, 0 to remove and 36 not upgraded.
> 5 not fully installed or removed.
> Need to get 0 B/1,295 kB of archives.
> After this operation, 3,670 kB of additional disk space will be used.
> (Reading database ... 161088 files and directories currently installed.)
> Preparing to unpack .../python2.7-minimal_2.7.12-1ubuntu0~16.04.1_amd64.deb ...
> {noformat}
> The relevant lines are:
> {noformat}
> new installation of python2.7-minimal; /usr/lib/python2.7/site-packages is a directory
> which is expected a symlink to /usr/local/lib/python2.7/dist-packages.
> please find the package shipping files in /usr/lib/python2.7/site-packages and
> file a bug report to ship these in /usr/lib/python2.7/dist-packages instead
> aborting installation of python2.7-minimal
> dpkg: error processing archive /var/cache/apt/archives/python2.7-minimal_2.7.12-1ubuntu0~16.04.1_amd64.deb (--unpack):
>  subprocess new pre-installation script returned error exit status 1
> Errors were encountered while processing:
>  /var/cache/apt/archives/python2.7-minimal_2.7.12-1ubuntu0~16.04.1_amd64.deb
> E: Sub-process /usr/bin/dpkg returned an error code (1)
> {noformat}
> I've encountered this issue on a clean install of Ubuntu 16.04.2 LTS: install
> the VM, run {{apt-get update && apt-get upgrade}}, install Docker Engine,
> install Mesos, and then try to install python-minimal or python2.7.
> The package should simply install everything into the dist-packages folder.


