[jira] [Comment Edited] (MESOS-9518) CNI_NETNS should not be set for orphan containers that do not have network namespace

2019-01-10 Thread Jie Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739761#comment-16739761
 ] 

Jie Yu edited comment on MESOS-9518 at 1/11/19 6:19 AM:


https://reviews.apache.org/r/69706/
https://reviews.apache.org/r/69710/
https://reviews.apache.org/r/69711/
https://reviews.apache.org/r/69712/
https://reviews.apache.org/r/69713/
https://reviews.apache.org/r/69714/
https://reviews.apache.org/r/69715/


was (Author: jieyu):
Fix: https://reviews.apache.org/r/69706/

Adding tests now.

> CNI_NETNS should not be set for orphan containers that do not have network 
> namespace
> 
>
> Key: MESOS-9518
> URL: https://issues.apache.org/jira/browse/MESOS-9518
> Project: Mesos
>  Issue Type: Bug
>  Components: cni
>Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Major
>
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be 
> persisted across reboot. This lets some CNI plugins clean up the IPs 
> allocated to the containers after a sudden reboot of the host (not all CNI 
> plugins need this).
> It's important to unset the `CNI_NETNS` environment variable after reboot when 
> invoking the CNI plugin "DEL" command so that it conforms to the spec:
> {noformat}
> When CNI_NETNS and/or prevResult are not provided, the plugin should clean up 
> as many resources as possible (e.g. releasing IPAM allocations) and return a 
> successful response.
> {noformat}
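
A minimal sketch of the behaviour described above, not the actual Mesos patch (which is C++ and lives in the review chain linked in this thread). The helper name `cni_del_environment` and its parameters are hypothetical; only the `CNI_*` variable names come from the CNI spec. The idea is that `CNI_NETNS` is passed to the plugin only when a network namespace handle still exists, and is otherwise left out so the plugin can fall back to best-effort cleanup.

```python
import os
from typing import Dict, Optional


def cni_del_environment(
    container_id: str,
    ifname: str,
    plugin_path: str,
    netns: Optional[str],
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build the environment for a CNI plugin "DEL" invocation.

    Hypothetical helper: if `netns` is None or the path no longer exists
    (for example after a host reboot), CNI_NETNS is omitted entirely.
    """
    env = dict(base_env or {})
    # Never inherit a stale CNI_NETNS from the caller's environment.
    env.pop("CNI_NETNS", None)
    env.update({
        "CNI_COMMAND": "DEL",
        "CNI_CONTAINERID": container_id,
        "CNI_IFNAME": ifname,
        "CNI_PATH": plugin_path,
    })
    # Only advertise the namespace when its handle is still present.
    if netns is not None and os.path.exists(netns):
        env["CNI_NETNS"] = netns
    return env
```

The resulting dictionary could be handed to something like `subprocess.run(..., env=env)` when invoking the plugin binary; that call is left out here because the plugin path and network config are deployment-specific.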



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9519) Unable to build Mesos with CMake on Ubuntu 14.04.

2019-01-10 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9519:
--

 Summary: Unable to build Mesos with CMake on Ubuntu 14.04.
 Key: MESOS-9519
 URL: https://issues.apache.org/jira/browse/MESOS-9519
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 1.7.0
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


Running the following command to build Mesos on Ubuntu 14.04 fails with the 
error shown below:
{noformat}
OS=ubuntu:14.04 BUILDTOOL=cmake COMPILER=gcc CONFIGURATION='--verbose 
--enable-libevent --enable-ssl' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1' JOBS=48 
nice support/docker-build.sh
{noformat}
{noformat}
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result ssl_handshaker_extract_peer(tsi_handshaker*,
tsi_peer*)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1011:71:
error: 'SSL_get0_alpn_selected' was not declared in this scope
   SSL_get0_alpn_selected(impl->ssl, &alpn_selected, &alpn_selected_len);
                                                                       ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result tsi_create_ssl_client_handshaker_factory(const
tsi_ssl_pem_key_cert_pair*, const char*, const char*, const char**,
uint16_t, tsi_ssl_client_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1417:73:
error: 'SSL_CTX_set_alpn_protos' was not declared in this scope
               static_cast(impl->alpn_protocol_list_length))) {
                                                                         ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result
tsi_create_ssl_server_handshaker_factory_ex(const
tsi_ssl_pem_key_cert_pair*, size_t, const char*,
tsi_client_certificate_request_type, const char*, const char**,
uint16_t, tsi_ssl_server_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1557:79:
error: 'SSL_CTX_set_alpn_select_cb' was not declared in this scope

server_handshaker_factory_alpn_callback, impl);
                                                                               ^
make[7]: *** [CMakeFiles/grpc.dir/src/core/tsi/ssl_transport_security.cc.o]
Error 1
make[7]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[6]: *** [CMakeFiles/grpc.dir/all] Error 2
make[6]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[5]: *** [CMakeFiles/grpc.dir/rule] Error 2
make[5]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[4]: *** [grpc] Error 2
make[4]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[3]: *** [3rdparty/grpc-1.10.0/src/grpc-1.10.0-stamp/grpc-1.10.0-build]
Error 2
make[3]: Leaving directory `/mesos/build'
make[2]: *** [3rdparty/CMakeFiles/grpc-1.10.0.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs
{noformat}
The reason is that gRPC's CMake rules do not disable ALPN on systems with 
OpenSSL 1.0.1.
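
The ALPN functions named in the errors above were added to OpenSSL in 1.0.2, while Ubuntu 14.04 ships OpenSSL 1.0.1f. As a rough illustration of the version gate a build would need (this is not the gRPC or Mesos CMake logic, and the function names are hypothetical), the check can be sketched as:

```python
import re
from typing import Tuple

# ALPN support was added to OpenSSL in the 1.0.2 release.
ALPN_MIN_VERSION = (1, 0, 2)


def parse_openssl_version(text: str) -> Tuple[int, ...]:
    """Extract (major, minor, fix) from text such as 'OpenSSL 1.0.1f 6 Jan 2014'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def alpn_available(version_text: str) -> bool:
    """Return True if the reported OpenSSL version is new enough for ALPN."""
    return parse_openssl_version(version_text) >= ALPN_MIN_VERSION
```

A build script could feed the output of `openssl version` into `alpn_available` and disable ALPN when it returns False.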





[jira] [Assigned] (MESOS-9509) Benchmark command health checks in default executor

2019-01-10 Thread Greg Mann (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-9509:


Shepherd: Vinod Kone
Assignee: Greg Mann
  Sprint: Mesos Foundations R9 Sprint 37
Story Points: 5
  Labels: default-executor foundations mesosphere perfomance  (was: 
default-executor foundations perfomance)

> Benchmark command health checks in default executor
> ---
>
> Key: MESOS-9509
> URL: https://issues.apache.org/jira/browse/MESOS-9509
> Project: Mesos
>  Issue Type: Task
>  Components: executor
>Reporter: Vinod Kone
>Assignee: Greg Mann
>Priority: Major
>  Labels: default-executor, foundations, mesosphere, perfomance
>
> TCP/HTTP health checks were extensively scale tested as part of 
> https://mesosphere.com/blog/introducing-mesos-native-health-checks-apache-mesos-part-2/.
>  
> We should do the same for command checks run by the default executor because 
> they use a very different mechanism (the agent fork/execs the check command as 
> a nested container) and will have very different scalability characteristics.
> We should also use these benchmarks as an opportunity to produce perf traces 
> of the Mesos agent (both with and without process inheritance) so that a 
> thorough analysis of the performance can be done as part of MESOS-9513.





[jira] [Commented] (MESOS-9518) CNI_NETNS should not be set for orphan containers that do not have network namespace

2019-01-10 Thread Jie Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739761#comment-16739761
 ] 

Jie Yu commented on MESOS-9518:
---

Fix: https://reviews.apache.org/r/69706/

Adding tests now.

> CNI_NETNS should not be set for orphan containers that do not have network 
> namespace
> 
>
> Key: MESOS-9518
> URL: https://issues.apache.org/jira/browse/MESOS-9518
> Project: Mesos
>  Issue Type: Bug
>  Components: cni
>Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Major
>
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be 
> persisted across reboot. This lets some CNI plugins clean up the IPs 
> allocated to the containers after a sudden reboot of the host (not all CNI 
> plugins need this).
> It's important to unset the `CNI_NETNS` environment variable after reboot when 
> invoking the CNI plugin "DEL" command so that it conforms to the spec:
> {noformat}
> When CNI_NETNS and/or prevResult are not provided, the plugin should clean up 
> as many resources as possible (e.g. releasing IPAM allocations) and return a 
> successful response.
> {noformat}





[jira] [Commented] (MESOS-9224) De-duplicate read-only requests to master based on principal.

2019-01-10 Thread Greg Mann (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739688#comment-16739688
 ] 

Greg Mann commented on MESOS-9224:
--

Review chain ends here: https://reviews.apache.org/r/69064/

> De-duplicate read-only requests to master based on principal.
> -
>
> Key: MESOS-9224
> URL: https://issues.apache.org/jira/browse/MESOS-9224
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Alexander Rukletsov
>Assignee: Benno Evers
>Priority: Major
>  Labels: performance
>
> "Identical" read-only requests can be batched and answered together. With 
> batching available (MESOS-9158), we can now deduplicate requests based on 
> principal.
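
As a rough illustration of the idea in the description above (not the Mesos master implementation, which is C++ and is in the linked review chain), de-duplication within a batch can be sketched as computing one response per distinct key. The request shape `(principal, endpoint, query)` and the names `answer_batch` and `compute` are assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical request shape: (principal, endpoint, query string).
Request = Tuple[str, str, str]


def answer_batch(
    requests: List[Request],
    compute: Callable[[str, str, str], str],
) -> List[str]:
    """Answer a batch of read-only requests, computing each distinct one once.

    Two requests are treated as identical only when principal, endpoint
    and query all match, since the visible result may depend on what the
    principal is authorized to see.
    """
    cache: Dict[Request, str] = {}
    responses: List[str] = []
    for request in requests:
        if request not in cache:
            cache[request] = compute(*request)
        # Responses are returned in the same order as the requests.
        responses.append(cache[request])
    return responses
```

Keying on the principal keeps one principal's filtered view from being served to another, at the cost of fewer cache hits than keying on the endpoint alone.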





[jira] [Comment Edited] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2019-01-10 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739635#comment-16739635
 ] 

Andrei Budnik edited comment on MESOS-7971 at 1/10/19 5:40 PM:
---

This failure is different from the previous ones.
{code:java}
E0110 17:13:09.326659 13916 master.cpp:8586] Failed to find the operation '' 
(uuid: 825f65eb-3ba1-4dfa-bdfa-8eb29194ace3) for an operator API call on agent 
ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b-S0
{code}
Full log:
{code:java}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I0110 17:12:59.303460 13893 cluster.cpp:174] Creating default 'local' authorizer
I0110 17:12:59.304430 13912 master.cpp:416] Master 
ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b (ip-172-16-10-92.ec2.internal) started on 
172.16.10.92:42320
I0110 17:12:59.304451 13912 master.cpp:419] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1000secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/PfFTwT/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_operator_event_stream_subscribers="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
--publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --roles="role1" 
--root_submissions="true" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PfFTwT/master" 
--zk_session_timeout="10secs"
I0110 17:12:59.304585 13912 master.cpp:468] Master only allowing authenticated 
frameworks to register
I0110 17:12:59.304595 13912 master.cpp:474] Master only allowing authenticated 
agents to register
I0110 17:12:59.304603 13912 master.cpp:480] Master only allowing authenticated 
HTTP frameworks to register
I0110 17:12:59.304615 13912 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/PfFTwT/credentials'
I0110 17:12:59.304684 13912 master.cpp:524] Using default 'crammd5' 
authenticator
I0110 17:12:59.304744 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0110 17:12:59.304831 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0110 17:12:59.304889 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0110 17:12:59.304941 13912 master.cpp:605] Authorization enabled
W0110 17:12:59.304967 13912 master.cpp:668] The '--roles' flag is deprecated. 
This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
more information
I0110 17:12:59.305047 13919 hierarchical.cpp:176] Initialized hierarchical 
allocator process
I0110 17:12:59.305128 13918 whitelist_watcher.cpp:77] No whitelist given
I0110 17:12:59.305600 13914 master.cpp:2085] Elected as the leading master!
I0110 17:12:59.305622 13914 master.cpp:1640] Recovering from registrar
I0110 17:12:59.305698 13913 registrar.cpp:339] Recovering registrar
I0110 17:12:59.305853 13912 registrar.cpp:383] Successfully fetched the 
registry (0B) in 118016ns
I0110 17:12:59.305899 13912 registrar.cpp:487] Applied 1 operations in 8238ns; 
attempting to update the registry
I0110 17:12:59.306036 13912 registrar.cpp:544] Successfully updated the 
registry in 112128ns
I0110 17:12:59.306092 13912 registrar.cpp:416] Successfully recovered registrar
I0110 17:12:59.306217 13916 master.cpp:1754] Recovered 0 agents from the 
registry (172B); allowing 10mins for agents to reregister
I0110 17:12:59.306258 13919 hierarchical.cpp:216] Skipping recovery of 
hierarchical allocator: nothing to recover
W0110 17:12:59.307780 13893 process.cpp:2829] Attempted to spawn already 
running process files@172.16.10.92:42320
I0110 17:12:59.308149 13893 containerizer.cpp:305] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0110 17:12:59.310348 13893 linux_launcher.cpp:144] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for t

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2019-01-10 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739635#comment-16739635
 ] 

Andrei Budnik commented on MESOS-7971:
--

This failure is different from the previous ones.
{code:java}
E0110 17:13:09.326659 13916 master.cpp:8586] Failed to find the operation '' 
(uuid: 825f65eb-3ba1-4dfa-bdfa-8eb29194ace3) for an operator API call on agent 
ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b-S0
{code}
Full log:
{code:java}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I0110 17:12:59.303460 13893 cluster.cpp:174] Creating default 'local' authorizer
I0110 17:12:59.304430 13912 master.cpp:416] Master 
ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b (ip-172-16-10-92.ec2.internal) started on 
172.16.10.92:42320
I0110 17:12:59.304451 13912 master.cpp:419] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1000secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/PfFTwT/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_operator_event_stream_subscribers="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
--publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --roles="role1" 
--root_submissions="true" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PfFTwT/master" 
--zk_session_timeout="10secs"
I0110 17:12:59.304585 13912 master.cpp:468] Master only allowing authenticated 
frameworks to register
I0110 17:12:59.304595 13912 master.cpp:474] Master only allowing authenticated 
agents to register
I0110 17:12:59.304603 13912 master.cpp:480] Master only allowing authenticated 
HTTP frameworks to register
I0110 17:12:59.304615 13912 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/PfFTwT/credentials'
I0110 17:12:59.304684 13912 master.cpp:524] Using default 'crammd5' 
authenticator
I0110 17:12:59.304744 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0110 17:12:59.304831 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0110 17:12:59.304889 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0110 17:12:59.304941 13912 master.cpp:605] Authorization enabled
W0110 17:12:59.304967 13912 master.cpp:668] The '--roles' flag is deprecated. 
This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
more information
I0110 17:12:59.305047 13919 hierarchical.cpp:176] Initialized hierarchical 
allocator process
I0110 17:12:59.305128 13918 whitelist_watcher.cpp:77] No whitelist given
I0110 17:12:59.305600 13914 master.cpp:2085] Elected as the leading master!
I0110 17:12:59.305622 13914 master.cpp:1640] Recovering from registrar
I0110 17:12:59.305698 13913 registrar.cpp:339] Recovering registrar
I0110 17:12:59.305853 13912 registrar.cpp:383] Successfully fetched the 
registry (0B) in 118016ns
I0110 17:12:59.305899 13912 registrar.cpp:487] Applied 1 operations in 8238ns; 
attempting to update the registry
I0110 17:12:59.306036 13912 registrar.cpp:544] Successfully updated the 
registry in 112128ns
I0110 17:12:59.306092 13912 registrar.cpp:416] Successfully recovered registrar
I0110 17:12:59.306217 13916 master.cpp:1754] Recovered 0 agents from the 
registry (172B); allowing 10mins for agents to reregister
I0110 17:12:59.306258 13919 hierarchical.cpp:216] Skipping recovery of 
hierarchical allocator: nothing to recover
W0110 17:12:59.307780 13893 process.cpp:2829] Attempted to spawn already 
running process files@172.16.10.92:42320
I0110 17:12:59.308149 13893 containerizer.cpp:305] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0110 17:12:59.310348 13893 linux_launcher.cpp:144] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0110 17:12:59.310752 13893 pro

[jira] [Assigned] (MESOS-9518) CNI_NETNS should not be set for orphan containers that do not have network namespace

2019-01-10 Thread Jie Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-9518:
-

Assignee: Jie Yu

> CNI_NETNS should not be set for orphan containers that do not have network 
> namespace
> 
>
> Key: MESOS-9518
> URL: https://issues.apache.org/jira/browse/MESOS-9518
> Project: Mesos
>  Issue Type: Bug
>  Components: cni
>Affects Versions: 1.4.2, 1.5.1, 1.6.1, 1.7.0
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Major
>
> We introduced a new agent flag in MESOS-9492 so that CNI configs can be 
> persisted across reboot. This lets some CNI plugins clean up the IPs 
> allocated to the containers after a sudden reboot of the host (not all CNI 
> plugins need this).
> It's important to unset the `CNI_NETNS` environment variable after reboot when 
> invoking the CNI plugin "DEL" command so that it conforms to the spec:
> {noformat}
> When CNI_NETNS and/or prevResult are not provided, the plugin should clean up 
> as many resources as possible (e.g. releasing IPAM allocations) and return a 
> successful response.
> {noformat}


