[jira] [Assigned] (MESOS-8921) Autotools don't work with newer OpenJDK versions

2018-09-03 Thread Kapil Arya (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya reassigned MESOS-8921:
-

Assignee: Kapil Arya

> Autotools don't work with newer OpenJDK versions
> 
>
> Key: MESOS-8921
> URL: https://issues.apache.org/jira/browse/MESOS-8921
> Project: Mesos
>  Issue Type: Bug
>  Components: build
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>Priority: Major
>  Labels: ci
>
> There are three distinct issues with modern Java and Linux versions:
> 1. The Mesos configure script expects `libjvm.so` at 
> `$JAVA_HOME/jre/lib//server/libjvm.so`, but in newer OpenJDK 
> versions `libjvm.so` is found at `$JAVA_HOME/lib/server/libjvm.so`.
> 2. On some distros (e.g., Ubuntu 18.04), the JAVA_HOME env var might be missing. 
> In such cases, the configure script is able to compute it by looking at the 
> `java` and `javac` paths and succeeds. However, some Maven plugins require 
> JAVA_HOME to be set and can fail if it is not found.
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar 
> (build-and-attach-javadocs) on project mesos: MavenReportException: Error 
> while creating archive: Unable to find javadoc command: The environment 
> variable JAVA_HOME is not correctly set. -> [Help 1]
> {code}
> Because the configure script generates an automake variable `JAVA_HOME`, we can 
> simply invoke Maven in the following way to fix this issue:
> {code:java}
> JAVA_HOME=$JAVA_HOME mvn ...{code}
> These two behaviors were observed with OpenJDK 11 on Ubuntu 18.04, but I 
> suspect that they are also present on other distros/OpenJDK versions.
> 3. `javah` has been removed as of JDK 10; `javac -h` is to be used as a 
> replacement. See [http://openjdk.java.net/jeps/313] for more details.
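For items 1 and 3 above, a minimal sketch of the adjustments involved (shell; directory names, variable names, and file paths are illustrative only, not the exact configure.ac/Makefile changes):
{noformat}
# Item 1: probe both the pre-JDK-9 layout (jre/lib/<arch>/server) and the
# current layout (lib/server) when looking for libjvm.so.
for dir in "$JAVA_HOME"/jre/lib/*/server "$JAVA_HOME"/lib/server; do
  if test -f "$dir/libjvm.so"; then
    JAVA_JVM_LIBRARY_DIR=$dir
    break
  fi
done

# Item 3: generate JNI headers with javac -h instead of the removed javah.
# Before (JDK <= 9):  javah -d generated -classpath classes org.apache.mesos.MesosSchedulerDriver
# After  (JDK >= 10): javac -h generated -d classes path/to/MesosSchedulerDriver.java
{noformat}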



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9205) ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown fails.

2018-09-03 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-9205:
-

 Summary: 
ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown fails.
 Key: MESOS-9205
 URL: https://issues.apache.org/jira/browse/MESOS-9205
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.8.0
 Environment: macOS 10.14, Apple LLVM version 10.0.0 
(clang-1000.10.43.1), libevent 2.0.22-stable, libssl 1.0.2p. 
Reporter: Till Toenshoff


ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/0 and 
ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/2 fail consistently 
on macOS.
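
For reference, the affected test can be run in isolation with a gtest filter; a sketch, assuming an SSL/libevent-enabled build (the test-runner path depends on the build tree):
{noformat}
# Run only the affected parameterized instances of the test.
./bin/mesos-tests.sh \
  --gtest_filter='ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/*'
{noformat}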

See logs:

{noformat}
[==] Running 4 tests from 1 test case.
[--] Global test environment set-up.
[--] 4 tests from ContentTypeAndSSLConfig/SchedulerSSLTest
[ RUN  ] ContentTypeAndSSLConfig/SchedulerSSLTest.RunTaskAndTeardown/0
I0903 22:48:28.795420 229418368 openssl.cpp:509] CA directory path unspecified! 
NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
I0903 22:48:28.795441 229418368 openssl.cpp:514] Will not verify peer 
certificate!
NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
I0903 22:48:28.795447 229418368 openssl.cpp:534] LIBPROCESS_SSL_REQUIRE_CERT 
implies peer certificate verification.
LIBPROCESS_SSL_VERIFY_CERT set to true
I0903 22:48:28.795559 229418368 openssl.cpp:561] Using CA file: 
/private/var/folders/66/mgr662nx7t90lspb7wjg8ctrgn/T/8hbPai/cert.pem
I0903 22:48:28.795965 28344320 process.cpp:926] Stopped the socket accept loop
I0903 22:48:28.800592 229418368 cluster.cpp:173] Creating default 'local' 
authorizer
I0903 22:48:28.804749 27271168 master.cpp:413] Master 
4c4592b1-ce05-42d9-a2fe-27159146ae42 (lobomacpro4.fritz.box) started on 
192.168.178.20:58409
I0903 22:48:28.804769 27271168 master.cpp:416] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/private/var/folders/66/mgr662nx7t90lspb7wjg8ctrgn/T/8hbPai/credentials"
 --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/private/var/folders/66/mgr662nx7t90lspb7wjg8ctrgn/T/8hbPai/master"
 --zk_session_timeout="10secs"
I0903 22:48:28.806269 27271168 master.cpp:465] Master only allowing 
authenticated frameworks to register
I0903 22:48:28.806282 27271168 master.cpp:471] Master only allowing 
authenticated agents to register
I0903 22:48:28.806288 27271168 master.cpp:477] Master only allowing 
authenticated HTTP frameworks to register
I0903 22:48:28.806293 27271168 credentials.hpp:37] Loading credentials for 
authentication from 
'/private/var/folders/66/mgr662nx7t90lspb7wjg8ctrgn/T/8hbPai/credentials'
I0903 22:48:28.806504 27271168 master.cpp:521] Using default 'crammd5' 
authenticator
I0903 22:48:28.806540 27271168 authenticator.cpp:520] Initializing server SASL
I0903 22:48:28.832607 27271168 http.cpp:977] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0903 22:48:28.832725 27271168 http.cpp:977] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0903 22:48:28.832794 27271168 http.cpp:977] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0903 22:48:28.832818 27271168 master.cpp:602] Authorization enabled
I0903 22:48:28.836750 22978560 master.cpp:2083] Elected as the leading master!
I0903 22:48:28.836772 22978560 master.cpp:1638] Recovering from registrar
I0903 22:48:28.838846 26198016 registrar.cpp:383] Successfully fetched the 
registry (0B) in 1.900032ms
I0903 22:48:28.839041 26198016 registrar.cpp:487] Applied 1 operations in 
38876ns; attempting to update the registry
I0903 22:48:28.840880 26198016 registrar.cpp:544] 

[jira] [Comment Edited] (MESOS-8403) Add agent HTTP API operator call to mark local resource providers as gone

2018-09-03 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565525#comment-16565525
 ] 

Benjamin Bannier edited comment on MESOS-8403 at 9/3/18 1:01 PM:
-

Reviews:

-[https://reviews.apache.org/r/68143/]-
 -[https://reviews.apache.org/r/68144/]-
 -[https://reviews.apache.org/r/68146/]-
 [https://reviews.apache.org/r/68147/]


was (Author: bbannier):
Reviews:

~https://reviews.apache.org/r/68143/~
~https://reviews.apache.org/r/68144/~
~https://reviews.apache.org/r/68146/~
https://reviews.apache.org/r/68147/

> Add agent HTTP API operator call to mark local resource providers as gone
> -
>
> Key: MESOS-8403
> URL: https://issues.apache.org/jira/browse/MESOS-8403
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, storage
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: mesosphere
>
> It is currently not possible to mark local resource providers as gone (e.g., 
> after agent reconfiguration). As resource providers registered at earlier 
> times could still be cached in a number of places (e.g., the agent or the 
> master), the only way to prevent such caches from growing too large is to 
> fail over the caching components (e.g., to prevent an agent cache from 
> updating a fresh master cache during reconciliation).
> Showing unavailable and known-to-be-gone resource providers in various 
> endpoints is likely also confusing to users.
> We should add an operator call to mark resource providers as gone. While the 
> entity managing resource provider subscription state is the resource provider 
> manager, it still seems to make sense to add this operator call to the agent 
> API as currently only local resource providers are supported. The agent would 
> then forward the call to the resource provider manager which would transition 
> its state for the affected resource provider, e.g., setting its state to 
> {{GONE}} and removing it from the list of known resource providers, and then 
> send out an update to its subscribers.
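A hypothetical shape of such an agent API call, for illustration only (the call name and field names below are assumptions, not a final API):
{noformat}
curl -X POST http://<agent-host>:5051/api/v1 \
  -H 'Content-Type: application/json' \
  -d '{
        "type": "MARK_RESOURCE_PROVIDER_GONE",
        "mark_resource_provider_gone": {
          "resource_provider_id": {"value": "<resource-provider-id>"}
        }
      }'
{noformat}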



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8403) Add agent HTTP API operator call to mark local resource providers as gone

2018-09-03 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602138#comment-16602138
 ] 

Benjamin Bannier commented on MESOS-8403:
-

{noformat}
commit 6a98857cf429580b72cd97ddd749a074c3e4524d
Author: Benjamin Bannier 
Date:   Mon Aug 13 11:10:59 2018 +0200

Sent an event to resource providers when they are removed.

In order to allow proper cleanup, the resource provider manager sends a
`REMOVED` event to a resource provider when it is being removed.

The event is not sent reliably, i.e., if the resource provider was,
e.g., not subscribed when it was removed, we currently will never
attempt to resend the event.

Review: https://reviews.apache.org/r/68145/

commit 75bc091e123744003f2ef956be54ad7f562c4815
Author: Benjamin Bannier 
Date:   Wed Aug 15 09:37:12 2018 +0200

Made RP manager only send resource provider ID on state updates.

With the introduction of the resource provider `SUBSCRIBE` event which
contains the full `ResourceProviderInfo` of a subscribed resource
provider, we can avoid sending the `ResourceProviderInfo` in
`UPDATE_STATE` events and instead only send a `ResourceProviderID`
like in most other messages.

Review: https://reviews.apache.org/r/68362

commit f83b31867c86e35f38fd538993138768939291f0
Author: Benjamin Bannier 
Date:   Mon Aug 13 11:11:04 2018 +0200

Added actions and ACLs to authorize removal of resource providers.

Review: https://reviews.apache.org/r/68146/

commit 4b558e24594b43456f35de697ed8484bbb331fe1
Author: Benjamin Bannier 
Date:   Mon Aug 13 11:10:51 2018 +0200

Added methods to remove resource providers from provider manager.

This patch adds a method to remove a resource provider from the
resource provider manager. The resource provider will be marked as
removed in the manager's registry and disconnected. We also expose a
new `REMOVE` event whenever a resource provider was removed.

This patch does not add integration with e.g., the agent.

Review: https://reviews.apache.org/r/68144/

commit 700a62313e8f0fce94f43c9be6f54fc848cf6eb5
Author: Benjamin Bannier 
Date:   Mon Aug 13 11:10:44 2018 +0200

Made resource provider manager emit an event when provider subscribed.

This patch adds a `SUBSCRIBE` resource provider message which is
emitted by the resource provider manager when a resource provider
subscribes. This event can, e.g., be used to detect subscriptions when
the resource provider never updates any resources.

We currently do not expose this event from the agent outwards.

Review: https://reviews.apache.org/r/68143/

commit 701e014a69b1f692f011e79169582c701ebf4f3c
Author: Benjamin Bannier 
Date:   Fri Aug 17 12:00:43 2018 +0200

Allowed tracking of resource providers w/o resource version in agent.

This will be used in a later patch introducing explicit handling of
resource provider subscription events in the agent.

Review: https://reviews.apache.org/r/68407
{noformat}

> Add agent HTTP API operator call to mark local resource providers as gone
> -
>
> Key: MESOS-8403
> URL: https://issues.apache.org/jira/browse/MESOS-8403
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, storage
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: mesosphere
>
> It is currently not possible to mark local resource providers as gone (e.g., 
> after agent reconfiguration). As resource providers registered at earlier 
> times could still be cached in a number of places (e.g., the agent or the 
> master), the only way to prevent such caches from growing too large is to 
> fail over the caching components (e.g., to prevent an agent cache from 
> updating a fresh master cache during reconciliation).
> Showing unavailable and known-to-be-gone resource providers in various 
> endpoints is likely also confusing to users.
> We should add an operator call to mark resource providers as gone. While the 
> entity managing resource provider subscription state is the resource provider 
> manager, it still seems to make sense to add this operator call to the agent 
> API as currently only local resource providers are supported. The agent would 
> then forward the call to the resource provider manager which would transition 
> its state for the affected resource provider, e.g., setting its state to 
> {{GONE}} and removing it from the list of known resource providers, and then 
> send out an update to its subscribers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-8403) Add agent HTTP API operator call to mark local resource providers as gone

2018-09-03 Thread Benjamin Bannier (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565525#comment-16565525
 ] 

Benjamin Bannier edited comment on MESOS-8403 at 9/3/18 1:00 PM:
-

Reviews:

~https://reviews.apache.org/r/68143/~
~https://reviews.apache.org/r/68144/~
~https://reviews.apache.org/r/68146/~
https://reviews.apache.org/r/68147/


was (Author: bbannier):
Reviews:

https://reviews.apache.org/r/68143/
https://reviews.apache.org/r/68144/
https://reviews.apache.org/r/68146/
https://reviews.apache.org/r/68147/

> Add agent HTTP API operator call to mark local resource providers as gone
> -
>
> Key: MESOS-8403
> URL: https://issues.apache.org/jira/browse/MESOS-8403
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, storage
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Major
>  Labels: mesosphere
>
> It is currently not possible to mark local resource providers as gone (e.g., 
> after agent reconfiguration). As resource providers registered at earlier 
> times could still be cached in a number of places (e.g., the agent or the 
> master), the only way to prevent such caches from growing too large is to 
> fail over the caching components (e.g., to prevent an agent cache from 
> updating a fresh master cache during reconciliation).
> Showing unavailable and known-to-be-gone resource providers in various 
> endpoints is likely also confusing to users.
> We should add an operator call to mark resource providers as gone. While the 
> entity managing resource provider subscription state is the resource provider 
> manager, it still seems to make sense to add this operator call to the agent 
> API as currently only local resource providers are supported. The agent would 
> then forward the call to the resource provider manager which would transition 
> its state for the affected resource provider, e.g., setting its state to 
> {{GONE}} and removing it from the list of known resource providers, and then 
> send out an update to its subscribers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-9116) Launch nested container session fails due to incorrect detection of `mnt` namespace of command executor's task.

2018-09-03 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586276#comment-16586276
 ] 

Alexander Rukletsov edited comment on MESOS-9116 at 9/3/18 10:09 AM:
-

Backports to 1.6.x:
{noformat}
cfba574408a85861d424a2c58d3d7277490c398e
6d884fbf9be169fd97483a1f341540c5354d88a9
a4409826deada53eef8843df1a0178e9edfa4c9c
20a4d4fae2f30f9e5436a154087c1a1bb9dc0629
{noformat}
Backports to 1.5.x:
{noformat}
6dd3fcc8ab2aecd182fff29deac07b32b3cc2d81
edeac7b0da5dd7ee1e4e50320d964eb84220d87d
966574a31a3f8c5d4f9a5f02eeb1644aff7fdc97
e4d8ab9911af6d494aae7f5762dd84b8f085fd1e
{noformat}
Backports to 1.4.x (partial):
{noformat}
c37eb59e4c4b7b6c16509f317c78207da6eeb485
{noformat}


was (Author: alexr):
Backports to 1.6.x:
{noformat}
cfba574408a85861d424a2c58d3d7277490c398e
6d884fbf9be169fd97483a1f341540c5354d88a9
a4409826deada53eef8843df1a0178e9edfa4c9c
20a4d4fae2f30f9e5436a154087c1a1bb9dc0629
{noformat}
Backports to 1.5.x:
{noformat}
6dd3fcc8ab2aecd182fff29deac07b32b3cc2d81
edeac7b0da5dd7ee1e4e50320d964eb84220d87d
966574a31a3f8c5d4f9a5f02eeb1644aff7fdc97
e4d8ab9911af6d494aae7f5762dd84b8f085fd1e
{noformat}
Backports to 1.4.x:
{noformat}
c37eb59e4c4b7b6c16509f317c78207da6eeb485
{noformat}

> Launch nested container session fails due to incorrect detection of `mnt` 
> namespace of command executor's task.
> ---
>
> Key: MESOS-9116
> URL: https://issues.apache.org/jira/browse/MESOS-9116
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.4.3, 1.5.2, 1.6.2, 1.7.0
>
> Attachments: pstree.png
>
>
> Launch nested container call might fail with the following error:
> {code:java}
> Failed to enter mount namespace: Failed to open '/proc/29473/ns/mnt': No such 
> file or directory
> {code}
> This happens when the containerizer launcher [tries to 
> enter|https://github.com/apache/mesos/blob/077f122d52671412a2ab5d992d535712cc154002/src/slave/containerizer/mesos/launch.cpp#L879-L892]
>  `mnt` namespace using the pid of a terminated process. The pid [was 
> detected|https://github.com/apache/mesos/blob/077f122d52671412a2ab5d992d535712cc154002/src/slave/containerizer/mesos/containerizer.cpp#L1930-L1958]
>  by the agent before spawning the containerizer launcher process, because the 
> process was running back then.
> The issue can be reproduced using the following test (pseudocode):
> {code:java}
> launchTask("sleep 1000")
> parentContainerId = containerizer.containers().begin()
> outputs = []
> for i in range(10):
>   ContainerId containerId
>   containerId.parent = parentContainerId
>   containerId.id = UUID.random()
>   LAUNCH_NESTED_CONTAINER_SESSION(containerId, "echo echo")
>   response = ATTACH_CONTAINER_OUTPUT(containerId)
>   outputs.append(response.reader)
> for output in outputs:
>   stdout, stderr = getProcessIOData(output)
>   assert("echo" == stdout + stderr){code}
> When we start the very first nested container, `getMountNamespaceTarget()` 
> returns the PID of the task (`sleep 1000`), because it is the only process whose 
> `mnt` namespace differs from the parent container's. This nested container 
> becomes a child of the PID 1 process, which is also the parent of the command 
> executor; it is not a child of the executor itself. This can be seen in the 
> attached `pstree.png`.
> When we start a second nested container, `getMountNamespaceTarget()` might 
> return the PID of the previous nested container (`echo echo`) instead of the 
> task's PID (`sleep 1000`). This happens because the first nested container 
> entered the `mnt` namespace of the task. The containerizer launcher 
> ("nanny" process) then attempts to enter the `mnt` namespace using the PID of 
> that already-terminated process, so we get the error above.
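For illustration (not Mesos code): a mount namespace is identified by the inode behind `/proc/<pid>/ns/mnt`, and once the referenced process exits that path disappears, which is exactly the "No such file or directory" failure quoted above:
{noformat}
# Inspect the mnt namespace of the current shell (the inode number will differ).
$ readlink /proc/self/ns/mnt
mnt:[4026531840]
# After PID 29473 has exited, /proc/29473/ns/mnt no longer exists, so any
# attempt to open it fails with ENOENT.
{noformat}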



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9204) MetricsTest.SnapshotStatistics is flaky

2018-09-03 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-9204:
---

 Summary: MetricsTest.SnapshotStatistics is flaky
 Key: MESOS-9204
 URL: https://issues.apache.org/jira/browse/MESOS-9204
 Project: Mesos
  Issue Type: Bug
  Components: libprocess, test
Reporter: Benjamin Bannier


I see the test {{MetricsTest.SnapshotStatistics}} fail under load, e.g., in 
parallel test execution.

{noformat}
[ RUN  ] MetricsTest.SnapshotStatistics
../3rdparty/libprocess/src/tests/metrics_tests.cpp:536: Failure
  Expected: expected[key]
  Which is: 9.9902
To be equal to: responseValues[key]
  Which is: 2.1219957904712067e-314
../3rdparty/libprocess/src/tests/metrics_tests.cpp:536: Failure
  Expected: expected[key]
  Which is: 9.9996
To be equal to: responseValues[key]
  Which is: 0
../3rdparty/libprocess/src/tests/metrics_tests.cpp:536: Failure
  Expected: expected[key]
  Which is: 9.9004
To be equal to: responseValues[key]
  Which is: 9.9996
../3rdparty/libprocess/src/tests/metrics_tests.cpp:536: Failure
  Expected: expected[key]
  Which is: 9.5
To be equal to: responseValues[key]
  Which is: 9.9902
../3rdparty/libprocess/src/tests/metrics_tests.cpp:536: Failure
  Expected: expected[key]
  Which is: 5
To be equal to: responseValues[key]
  Which is: 9
../3rdparty/libprocess/src/tests/metrics_tests.cpp:536: Failure
  Expected: expected[key]
  Which is: 9
To be equal to: responseValues[key]
  Which is: 9.9004
[  FAILED  ] MetricsTest.SnapshotStatistics (26 ms)
{noformat}
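
One way to reproduce this kind of load-dependent flakiness locally is to repeat just this test many times while the machine is otherwise busy; a sketch (the test-runner path is assumed and depends on the build tree):
{noformat}
# Repeat the single test to increase the chance of hitting the race under load.
./3rdparty/libprocess/libprocess-tests \
  --gtest_filter='MetricsTest.SnapshotStatistics' --gtest_repeat=200
{noformat}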



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)