[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2019-01-10 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739635#comment-16739635
 ] 

Andrei Budnik commented on MESOS-7971:
--

This failure is different from the previous ones:
{code:java}
E0110 17:13:09.326659 13916 master.cpp:8586] Failed to find the operation '' 
(uuid: 825f65eb-3ba1-4dfa-bdfa-8eb29194ace3) for an operator API call on agent 
ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b-S0
{code}
Full log:
{code:java}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I0110 17:12:59.303460 13893 cluster.cpp:174] Creating default 'local' authorizer
I0110 17:12:59.304430 13912 master.cpp:416] Master 
ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b (ip-172-16-10-92.ec2.internal) started on 
172.16.10.92:42320
I0110 17:12:59.304451 13912 master.cpp:419] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1000secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/PfFTwT/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_operator_event_stream_subscribers="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
--publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --roles="role1" 
--root_submissions="true" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PfFTwT/master" 
--zk_session_timeout="10secs"
I0110 17:12:59.304585 13912 master.cpp:468] Master only allowing authenticated 
frameworks to register
I0110 17:12:59.304595 13912 master.cpp:474] Master only allowing authenticated 
agents to register
I0110 17:12:59.304603 13912 master.cpp:480] Master only allowing authenticated 
HTTP frameworks to register
I0110 17:12:59.304615 13912 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/PfFTwT/credentials'
I0110 17:12:59.304684 13912 master.cpp:524] Using default 'crammd5' 
authenticator
I0110 17:12:59.304744 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0110 17:12:59.304831 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0110 17:12:59.304889 13912 http.cpp:965] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0110 17:12:59.304941 13912 master.cpp:605] Authorization enabled
W0110 17:12:59.304967 13912 master.cpp:668] The '--roles' flag is deprecated. 
This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
more information
I0110 17:12:59.305047 13919 hierarchical.cpp:176] Initialized hierarchical 
allocator process
I0110 17:12:59.305128 13918 whitelist_watcher.cpp:77] No whitelist given
I0110 17:12:59.305600 13914 master.cpp:2085] Elected as the leading master!
I0110 17:12:59.305622 13914 master.cpp:1640] Recovering from registrar
I0110 17:12:59.305698 13913 registrar.cpp:339] Recovering registrar
I0110 17:12:59.305853 13912 registrar.cpp:383] Successfully fetched the 
registry (0B) in 118016ns
I0110 17:12:59.305899 13912 registrar.cpp:487] Applied 1 operations in 8238ns; 
attempting to update the registry
I0110 17:12:59.306036 13912 registrar.cpp:544] Successfully updated the 
registry in 112128ns
I0110 17:12:59.306092 13912 registrar.cpp:416] Successfully recovered registrar
I0110 17:12:59.306217 13916 master.cpp:1754] Recovered 0 agents from the 
registry (172B); allowing 10mins for agents to reregister
I0110 17:12:59.306258 13919 hierarchical.cpp:216] Skipping recovery of 
hierarchical allocator: nothing to recover
W0110 17:12:59.307780 13893 process.cpp:2829] Attempted to spawn already 
running process files@172.16.10.92:42320
I0110 17:12:59.308149 13893 containerizer.cpp:305] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0110 17:12:59.310348 13893 linux_launcher.cpp:144] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0110 17:12:59.310752 13893 

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-12-06 Thread Chun-Hung Hsiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712278#comment-16712278
 ] 

Chun-Hung Hsiao commented on MESOS-7971:


For resource provider operations, we use {{resource_version_uuid}} to resolve 
this.

It seems to me that we should do the same in {{Slave::applyOperation}} as well: 
check whether {{ApplyOperationMessage.resource_version_uuid}} equals 
{{resourceVersion}}, and apply the speculative operation only if the versions 
match.

However, {{resource_version_uuid}} has only existed since 1.5 (with the 
{{RESOURCE_PROVIDER}} agent capability), so we could not use the same strategy 
if we wanted to fix this in 1.4.
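
The version check proposed above can be sketched roughly as follows. This is a toy model, not the Mesos implementation: the two structs are hypothetical stand-ins that keep only the fields named in the comment, and what the agent should actually do on a mismatch (drop, reject, report back) is not specified here, so the sketch simply returns false.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ApplyOperationMessage; only the fields
// needed for the illustration are modeled.
struct ApplyOperationMessage {
  std::string operation;              // e.g. "CREATE volume"
  std::string resource_version_uuid;  // version the master based its view on
};

// Hypothetical stand-in for the agent state.
struct Agent {
  std::string resourceVersion;       // the agent's current resource version
  std::vector<std::string> applied;  // operations applied so far

  // Apply a speculative operation only when the version the master
  // assumed matches the agent's own resource version.
  bool applyOperation(const ApplyOperationMessage& message) {
    assert(!resourceVersion.empty());  // the model assumes a known version

    if (message.resource_version_uuid != resourceVersion) {
      return false;  // master's view is stale: do not apply
    }

    applied.push_back(message.operation);
    return true;
  }
};
```

With an agent at version "v2", a message carrying "v1" is refused and leaves `applied` empty, while a message carrying "v2" is applied.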

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
> -
>
> Key: MESOS-7971
> URL: https://issues.apache.org/jira/browse/MESOS-7971
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 1.4.0, 1.6.0, 1.7.0, 1.8.0
>Reporter: Vinod Kone
>Assignee: Meng Zhu
>Priority: Critical
>  Labels: flaky-test, mesosphere
> Attachments: ApacheJenkinsConsoleText_autotools_gcc_ubuntu16.txt
>
>
> Saw this when testing 1.4.0-rc5
> {code}
> [ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
> I0912 05:40:27.335222 30860 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0912 05:40:27.338429 30867 master.cpp:442] Master 
> 2bd1e8eb-e314-4181-9ed3-d397ec1dbede (6aa774430302) started on 
> 172.17.0.3:54639
> I0912 05:40:27.338472 30867 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="50ms" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/hH0YXe/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" --roles="role1" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/hH0YXe/master" 
> --zk_session_timeout="10secs"
> I0912 05:40:27.338778 30867 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0912 05:40:27.338788 30867 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0912 05:40:27.338793 30867 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0912 05:40:27.338799 30867 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/hH0YXe/credentials'
> I0912 05:40:27.353009 30867 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0912 05:40:27.353183 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0912 05:40:27.353364 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0912 05:40:27.353482 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0912 05:40:27.353588 30867 master.cpp:646] Authorization enabled
> W0912 05:40:27.353605 30867 master.cpp:709] The '--roles' flag is deprecated. 
> This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
> more information
> I0912 05:40:27.353742 30868 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0912 05:40:27.353775 30872 whitelist_watcher.cpp:77] No whitelist given
> I0912 05:40:27.356655 30873 master.cpp:2163] Elected as the leading master!
> I0912 05:40:27.356675 30873 master.cpp:1702] Recovering from registrar
> I0912 05:40:27.356868 30874 registrar.cpp:347] Recovering registrar
> I0912 05:40:27.357390 30874 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 494080ns
> I0912 05:40:27.357483 30874 registrar.cpp:495] Applied 1 operations in 
> 31911ns; attempting to update the registry
> I0912 05:40:27.357919 30874 registrar.cpp:552] Successfully updated the 
> registry in 391936ns
> 

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-12-06 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712200#comment-16712200
 ] 

Meng Zhu commented on MESOS-7971:
-

This looks like a legitimate bug. Here is a sequence of events that can 
trigger it:

- The agent (re)registers with the master.
- Operation calls are made to the master (say, create volume).
- The allocator is speculatively updated in 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11315
- Before the agent's resources are updated, the agent sends an 
`UpdateSlaveMessage` upon receiving the (re)registered message, in 
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1551 and 
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1633
- The `UpdateSlaveMessage` triggers the allocator to update the total resources 
again in 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L8205 
so the resource update from the previous operation is overwritten and lost.
- The agent finishes the operation and informs the master through 
`UpdateOperationStatusMessage`.
- For a speculative operation, however, we do not update the allocator at that 
point: 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11177

Thus, the speculative operation is applied successfully on the agent, but its 
effect is lost in the allocator.
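
The sequence above can be replayed in a small model to show how the two views diverge. This is only an illustration under simplifying assumptions: resources are reduced to a set of names, and `Model`, `Resources`, and `replayLostUpdate` are hypothetical names that do not exist in Mesos.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model: resources are reduced to a set of names.
using Resources = std::set<std::string>;

struct Model {
  Resources allocatorTotal;  // the allocator's view of the agent's total
  Resources agentTotal;      // the agent's own view of its total
};

// Replays the event sequence described above and returns the final state.
inline Model replayLostUpdate() {
  Model m;
  m.allocatorTotal = {"disk"};
  m.agentTotal = {"disk"};

  // 1. The agent (re)registers with the master.

  // 2. An operator call arrives; the master speculatively updates the
  //    allocator as if the volume had already been created.
  m.allocatorTotal = {"volume"};
  assert(m.allocatorTotal.count("volume") == 1);

  // 3. Before applying the operation, the agent reacts to the
  //    (re)registered message and sends its current total.
  Resources updateSlaveMessage = m.agentTotal;

  // 4. The master handles that message and overwrites the allocator's
  //    total, discarding the speculative update.
  m.allocatorTotal = updateSlaveMessage;

  // 5. The agent applies the operation.
  m.agentTotal = {"volume"};

  // 6. The operation status update arrives; for a speculative operation
  //    the allocator is not updated again, so nothing happens here.

  return m;
}
```

After the replay the agent's total contains the volume while the allocator's total still contains only the original disk, which is the divergence the comment describes.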


[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-12-06 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712148#comment-16712148
 ] 

Meng Zhu commented on MESOS-7971:
-

This test is flaky due to a race.

The test expects the offer to be sent out after the reserve and create 
operations have finished, but it only waits for the 202 responses returned by 
both calls.

If the offer is sent while the agent is still processing the create operation, 
the offer does not contain the expected volume resource, and the test fails.

Adding manual clock control and settling properly should fix the test.
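
To illustrate why waiting for the 202 alone is not enough, here is a minimal sketch of the race using a toy single-threaded event queue. `EventLoop` and `offerAfterCreate` are hypothetical names invented for this example; in an actual Mesos test the settling step would presumably be done with libprocess's `Clock::pause()` and `Clock::settle()`, which this model only imitates.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event queue standing in for the asynchronous actors.
struct EventLoop {
  std::deque<std::function<void()>> pending;

  void post(std::function<void()> f) { pending.push_back(std::move(f)); }

  // Run queued events until none remain, including newly posted ones.
  void settle() {
    while (!pending.empty()) {
      std::function<void()> f = std::move(pending.front());
      pending.pop_front();
      f();
    }
    assert(pending.empty());
  }
};

// Returns the resources contained in the offer. `settleFirst` selects
// whether the test settles before the allocation is triggered.
inline std::vector<std::string> offerAfterCreate(bool settleFirst) {
  EventLoop loop;
  std::vector<std::string> agentResources = {"reserved disk"};

  // The create call has already returned 202 Accepted at this point;
  // the actual processing is still queued.
  loop.post([&agentResources] {
    agentResources.push_back("persistent volume");
  });

  if (settleFirst) {
    loop.settle();  // wait until the create operation has been processed
  }

  // The allocation builds the offer from whatever is known right now.
  std::vector<std::string> offer = agentResources;

  loop.settle();  // drain remaining work before returning
  return offer;
}
```

In this model, `offerAfterCreate(false)` yields an offer without the volume, while `offerAfterCreate(true)` yields an offer that includes it.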



[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-12-03 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707765#comment-16707765
 ] 

Vinod Kone commented on MESOS-7971:
---

Saw this again.

{code}
*06:14:51* [ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
*06:14:51* I1203 06:14:50.630549 19784 cluster.cpp:173] Creating default 
'local' authorizer
*06:14:51* I1203 06:14:50.633529 19796 master.cpp:413] Master 
f1ffe054-ad44-45d4-9f39-84b048e1a359 (c16130e94783) started on 
172.17.0.3:44340
*06:14:51* I1203 06:14:50.633581 19796 master.cpp:416] Flags at startup: 
--acls="" --agent_ping_timeout="15secs" 
--agent_reregister_timeout="10mins" --allocation_interval="1000secs" 
--allocator="hierarchical" --authenticate_agents="true" 
--authenticate_frameworks="true" --authenticate_http_frameworks="true" 
--authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
--authentication_v0_timeout="15secs" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/4vMyjy/credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" 
--publish_per_framework_metrics="true" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --roles="role1" 
--root_submissions="true" --version="false" 
--webui_dir="/tmp/SRC/build/mesos-1.8.0/_inst/share/mesos/webui" 
--work_dir="/tmp/4vMyjy/master" --zk_session_timeout="10secs"
*06:14:51* I1203 06:14:50.634217 19796 master.cpp:465] Master only allowing 
authenticated frameworks to register
*06:14:51* I1203 06:14:50.634236 19796 master.cpp:471] Master only allowing 
authenticated agents to register
*06:14:51* I1203 06:14:50.634253 19796 master.cpp:477] Master only allowing 
authenticated HTTP frameworks to register
*06:14:51* I1203 06:14:50.634270 19796 credentials.hpp:37] Loading credentials 
for authentication from '/tmp/4vMyjy/credentials'
*06:14:51* I1203 06:14:50.634608 19796 master.cpp:521] Using default 'crammd5' 
authenticator
*06:14:51* I1203 06:14:50.634840 19796 http.cpp:1042] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readonly'
*06:14:51* I1203 06:14:50.635052 19796 http.cpp:1042] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readwrite'
*06:14:51* I1203 06:14:50.635200 19796 http.cpp:1042] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-scheduler'
*06:14:51* I1203 06:14:50.635373 19796 master.cpp:602] Authorization enabled
*06:14:51* W1203 06:14:50.635457 19796 master.cpp:665] The '--roles' flag is 
deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade 
notes for more information
*06:14:51* I1203 06:14:50.635991 19800 whitelist_watcher.cpp:77] No whitelist 
given
*06:14:51* I1203 06:14:50.636032 19793 hierarchical.cpp:175] Initialized 
hierarchical allocator process
*06:14:51* I1203 06:14:50.638939 19796 master.cpp:2105] Elected as the leading 
master!
*06:14:51* I1203 06:14:50.638975 19796 master.cpp:1660] Recovering from 
registrar
*06:14:51* I1203 06:14:50.639200 19792 registrar.cpp:339] Recovering registrar
*06:14:51* I1203 06:14:50.639927 19792 registrar.cpp:383] Successfully fetched 
the registry (0B) in 672768ns
*06:14:51* I1203 06:14:50.640069 19792 registrar.cpp:487] Applied 1 operations 
in 48006ns; attempting to update the registry
*06:14:51* I1203 06:14:50.640718 19792 registrar.cpp:544] Successfully updated 
the registry in 582912ns
*06:14:51* I1203 06:14:50.640852 19792 registrar.cpp:416] Successfully 
recovered registrar
*06:14:51* I1203 06:14:50.641299 19800 master.cpp:1774] Recovered 0 agents from 
the registry (135B); allowing 10mins for agents to reregister
*06:14:51* I1203 06:14:50.641340 19799 hierarchical.cpp:215] Skipping recovery 
of hierarchical allocator: nothing to recover
*06:14:51* W1203 06:14:50.647153 19784 process.cpp:2829] Attempted to spawn 
already running process files@172.17.0.3:44340
*06:14:51* I1203 06:14:50.648453 19784 containerizer.cpp:305] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
*06:14:51* W1203 06:14:50.649060 19784 backend.cpp:76] Failed to create 'aufs' 
backend: AufsBackend requires root privileges
*06:14:51* W1203 06:14:50.649088 19784 backend.cpp:76] Failed to 

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-11-06 Thread Joseph Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677306#comment-16677306
 ] 

Joseph Wu commented on MESOS-7971:
--

Slightly different logs were observed on an internal CI run (Ubuntu 16, no 
SSL). In this run, the test expects a 202 from one HTTP response but gets a 409 
instead.
{code}
[ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I1106 19:50:14.650254 19563 cluster.cpp:162] Creating default 'local' authorizer
I1106 19:50:14.651284 19588 master.cpp:442] Master 
d5905469-73fc-4219-b939-c6056f1f62a1 (ip-172-16-10-48.ec2.internal) started on 
172.16.10.48:39946
I1106 19:50:14.651309 19588 master.cpp:444] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="50ms" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/fZatVl/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fZatVl/master" 
--zk_session_timeout="10secs"
I1106 19:50:14.651437 19588 master.cpp:494] Master only allowing authenticated 
frameworks to register
I1106 19:50:14.651448 19588 master.cpp:508] Master only allowing authenticated 
agents to register
I1106 19:50:14.651453 19588 master.cpp:521] Master only allowing authenticated 
HTTP frameworks to register
I1106 19:50:14.651459 19588 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/fZatVl/credentials'
I1106 19:50:14.651548 19588 master.cpp:566] Using default 'crammd5' 
authenticator
I1106 19:50:14.651593 19588 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I1106 19:50:14.651643 19588 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I1106 19:50:14.651672 19588 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I1106 19:50:14.651700 19588 master.cpp:646] Authorization enabled
W1106 19:50:14.651710 19588 master.cpp:709] The '--roles' flag is deprecated. 
This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
more information
I1106 19:50:14.651803 19590 hierarchical.cpp:173] Initialized hierarchical 
allocator process
I1106 19:50:14.651830 19590 whitelist_watcher.cpp:77] No whitelist given
I1106 19:50:14.652432 19590 master.cpp:2200] Elected as the leading master!
I1106 19:50:14.652454 19590 master.cpp:1739] Recovering from registrar
I1106 19:50:14.652506 19590 registrar.cpp:347] Recovering registrar
I1106 19:50:14.652595 19590 registrar.cpp:391] Successfully fetched the 
registry (0B) in 72960ns
I1106 19:50:14.652622 19590 registrar.cpp:495] Applied 1 operations in 5332ns; 
attempting to update the registry
I1106 19:50:14.656131 19586 registrar.cpp:552] Successfully updated the 
registry in 3.472128ms
I1106 19:50:14.656177 19586 registrar.cpp:424] Successfully recovered registrar
I1106 19:50:14.656266 19588 master.cpp:1838] Recovered 0 agents from the 
registry (168B); allowing 10mins for agents to re-register
I1106 19:50:14.656299 19588 hierarchical.cpp:211] Skipping recovery of 
hierarchical allocator: nothing to recover
W1106 19:50:14.657806 19563 process.cpp:3196] Attempted to spawn already 
running process files@172.16.10.48:39946
I1106 19:50:14.658203 19563 containerizer.cpp:246] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
I1106 19:50:14.661717 19563 linux_launcher.cpp:149] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1106 19:50:14.662039 19563 provisioner.cpp:255] Using default backend 'overlay'
I1106 19:50:14.662547 19563 cluster.cpp:448] Creating default 'local' authorizer
I1106 19:50:14.662969 19589 slave.cpp:249] Mesos agent started on 
(378)@172.16.10.48:39946
I1106 19:50:14.662987 19589 slave.cpp:250] Flags at startup: --acls="" 

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-08-14 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580076#comment-16580076
 ] 

Vinod Kone commented on MESOS-7971:
---

Saw this again in ASF CI.

{code}
01:59:40 [ RUN  ] 
PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
01:59:40 I0814 01:59:40.466804 25171 cluster.cpp:173] Creating default 'local' 
authorizer
01:59:40 I0814 01:59:40.470376 25184 master.cpp:413] Master 
d922690d-0ba0-451a-b563-88c300e70670 (f419ee17fd4a) started on 172.17.0.4:57729
01:59:40 I0814 01:59:40.470427 25184 master.cpp:416] Flags at startup: 
--acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1000secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/e6gxWw/credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --roles="role1" 
--root_submissions="true" --version="false" 
--webui_dir="/tmp/SRC/build/mesos-1.7.0/_inst/share/mesos/webui" 
--work_dir="/tmp/e6gxWw/master" --zk_session_timeout="10secs"
01:59:40 I0814 01:59:40.470870 25184 master.cpp:465] Master only allowing 
authenticated frameworks to register
01:59:40 I0814 01:59:40.470890 25184 master.cpp:471] Master only allowing 
authenticated agents to register
01:59:40 I0814 01:59:40.470901 25184 master.cpp:477] Master only allowing 
authenticated HTTP frameworks to register
01:59:40 I0814 01:59:40.470916 25184 credentials.hpp:37] Loading credentials 
for authentication from '/tmp/e6gxWw/credentials'
01:59:40 I0814 01:59:40.471252 25184 master.cpp:521] Using default 'crammd5' 
authenticator
01:59:40 I0814 01:59:40.471448 25184 http.cpp:977] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readonly'
01:59:40 I0814 01:59:40.471707 25184 http.cpp:977] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readwrite'
01:59:40 I0814 01:59:40.472013 25184 http.cpp:977] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-scheduler'
01:59:40 I0814 01:59:40.472188 25184 master.cpp:602] Authorization enabled
01:59:40 W0814 01:59:40.472220 25184 master.cpp:665] The '--roles' flag is 
deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade 
notes for more information
01:59:40 I0814 01:59:40.472955 25185 hierarchical.cpp:182] Initialized 
hierarchical allocator process
01:59:40 I0814 01:59:40.477638 25178 whitelist_watcher.cpp:77] No whitelist 
given
01:59:40 I0814 01:59:40.481549 25172 master.cpp:2083] Elected as the leading 
master!
01:59:40 I0814 01:59:40.481595 25172 master.cpp:1638] Recovering from registrar
01:59:40 I0814 01:59:40.481855 25187 registrar.cpp:339] Recovering registrar
01:59:40 I0814 01:59:40.482591 25187 registrar.cpp:383] Successfully fetched 
the registry (0B) in 680704ns
01:59:40 I0814 01:59:40.482749 25187 registrar.cpp:487] Applied 1 operations in 
58202ns; attempting to update the registry
01:59:40 I0814 01:59:40.483386 25187 registrar.cpp:544] Successfully updated 
the registry in 563968ns
01:59:40 I0814 01:59:40.483546 25187 registrar.cpp:416] Successfully recovered 
registrar
01:59:40 I0814 01:59:40.484119 25178 master.cpp:1752] Recovered 0 agents from 
the registry (135B); allowing 10mins for agents to reregister
01:59:40 I0814 01:59:40.484197 25172 hierarchical.cpp:220] Skipping recovery of 
hierarchical allocator: nothing to recover
01:59:40 W0814 01:59:40.490447 25171 process.cpp:2810] Attempted to spawn 
already running process files@172.17.0.4:57729
01:59:40 I0814 01:59:40.491410 25171 containerizer.cpp:300] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
01:59:40 W0814 01:59:40.491938 25171 backend.cpp:76] Failed to create 'aufs' 
backend: AufsBackend requires root privileges
01:59:40 W0814 01:59:40.491963 25171 backend.cpp:76] Failed to create 'bind' 
backend: BindBackend requires root privileges
01:59:40 I0814 01:59:40.491991 25171 

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-06-12 Thread Chun-Hung Hsiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510159#comment-16510159
 ] 

Chun-Hung Hsiao commented on MESOS-7971:


Observed another failure on Apache CI: 
[^ApacheJenkinsConsoleText_autotools_gcc_ubuntu16.txt]

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
> -
>
> Key: MESOS-7971
> URL: https://issues.apache.org/jira/browse/MESOS-7971
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.6.0
> Reporter: Vinod Kone
> Priority: Major
> Labels: flaky-test, mesosphere
> Attachments: ApacheJenkinsConsoleText_autotools_gcc_ubuntu16.txt
>
>
> Saw this when testing 1.4.0-rc5
> {code}
> [ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
> I0912 05:40:27.335222 30860 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0912 05:40:27.338429 30867 master.cpp:442] Master 
> 2bd1e8eb-e314-4181-9ed3-d397ec1dbede (6aa774430302) started on 
> 172.17.0.3:54639
> I0912 05:40:27.338472 30867 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="50ms" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/hH0YXe/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" --roles="role1" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/hH0YXe/master" 
> --zk_session_timeout="10secs"
> I0912 05:40:27.338778 30867 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0912 05:40:27.338788 30867 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0912 05:40:27.338793 30867 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0912 05:40:27.338799 30867 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/hH0YXe/credentials'
> I0912 05:40:27.353009 30867 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0912 05:40:27.353183 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0912 05:40:27.353364 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0912 05:40:27.353482 30867 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0912 05:40:27.353588 30867 master.cpp:646] Authorization enabled
> W0912 05:40:27.353605 30867 master.cpp:709] The '--roles' flag is deprecated. 
> This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
> more information
> I0912 05:40:27.353742 30868 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0912 05:40:27.353775 30872 whitelist_watcher.cpp:77] No whitelist given
> I0912 05:40:27.356655 30873 master.cpp:2163] Elected as the leading master!
> I0912 05:40:27.356675 30873 master.cpp:1702] Recovering from registrar
> I0912 05:40:27.356868 30874 registrar.cpp:347] Recovering registrar
> I0912 05:40:27.357390 30874 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 494080ns
> I0912 05:40:27.357483 30874 registrar.cpp:495] Applied 1 operations in 
> 31911ns; attempting to update the registry
> I0912 05:40:27.357919 30874 registrar.cpp:552] Successfully updated the 
> registry in 391936ns
> I0912 05:40:27.358018 30874 registrar.cpp:424] Successfully recovered 
> registrar
> I0912 05:40:27.358413 30868 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0912 05:40:27.358482 30867 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> W0912 05:40:27.364050 30860 process.cpp:3196] Attempted to spawn already 
> running process files@172.17.0.3:54639
> I0912 05:40:27.365372 30860 

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-05-09 Thread Chun-Hung Hsiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469885#comment-16469885
 ] 

Chun-Hung Hsiao commented on MESOS-7971:


Failed on Apache CI: 
https://builds.apache.org/job/Mesos-Buildbot/5273/BUILDTOOL=cmake,COMPILER=clang,CONFIGURATION=--verbose%20--disable-libtool-wrappers,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1%20MESOS_TEST_AWAIT_TIMEOUT=60secs,OS=ubuntu%3A16.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)&&(!H21)&&(!H23)&&(!H26)&&(!H27)/console

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
> -
>
> Key: MESOS-7971
> URL: https://issues.apache.org/jira/browse/MESOS-7971
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.6.0
> Reporter: Vinod Kone
> Priority: Major
> Labels: flaky-test, mesosphere

[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-02-21 Thread Armand Grillet (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371372#comment-16371372
 ] 

Armand Grillet commented on MESOS-7971:
---

Looks like it's happening again: 
https://reviews.apache.org/r/64211/#review197842

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
> -
>
> Key: MESOS-7971
> URL: https://issues.apache.org/jira/browse/MESOS-7971
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Vinod Kone
> Priority: Major
> Labels: flaky-test, mesosphere