[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739635#comment-16739635 ] Andrei Budnik commented on MESOS-7971:

This failure is different from the previous ones:
{code:java}
E0110 17:13:09.326659 13916 master.cpp:8586] Failed to find the operation '' (uuid: 825f65eb-3ba1-4dfa-bdfa-8eb29194ace3) for an operator API call on agent ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b-S0
{code}
Full log:
{code:java}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I0110 17:12:59.303460 13893 cluster.cpp:174] Creating default 'local' authorizer
I0110 17:12:59.304430 13912 master.cpp:416] Master ae22a9c8-0ef6-4f1e-b1eb-7b55f6e4508b (ip-172-16-10-92.ec2.internal) started on 172.16.10.92:42320
I0110 17:12:59.304451 13912 master.cpp:419] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/PfFTwT/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_operator_event_stream_subscribers="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PfFTwT/master" --zk_session_timeout="10secs"
I0110 17:12:59.304585 13912 master.cpp:468] Master only allowing authenticated frameworks to register
I0110 17:12:59.304595 13912 master.cpp:474] Master only allowing authenticated agents to register
I0110 17:12:59.304603 13912 master.cpp:480] Master only allowing authenticated HTTP frameworks to register
I0110 17:12:59.304615 13912 credentials.hpp:37] Loading credentials for authentication from '/tmp/PfFTwT/credentials'
I0110 17:12:59.304684 13912 master.cpp:524] Using default 'crammd5' authenticator
I0110 17:12:59.304744 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0110 17:12:59.304831 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0110 17:12:59.304889 13912 http.cpp:965] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0110 17:12:59.304941 13912 master.cpp:605] Authorization enabled
W0110 17:12:59.304967 13912 master.cpp:668] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
I0110 17:12:59.305047 13919 hierarchical.cpp:176] Initialized hierarchical allocator process
I0110 17:12:59.305128 13918 whitelist_watcher.cpp:77] No whitelist given
I0110 17:12:59.305600 13914 master.cpp:2085] Elected as the leading master!
I0110 17:12:59.305622 13914 master.cpp:1640] Recovering from registrar
I0110 17:12:59.305698 13913 registrar.cpp:339] Recovering registrar
I0110 17:12:59.305853 13912 registrar.cpp:383] Successfully fetched the registry (0B) in 118016ns
I0110 17:12:59.305899 13912 registrar.cpp:487] Applied 1 operations in 8238ns; attempting to update the registry
I0110 17:12:59.306036 13912 registrar.cpp:544] Successfully updated the registry in 112128ns
I0110 17:12:59.306092 13912 registrar.cpp:416] Successfully recovered registrar
I0110 17:12:59.306217 13916 master.cpp:1754] Recovered 0 agents from the registry (172B); allowing 10mins for agents to reregister
I0110 17:12:59.306258 13919 hierarchical.cpp:216] Skipping recovery of hierarchical allocator: nothing to recover
W0110 17:12:59.307780 13893 process.cpp:2829] Attempted to spawn already running process files@172.16.10.92:42320
I0110 17:12:59.308149 13893 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0110 17:12:59.310348 13893 linux_launcher.cpp:144] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0110 17:12:59.310752 13893
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712278#comment-16712278 ] Chun-Hung Hsiao commented on MESOS-7971:

For resource provider operations, we use {{resource_version_uuid}} to resolve this. It seems to me that we should do the same in {{Slave::applyOperation}}: check whether {{ApplyOperationMessage.resource_version_uuid}} equals {{resourceVersion}}, and only apply the speculative operation if the versions match. However, since {{resource_version_uuid}} only exists since 1.5 (with the {{RESOURCE_PROVIDER}} agent capability), we could not use the same strategy to fix this in 1.4, even if we wanted to.

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky > - > > Key: MESOS-7971 > URL: https://issues.apache.org/jira/browse/MESOS-7971 > Project: Mesos > Issue Type: Bug > Components: allocation >Affects Versions: 1.4.0, 1.6.0, 1.7.0, 1.8.0 >Reporter: Vinod Kone >Assignee: Meng Zhu >Priority: Critical > Labels: flaky-test, mesosphere > Attachments: ApacheJenkinsConsoleText_autotools_gcc_ubuntu16.txt > > > Saw this when testing 1.4.0-rc5 > {code} > [ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove > I0912 05:40:27.335222 30860 cluster.cpp:162] Creating default 'local' > authorizer > I0912 05:40:27.338429 30867 master.cpp:442] Master > 2bd1e8eb-e314-4181-9ed3-d397ec1dbede (6aa774430302) started on > 172.17.0.3:54639 > I0912 05:40:27.338472 30867 master.cpp:444] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="50ms" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/hH0YXe/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > 
--hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" --roles="role1" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/hH0YXe/master" > --zk_session_timeout="10secs" > I0912 05:40:27.338778 30867 master.cpp:494] Master only allowing > authenticated frameworks to register > I0912 05:40:27.338788 30867 master.cpp:508] Master only allowing > authenticated agents to register > I0912 05:40:27.338793 30867 master.cpp:521] Master only allowing > authenticated HTTP frameworks to register > I0912 05:40:27.338799 30867 credentials.hpp:37] Loading credentials for > authentication from '/tmp/hH0YXe/credentials' > I0912 05:40:27.353009 30867 master.cpp:566] Using default 'crammd5' > authenticator > I0912 05:40:27.353183 30867 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0912 05:40:27.353364 30867 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0912 05:40:27.353482 30867 http.cpp:1026] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0912 05:40:27.353588 30867 master.cpp:646] Authorization enabled > W0912 05:40:27.353605 30867 master.cpp:709] The '--roles' flag is deprecated. > This flag will be removed in the future. 
See the Mesos 0.27 upgrade notes for > more information > I0912 05:40:27.353742 30868 hierarchical.cpp:171] Initialized hierarchical > allocator process > I0912 05:40:27.353775 30872 whitelist_watcher.cpp:77] No whitelist given > I0912 05:40:27.356655 30873 master.cpp:2163] Elected as the leading master! > I0912 05:40:27.356675 30873 master.cpp:1702] Recovering from registrar > I0912 05:40:27.356868 30874 registrar.cpp:347] Recovering registrar > I0912 05:40:27.357390 30874 registrar.cpp:391] Successfully fetched the > registry (0B) in 494080ns > I0912 05:40:27.357483 30874 registrar.cpp:495] Applied 1 operations in > 31911ns; attempting to update the registry > I0912 05:40:27.357919 30874 registrar.cpp:552] Successfully updated the > registry in 391936ns >
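Chun-Hung's proposal amounts to an optimistic-concurrency check on the agent: an operation is only applied if it was computed against the agent's current resource version. The sketch below is a hypothetical, heavily simplified model, not Mesos code. The `Agent` class, the two-field `ApplyOperationMessage` struct, and the use of a plain counter in place of the real {{resource_version_uuid}} UUID are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical stand-in for the real protobuf message; a counter replaces
// the resource version UUID purely for illustration.
struct ApplyOperationMessage {
  std::uint64_t resourceVersion;  // agent resource version the master saw
  std::string volume;             // persistent volume to create
};

// Hypothetical, simplified agent model (not Slave::applyOperation itself).
class Agent {
 public:
  std::uint64_t resourceVersion() const { return version_; }

  bool hasVolume(const std::string& volume) const {
    return volumes_.count(volume) > 0;
  }

  // Any other change to the agent's resources bumps the version.
  void updateResources() { ++version_; }

  // Applies the speculative operation only if the master based it on the
  // agent's current resource version; otherwise it is rejected as stale.
  bool applyOperation(const ApplyOperationMessage& message) {
    assert(!message.volume.empty());
    if (message.resourceVersion != version_) {
      return false;
    }
    volumes_.insert(message.volume);
    ++version_;  // the operation itself changes the resources
    return true;
  }

 private:
  std::uint64_t version_ = 0;
  std::set<std::string> volumes_;
};
```

In this model a message built against an outdated version is simply refused, so the agent never diverges silently from the view the master acted on; how the real master would then reconcile is outside the scope of the sketch.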
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712200#comment-16712200 ] Meng Zhu commented on MESOS-7971:

This looks like a legitimate bug. Here is a sequence of events that can trigger it:
- The agent (re)registers with the master.
- Operation calls are made to the master (say, to create a volume).
- The allocator is speculatively updated in https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11315
- Before the operation is applied to the agent's resources, the agent sends an `UpdateSlaveMessage` upon receiving the (re)registered message, in https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1551 and https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L1633
- The `UpdateSlaveMessage` triggers the allocator to update the agent's total resources again (https://github.com/apache/mesos/blob/master/src/master/master.cpp#L8205), so the resource update from the previous operation is overwritten and LOST.
- The agent finishes the operation and informs the master through an `UpdateOperationStatusMessage`.
- However, for speculative operations, we do not update the allocator at this point: https://github.com/apache/mesos/blob/master/src/master/master.cpp#L11177

Thus, the speculative operation is lost in the allocator even though it is successfully applied on the agent.
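The sequence of events in Meng's comment can be replayed deterministically in a toy model. This is a hypothetical sketch, not Mesos code: resources are modeled as a set of names, `"disk"` becoming `"volume"` stands in for the CREATE operation, and the `raced` flag decides whether the stale `UpdateSlaveMessage` overwrites the allocator's speculative update.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model: a resource pool is just a set of named resources.
using Resources = std::set<std::string>;

// Converts `from` into `to` (e.g. a CREATE turning disk into a volume).
void apply(Resources& resources, const std::string& from,
           const std::string& to) {
  assert(resources.count(from) == 1);
  resources.erase(from);
  resources.insert(to);
}

// Replays the event sequence and reports whether the allocator's view of
// the agent matches the agent's actual resources at the end.
bool viewsAgree(bool raced) {
  Resources agent = {"cpus", "mem", "disk"};
  Resources allocator = agent;  // after (re)registration both views match

  // The master speculatively applies the operation to the allocator.
  apply(allocator, "disk", "volume");

  if (raced) {
    // The agent reports its pre-operation total (UpdateSlaveMessage) and
    // the allocator's total is overwritten: the speculative update is lost.
    allocator = agent;
  }

  // The agent applies the operation and reports success, but for a
  // speculative operation the allocator is not updated again.
  apply(agent, "disk", "volume");

  return allocator == agent;
}
```

`viewsAgree(false)` returns `true`, while `viewsAgree(true)` returns `false`: the agent ends up holding a volume that the allocator no longer knows about, which is the divergence described in the comment.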
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712148#comment-16712148 ] Meng Zhu commented on MESOS-7971:

This test is flaky due to a race. The test expects the offer to be sent out after the reserve and create operations have finished, but it only waits for the 202 responses returned by both calls. When the offer is sent while the agent is still processing the create operation, the offer does not contain the expected volume resource, which fails the test. Adding manual clock control and properly settling should fix the test.
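The race in the test can be illustrated with a hypothetical single-threaded event loop (not libprocess). A 202 only means the CREATE was accepted and queued, so an offer built immediately afterwards may still see the old resources; draining the queue first, loosely analogous to settling a paused clock in a Mesos test, makes the outcome deterministic. `EventLoop`, `settle()` and `offeredResources()` are names invented for this sketch.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical single-threaded event loop; not the libprocess API.
class EventLoop {
 public:
  void post(std::function<void()> event) {
    queue_.push_back(std::move(event));
  }

  // Runs the oldest pending event; requires at least one to be queued.
  void step() {
    assert(!queue_.empty());
    std::function<void()> event = std::move(queue_.front());
    queue_.pop_front();
    event();
  }

  // Runs events until none remain, including events posted meanwhile.
  void settle() {
    while (!queue_.empty()) {
      step();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Returns the resources an offer would contain. The CREATE call is
// "accepted" (202) as soon as its effect is queued; the agent applies it
// only when the loop runs the queued event.
std::set<std::string> offeredResources(bool settleBeforeOffer) {
  std::set<std::string> agentResources = {"cpus", "mem", "disk"};
  EventLoop loop;

  loop.post([&agentResources] {
    agentResources.erase("disk");
    agentResources.insert("volume");
  });

  if (settleBeforeOffer) {
    loop.settle();  // wait for the operation itself, not just the 202
  }

  // The offer is a snapshot of the agent's resources at this moment.
  return agentResources;
}
```

Without settling, the offer is built before the queued operation runs and lacks the volume, mirroring the flaky assertion; with settling, it always contains it.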
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707765#comment-16707765 ] Vinod Kone commented on MESOS-7971:

Saw this again.
{code}
06:14:51 [ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
06:14:51 I1203 06:14:50.630549 19784 cluster.cpp:173] Creating default 'local' authorizer
06:14:51 I1203 06:14:50.633529 19796 master.cpp:413] Master f1ffe054-ad44-45d4-9f39-84b048e1a359 (c16130e94783) started on 172.17.0.3:44340
06:14:51 I1203 06:14:50.633581 19796 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/4vMyjy/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/tmp/SRC/build/mesos-1.8.0/_inst/share/mesos/webui" --work_dir="/tmp/4vMyjy/master" --zk_session_timeout="10secs"
06:14:51 I1203 06:14:50.634217 19796 master.cpp:465] Master only allowing authenticated frameworks to register
06:14:51 I1203 06:14:50.634236 19796 master.cpp:471] Master only allowing authenticated agents to register
06:14:51 I1203 06:14:50.634253 19796 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
06:14:51 I1203 06:14:50.634270 19796 credentials.hpp:37] Loading credentials for authentication from '/tmp/4vMyjy/credentials'
06:14:51 I1203 06:14:50.634608 19796 master.cpp:521] Using default 'crammd5' authenticator
06:14:51 I1203 06:14:50.634840 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
06:14:51 I1203 06:14:50.635052 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
06:14:51 I1203 06:14:50.635200 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
06:14:51 I1203 06:14:50.635373 19796 master.cpp:602] Authorization enabled
06:14:51 W1203 06:14:50.635457 19796 master.cpp:665] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
06:14:51 I1203 06:14:50.635991 19800 whitelist_watcher.cpp:77] No whitelist given
06:14:51 I1203 06:14:50.636032 19793 hierarchical.cpp:175] Initialized hierarchical allocator process
06:14:51 I1203 06:14:50.638939 19796 master.cpp:2105] Elected as the leading master!
06:14:51 I1203 06:14:50.638975 19796 master.cpp:1660] Recovering from registrar
06:14:51 I1203 06:14:50.639200 19792 registrar.cpp:339] Recovering registrar
06:14:51 I1203 06:14:50.639927 19792 registrar.cpp:383] Successfully fetched the registry (0B) in 672768ns
06:14:51 I1203 06:14:50.640069 19792 registrar.cpp:487] Applied 1 operations in 48006ns; attempting to update the registry
06:14:51 I1203 06:14:50.640718 19792 registrar.cpp:544] Successfully updated the registry in 582912ns
06:14:51 I1203 06:14:50.640852 19792 registrar.cpp:416] Successfully recovered registrar
06:14:51 I1203 06:14:50.641299 19800 master.cpp:1774] Recovered 0 agents from the registry (135B); allowing 10mins for agents to reregister
06:14:51 I1203 06:14:50.641340 19799 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
06:14:51 W1203 06:14:50.647153 19784 process.cpp:2829] Attempted to spawn already running process files@172.17.0.3:44340
06:14:51 I1203 06:14:50.648453 19784 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
06:14:51 W1203 06:14:50.649060 19784 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
06:14:51 W1203 06:14:50.649088 19784 backend.cpp:76] Failed to
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677306#comment-16677306 ] Joseph Wu commented on MESOS-7971:

Slightly different logs were observed on an internal CI run (Ubuntu 16, no SSL). In this run, one HTTP call expects a 202 response but gets a 409 instead.
{code}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I1106 19:50:14.650254 19563 cluster.cpp:162] Creating default 'local' authorizer
I1106 19:50:14.651284 19588 master.cpp:442] Master d5905469-73fc-4219-b939-c6056f1f62a1 (ip-172-16-10-48.ec2.internal) started on 172.16.10.48:39946
I1106 19:50:14.651309 19588 master.cpp:444] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="50ms" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/fZatVl/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fZatVl/master" --zk_session_timeout="10secs"
I1106 19:50:14.651437 19588 master.cpp:494] Master only allowing authenticated frameworks to register
I1106 19:50:14.651448 19588 master.cpp:508] Master only allowing authenticated agents to register
I1106 19:50:14.651453 19588 master.cpp:521] Master only allowing authenticated HTTP frameworks to register
I1106 19:50:14.651459 19588 credentials.hpp:37] Loading credentials for authentication from '/tmp/fZatVl/credentials'
I1106 19:50:14.651548 19588 master.cpp:566] Using default 'crammd5' authenticator
I1106 19:50:14.651593 19588 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1106 19:50:14.651643 19588 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1106 19:50:14.651672 19588 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1106 19:50:14.651700 19588 master.cpp:646] Authorization enabled
W1106 19:50:14.651710 19588 master.cpp:709] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
I1106 19:50:14.651803 19590 hierarchical.cpp:173] Initialized hierarchical allocator process
I1106 19:50:14.651830 19590 whitelist_watcher.cpp:77] No whitelist given
I1106 19:50:14.652432 19590 master.cpp:2200] Elected as the leading master!
I1106 19:50:14.652454 19590 master.cpp:1739] Recovering from registrar
I1106 19:50:14.652506 19590 registrar.cpp:347] Recovering registrar
I1106 19:50:14.652595 19590 registrar.cpp:391] Successfully fetched the registry (0B) in 72960ns
I1106 19:50:14.652622 19590 registrar.cpp:495] Applied 1 operations in 5332ns; attempting to update the registry
I1106 19:50:14.656131 19586 registrar.cpp:552] Successfully updated the registry in 3.472128ms
I1106 19:50:14.656177 19586 registrar.cpp:424] Successfully recovered registrar
I1106 19:50:14.656266 19588 master.cpp:1838] Recovered 0 agents from the registry (168B); allowing 10mins for agents to re-register
I1106 19:50:14.656299 19588 hierarchical.cpp:211] Skipping recovery of hierarchical allocator: nothing to recover
W1106 19:50:14.657806 19563 process.cpp:3196] Attempted to spawn already running process files@172.16.10.48:39946
I1106 19:50:14.658203 19563 containerizer.cpp:246] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
I1106 19:50:14.661717 19563 linux_launcher.cpp:149] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1106 19:50:14.662039 19563 provisioner.cpp:255] Using default backend 'overlay'
I1106 19:50:14.662547 19563 cluster.cpp:448] Creating default 'local' authorizer
I1106 19:50:14.662969 19589 slave.cpp:249] Mesos agent started on (378)@172.16.10.48:39946
I1106 19:50:14.662987 19589 slave.cpp:250] Flags at startup: --acls=""
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test if flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580076#comment-16580076 ] Vinod Kone commented on MESOS-7971:

Saw this again in ASF CI.
{code}
01:59:40 [ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
01:59:40 I0814 01:59:40.466804 25171 cluster.cpp:173] Creating default 'local' authorizer
01:59:40 I0814 01:59:40.470376 25184 master.cpp:413] Master d922690d-0ba0-451a-b563-88c300e70670 (f419ee17fd4a) started on 172.17.0.4:57729
01:59:40 I0814 01:59:40.470427 25184 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/e6gxWw/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/tmp/SRC/build/mesos-1.7.0/_inst/share/mesos/webui" --work_dir="/tmp/e6gxWw/master" --zk_session_timeout="10secs"
01:59:40 I0814 01:59:40.470870 25184 master.cpp:465] Master only allowing authenticated frameworks to register
01:59:40 I0814 01:59:40.470890 25184 master.cpp:471] Master only allowing authenticated agents to register
01:59:40 I0814 01:59:40.470901 25184 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
01:59:40 I0814 01:59:40.470916 25184 credentials.hpp:37] Loading credentials for authentication from '/tmp/e6gxWw/credentials'
01:59:40 I0814 01:59:40.471252 25184 master.cpp:521] Using default 'crammd5' authenticator
01:59:40 I0814 01:59:40.471448 25184 http.cpp:977] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
01:59:40 I0814 01:59:40.471707 25184 http.cpp:977] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
01:59:40 I0814 01:59:40.472013 25184 http.cpp:977] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
01:59:40 I0814 01:59:40.472188 25184 master.cpp:602] Authorization enabled
01:59:40 W0814 01:59:40.472220 25184 master.cpp:665] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
01:59:40 I0814 01:59:40.472955 25185 hierarchical.cpp:182] Initialized hierarchical allocator process
01:59:40 I0814 01:59:40.477638 25178 whitelist_watcher.cpp:77] No whitelist given
01:59:40 I0814 01:59:40.481549 25172 master.cpp:2083] Elected as the leading master!
01:59:40 I0814 01:59:40.481595 25172 master.cpp:1638] Recovering from registrar
01:59:40 I0814 01:59:40.481855 25187 registrar.cpp:339] Recovering registrar
01:59:40 I0814 01:59:40.482591 25187 registrar.cpp:383] Successfully fetched the registry (0B) in 680704ns
01:59:40 I0814 01:59:40.482749 25187 registrar.cpp:487] Applied 1 operations in 58202ns; attempting to update the registry
01:59:40 I0814 01:59:40.483386 25187 registrar.cpp:544] Successfully updated the registry in 563968ns
01:59:40 I0814 01:59:40.483546 25187 registrar.cpp:416] Successfully recovered registrar
01:59:40 I0814 01:59:40.484119 25178 master.cpp:1752] Recovered 0 agents from the registry (135B); allowing 10mins for agents to reregister
01:59:40 I0814 01:59:40.484197 25172 hierarchical.cpp:220] Skipping recovery of hierarchical allocator: nothing to recover
01:59:40 W0814 01:59:40.490447 25171 process.cpp:2810] Attempted to spawn already running process files@172.17.0.4:57729
01:59:40 I0814 01:59:40.491410 25171 containerizer.cpp:300] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
01:59:40 W0814 01:59:40.491938 25171 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
01:59:40 W0814 01:59:40.491963 25171 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
01:59:40 I0814 01:59:40.491991 25171
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510159#comment-16510159 ] Chun-Hung Hsiao commented on MESOS-7971:

Observed another failure on Apache CI: [^ApacheJenkinsConsoleText_autotools_gcc_ubuntu16.txt]

> PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
> Key: MESOS-7971
> URL: https://issues.apache.org/jira/browse/MESOS-7971
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.6.0
> Reporter: Vinod Kone
> Priority: Major
> Labels: flaky-test, mesosphere
> Attachments: ApacheJenkinsConsoleText_autotools_gcc_ubuntu16.txt
>
> Saw this when testing 1.4.0-rc5
> {code}
> [ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
> I0912 05:40:27.335222 30860 cluster.cpp:162] Creating default 'local' authorizer
> I0912 05:40:27.338429 30867 master.cpp:442] Master 2bd1e8eb-e314-4181-9ed3-d397ec1dbede (6aa774430302) started on 172.17.0.3:54639
> I0912 05:40:27.338472 30867 master.cpp:444] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="50ms" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/hH0YXe/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --roles="role1" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/hH0YXe/master" --zk_session_timeout="10secs"
> I0912 05:40:27.338778 30867 master.cpp:494] Master only allowing authenticated frameworks to register
> I0912 05:40:27.338788 30867 master.cpp:508] Master only allowing authenticated agents to register
> I0912 05:40:27.338793 30867 master.cpp:521] Master only allowing authenticated HTTP frameworks to register
> I0912 05:40:27.338799 30867 credentials.hpp:37] Loading credentials for authentication from '/tmp/hH0YXe/credentials'
> I0912 05:40:27.353009 30867 master.cpp:566] Using default 'crammd5' authenticator
> I0912 05:40:27.353183 30867 http.cpp:1026] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> I0912 05:40:27.353364 30867 http.cpp:1026] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> I0912 05:40:27.353482 30867 http.cpp:1026] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> I0912 05:40:27.353588 30867 master.cpp:646] Authorization enabled
> W0912 05:40:27.353605 30867 master.cpp:709] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
> I0912 05:40:27.353742 30868 hierarchical.cpp:171] Initialized hierarchical allocator process
> I0912 05:40:27.353775 30872 whitelist_watcher.cpp:77] No whitelist given
> I0912 05:40:27.356655 30873 master.cpp:2163] Elected as the leading master!
> I0912 05:40:27.356675 30873 master.cpp:1702] Recovering from registrar
> I0912 05:40:27.356868 30874 registrar.cpp:347] Recovering registrar
> I0912 05:40:27.357390 30874 registrar.cpp:391] Successfully fetched the registry (0B) in 494080ns
> I0912 05:40:27.357483 30874 registrar.cpp:495] Applied 1 operations in 31911ns; attempting to update the registry
> I0912 05:40:27.357919 30874 registrar.cpp:552] Successfully updated the registry in 391936ns
> I0912 05:40:27.358018 30874 registrar.cpp:424] Successfully recovered registrar
> I0912 05:40:27.358413 30868 master.cpp:1801] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
> I0912 05:40:27.358482 30867 hierarchical.cpp:209] Skipping recovery of hierarchical allocator: nothing to recover
> W0912 05:40:27.364050 30860 process.cpp:3196] Attempted to spawn already running process files@172.17.0.3:54639
> I0912 05:40:27.365372 30860
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469885#comment-16469885 ] Chun-Hung Hsiao commented on MESOS-7971:

Failed on Apache CI: https://builds.apache.org/job/Mesos-Buildbot/5273/BUILDTOOL=cmake,COMPILER=clang,CONFIGURATION=--verbose%20--disable-libtool-wrappers,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1%20MESOS_TEST_AWAIT_TIMEOUT=60secs,OS=ubuntu%3A16.04,label_exp=(ubuntu)&&(!ubuntu-us1)&&(!ubuntu-eu2)&&(!H21)&&(!H23)&&(!H26)&&(!H27)/console
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371372#comment-16371372 ] Armand Grillet commented on MESOS-7971:

Looks like it's happening again: https://reviews.apache.org/r/64211/#review197842