[jira] [Commented] (MESOS-8983) SlaveRecoveryTest/0.PingTimeoutDuringRecovery is flaky
[ https://issues.apache.org/jira/browse/MESOS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918041#comment-16918041 ]

Vinod Kone commented on MESOS-8983:
---

Seen this again when testing 1.9.0-RC2.

{code}
13:32:33 3: [ RUN ] SlaveRecoveryTest/0.PingTimeoutDuringRecovery
13:32:33 3: I0828 18:32:33.580678 20801 cluster.cpp:177] Creating default 'local' authorizer
13:32:33 3: I0828 18:32:33.587858 20824 master.cpp:440] Master 3de64da7-619c-4652-9d33-3fe2ca2a3d5f (b766865f9da3) started on 172.17.0.2:42011
13:32:33 3: I0828 18:32:33.587904 20824 master.cpp:443] Flags at startup: --acls="" --agent_ping_timeout="1secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/sIRhDp/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="2" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_operator_event_stream_subscribers="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/sIRhDp/master" --zk_session_timeout="10secs"
13:32:33 3: I0828 18:32:33.588558 20824 master.cpp:492] Master only allowing authenticated frameworks to register
13:32:33 3: I0828 18:32:33.588574 20824 master.cpp:498] Master only allowing authenticated agents to register
13:32:33 3: I0828 18:32:33.588587 20824 master.cpp:504] Master only allowing authenticated HTTP frameworks to register
13:32:33 3: I0828 18:32:33.588599 20824 credentials.hpp:37] Loading credentials for authentication from '/tmp/sIRhDp/credentials'
13:32:33 3: I0828 18:32:33.588999 20824 master.cpp:548] Using default 'crammd5' authenticator
13:32:33 3: I0828 18:32:33.589262 20824 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
13:32:33 3: I0828 18:32:33.589529 20824 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
13:32:33 3: I0828 18:32:33.589697 20824 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
13:32:33 3: I0828 18:32:33.589866 20824 master.cpp:629] Authorization enabled
13:32:33 3: I0828 18:32:33.590817 20823 whitelist_watcher.cpp:77] No whitelist given
13:32:33 3: I0828 18:32:33.594827 20816 master.cpp:2170] Elected as the leading master!
13:32:33 3: I0828 18:32:33.594887 20816 master.cpp:1666] Recovering from registrar
13:32:33 3: I0828 18:32:33.595124 20808 hierarchical.cpp:474] Initialized hierarchical allocator process
13:32:33 3: I0828 18:32:33.595382 20808 registrar.cpp:339] Recovering registrar
13:32:33 3: I0828 18:32:33.596575 20808 registrar.cpp:383] Successfully fetched the registry (0B) in 1.14688ms
13:32:33 3: I0828 18:32:33.596779 20808 registrar.cpp:487] Applied 1 operations in 63194ns; attempting to update the registry
13:32:33 3: I0828 18:32:33.597638 20819 registrar.cpp:544] Successfully updated the registry in 788224ns
13:32:33 3: I0828 18:32:33.597805 20819 registrar.cpp:416] Successfully recovered registrar
13:32:33 3: I0828 18:32:33.598423 20819 master.cpp:1819] Recovered 0 agents from the registry (144B); allowing 10mins for agents to reregister
13:32:33 3: I0828 18:32:33.598599 20813 hierarchical.cpp:513] Skipping recovery of hierarchical allocator: nothing to recover
13:32:33 3: I0828 18:32:33.614511 20801 containerizer.cpp:318] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
13:32:33 3: W0828 18:32:33.615756 20801 backend.cpp:76] Failed to create 'overlay' backend: OverlayBackend requires root privileges
13:32:33 3: W0828 18:32:33.615855 20801 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
13:32:33 3: W0828 18:32:33.615934 20801 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
13:32:33 3: I0828 18:32:33.616178 20801 provisioner
[jira] [Commented] (MESOS-8983) SlaveRecoveryTest/0.PingTimeoutDuringRecovery is flaky
[ https://issues.apache.org/jira/browse/MESOS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819281#comment-16819281 ]

Andrei Budnik commented on MESOS-8983:
--

This test fails pretty often on ARM.

> SlaveRecoveryTest/0.PingTimeoutDuringRecovery is flaky
> --
>
> Key: MESOS-8983
> URL: https://issues.apache.org/jira/browse/MESOS-8983
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.7.0, 1.8.0
> Reporter: Alexander Rojas
> Assignee: Joseph Wu
> Priority: Major
> Labels: flaky-test, foundations
>
> During an unrelated change in a PR, the apache build bot sent the following error:
> {noformat}
> @ 7FF71117D888 > std::invoke<,process::Future > >,process::ProcessBase *>
> @ 7FF71119257B > lambda::internal::Partial<,process::Future > >,std::_Ph<1> > >::invoke_expand<,std::tuple > >,std::_Ph<1> >,st
> @ 7FF7110C08BA ) @ 7FF7110F058C > std::_Invoker_functor::_Call,process::Future > >,std::_Ph<1> >,process::ProcessBase *>
> @ 7FF711183EBC > std::invoke,process::Future > >,std::_Ph<1> >,process::ProcessBase *>
> @ 7FF7110C9F21 > ),process::Future > >,std::_Ph<1> >,process::ProcessBase *
> @ 7FF711236416 process::ProcessBase > *)>::CallableFn,process::Future > >,std::_Ph<1> > >::operator(
> @ 7FF712C1A25D process::ProcessBase *)>::operator(
> @ 7FF712ACB2F9 process::ProcessBase::consume
> @ 7FF712C738CA process::DispatchEvent::consume
> @ 7FF70ECE7B07 process::ProcessBase::serve
> @ 7FF712AD93B0 process::ProcessManager::resume
> @ 7FF712C07371 ??
> @ 7FF712B2B130 > std::_Invoker_functor::_Call< >
> @ 7FF712B8B8E0 > std::invoke< >
> @ 7FF712B4076C > std::_LaunchPad > >,std::default_delete > > > > >::_Execute<0>
> @ 7FF712C5A60A > std::_LaunchPad > >,std::default_delete > > > > >::_Run
> @ 7FF712C45E78 > std::_LaunchPad > >,std::default_delete > > > > >::_Go
> @ 7FF712C2C3CD std::_Pad::_Call_func
> @ 7FFF9BE53428 _register_onexit_function
> @ 7FFF9BE53071 _register_onexit_function
> @ 7FFFB6391FE4 BaseThreadInitThunk
> @ 7FFFB69FF061 RtlUserThreadStart
> ll containerizers
> I0606 10:25:26.680230 18356 slave.cpp:7158] Recovering executors
> I0606 10:25:26.680230 18356 slave.cpp:7182] Sending reconnect request to executor '3f11d255-bb7b-4e99-967b-055fef95b595' of framework 62cf792a-dc69-4e3c-b54f-d83f98fb9451- at executor(1)@192.10.1.5:55652
> I0606 10:25:26.688225 22560 slave.cpp:4984] Received re-registration message from executor '3f11d255-bb7b-4e99-967b-055fef95b595' of framework 62cf792a-dc69-4e3c-b54f-d83f98fb9451-
> I0606 10:25:26.691216 22888 slave.cpp:5901] No pings from master received within 75secs
> F0606 10:25:26.692219 22888 slave.cpp:1249] Check failed: state == DISCONNECTED || state == RUNNING || state == TERMINATING RECOVERING
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)