Vinod Kone created MESOS-7726:
---------------------------------

             Summary: MasterTest.IgnoreOldAgentReregistration test is flaky
                 Key: MESOS-7726
                 URL: https://issues.apache.org/jira/browse/MESOS-7726
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone
            Assignee: Neil Conway


Observed this on ASF CI.

{code}
[ RUN      ] MasterTest.IgnoreOldAgentReregistration
I0627 05:23:06.031154  4917 cluster.cpp:162] Creating default 'local' authorizer
I0627 05:23:06.033433  4945 master.cpp:438] Master 
a8778782-0da1-49a5-9cb8-9f6d11701733 (c43debbe7e32) started on 172.17.0.4:41747
I0627 05:23:06.033457  4945 master.cpp:440] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/2BARnF/credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" 
--work_dir="/tmp/2BARnF/master" --zk_session_timeout="10secs"
I0627 05:23:06.033771  4945 master.cpp:490] Master only allowing authenticated 
frameworks to register
I0627 05:23:06.033787  4945 master.cpp:504] Master only allowing authenticated 
agents to register
I0627 05:23:06.033798  4945 master.cpp:517] Master only allowing authenticated 
HTTP frameworks to register
I0627 05:23:06.033812  4945 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/2BARnF/credentials'
I0627 05:23:06.034080  4945 master.cpp:562] Using default 'crammd5' 
authenticator
I0627 05:23:06.034221  4945 http.cpp:974] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0627 05:23:06.034409  4945 http.cpp:974] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0627 05:23:06.034569  4945 http.cpp:974] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0627 05:23:06.034688  4945 master.cpp:642] Authorization enabled
I0627 05:23:06.034862  4938 whitelist_watcher.cpp:77] No whitelist given
I0627 05:23:06.034868  4950 hierarchical.cpp:169] Initialized hierarchical 
allocator process
I0627 05:23:06.037211  4957 master.cpp:2161] Elected as the leading master!
I0627 05:23:06.037236  4957 master.cpp:1700] Recovering from registrar
I0627 05:23:06.037333  4938 registrar.cpp:345] Recovering registrar
I0627 05:23:06.038146  4938 registrar.cpp:389] Successfully fetched the 
registry (0B) in 768256ns
I0627 05:23:06.038290  4938 registrar.cpp:493] Applied 1 operations in 30798ns; 
attempting to update the registry
I0627 05:23:06.038861  4938 registrar.cpp:550] Successfully updated the 
registry in 510976ns
I0627 05:23:06.038960  4938 registrar.cpp:422] Successfully recovered registrar
I0627 05:23:06.039364  4941 hierarchical.cpp:207] Skipping recovery of 
hierarchical allocator: nothing to recover
I0627 05:23:06.039594  4958 master.cpp:1799] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0627 05:23:06.043999  4917 containerizer.cpp:230] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
W0627 05:23:06.044456  4917 backend.cpp:76] Failed to create 'aufs' backend: 
AufsBackend requires root privileges
W0627 05:23:06.044548  4917 backend.cpp:76] Failed to create 'bind' backend: 
BindBackend requires root privileges
I0627 05:23:06.044580  4917 provisioner.cpp:255] Using default backend 'copy'
I0627 05:23:06.046222  4917 cluster.cpp:448] Creating default 'local' authorizer
I0627 05:23:06.047572  4950 slave.cpp:249] Mesos agent started on 
(269)@172.17.0.4:41747
I0627 05:23:06.047591  4950 slave.cpp:250] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://"; 
--appc_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/appc"
 --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
--authenticatee="crammd5" --authentication_backoff_factor="1secs" 
--authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" 
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" 
--cgroups_limit_swap="false" --cgroups_root="mesos" 
--container_disk_watch_interval="15secs" --containerizers="mesos" 
--credential="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/credential" 
--default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"; 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/docker"
 --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_reregistration_timeout="2secs" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="true" --http_command_executor="false" 
--http_credentials="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/http_credentials"
 --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
--isolation="posix/cpu,posix/mem" --launcher="posix" 
--launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" 
--logging_level="INFO" --max_completed_executors_per_framework="150" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="10ms" 
--resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" 
--revocable_cpu_low_priority="true" 
--runtime_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c"
I0627 05:23:06.047937  4950 credentials.hpp:86] Loading credential for 
authentication from 
'/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/credential'
I0627 05:23:06.048065  4950 slave.cpp:282] Agent using credential for: 
test-principal
I0627 05:23:06.048085  4950 credentials.hpp:37] Loading credentials for 
authentication from 
'/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/http_credentials'
I0627 05:23:06.048274  4950 http.cpp:974] Creating default 'basic' HTTP 
authenticator for realm 'mesos-agent-readonly'
I0627 05:23:06.048396  4950 http.cpp:974] Creating default 'basic' HTTP 
authenticator for realm 'mesos-agent-readwrite'
I0627 05:23:06.049597  4950 slave.cpp:553] Agent resources: 
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0627 05:23:06.049815  4950 slave.cpp:561] Agent attributes: [  ]
I0627 05:23:06.049830  4950 slave.cpp:566] Agent hostname: c43debbe7e32
I0627 05:23:06.049934  4941 status_update_manager.cpp:177] Pausing sending 
status updates
I0627 05:23:06.051394  4957 state.cpp:64] Recovering state from 
'/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta'
I0627 05:23:06.051775  4954 status_update_manager.cpp:203] Recovering status 
update manager
I0627 05:23:06.052050  4959 containerizer.cpp:582] Recovering containerizer
I0627 05:23:06.053479  4946 provisioner.cpp:416] Provisioner recovery complete
I0627 05:23:06.053802  4937 slave.cpp:6168] Finished recovery
I0627 05:23:06.054203  4937 slave.cpp:6350] Querying resource estimator for 
oversubscribable resources
I0627 05:23:06.054503  4954 slave.cpp:946] New master detected at 
master@172.17.0.4:41747
I0627 05:23:06.054517  4950 status_update_manager.cpp:177] Pausing sending 
status updates
I0627 05:23:06.054641  4954 slave.cpp:981] Detecting new master
I0627 05:23:06.054774  4954 slave.cpp:6364] Received oversubscribable resources 
{} from the resource estimator
I0627 05:23:06.065029  4943 slave.cpp:1008] Authenticating with master 
master@172.17.0.4:41747
I0627 05:23:06.065158  4943 slave.cpp:1019] Using default CRAM-MD5 authenticatee
I0627 05:23:06.065412  4945 authenticatee.cpp:121] Creating new client SASL 
connection
I0627 05:23:06.065798  4938 master.cpp:7640] Authenticating 
slave(269)@172.17.0.4:41747
I0627 05:23:06.065937  4954 authenticator.cpp:414] Starting authentication 
session for crammd5-authenticatee(577)@172.17.0.4:41747
I0627 05:23:06.066174  4937 authenticator.cpp:98] Creating new server SASL 
connection
I0627 05:23:06.066419  4958 authenticatee.cpp:213] Received SASL authentication 
mechanisms: CRAM-MD5
I0627 05:23:06.066448  4958 authenticatee.cpp:239] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0627 05:23:06.066550  4958 authenticator.cpp:204] Received SASL authentication 
start
I0627 05:23:06.066612  4958 authenticator.cpp:326] Authentication requires more 
steps
I0627 05:23:06.066845  4952 authenticatee.cpp:259] Received SASL authentication 
step
I0627 05:23:06.067041  4936 authenticator.cpp:232] Received SASL authentication 
step
I0627 05:23:06.067081  4936 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 
I0627 05:23:06.067096  4936 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'
I0627 05:23:06.067129  4936 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0627 05:23:06.067149  4936 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 
I0627 05:23:06.067159  4936 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.067167  4936 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.067184  4936 authenticator.cpp:318] Authentication success
I0627 05:23:06.067267  4945 authenticatee.cpp:299] Authentication success
I0627 05:23:06.067364  4954 master.cpp:7670] Successfully authenticated 
principal 'test-principal' at slave(269)@172.17.0.4:41747
I0627 05:23:06.067433  4943 authenticator.cpp:432] Authentication session 
cleanup for crammd5-authenticatee(577)@172.17.0.4:41747
I0627 05:23:06.067656  4939 slave.cpp:1103] Successfully authenticated with 
master master@172.17.0.4:41747
I0627 05:23:06.067855  4939 slave.cpp:1545] Will retry registration in 
1.835348ms if necessary
I0627 05:23:06.068022  4949 master.cpp:5602] Received register agent message 
from slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.068155  4949 master.cpp:3742] Authorizing agent with principal 
'test-principal'
I0627 05:23:06.068568  4957 master.cpp:5662] Authorized registration of agent 
at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.068668  4957 master.cpp:5741] Registering agent at 
slave(269)@172.17.0.4:41747 (c43debbe7e32) with id 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.069067  4937 registrar.cpp:493] Applied 1 operations in 56568ns; 
attempting to update the registry
I0627 05:23:06.069788  4937 registrar.cpp:550] Successfully updated the 
registry in 659968ns
I0627 05:23:06.069957  4946 master.cpp:5788] Admitted agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32)
I0627 05:23:06.070394  4950 slave.cpp:1545] Will retry registration in 
34.487053ms if necessary
I0627 05:23:06.070508  4950 slave.cpp:4866] Received ping from 
slave-observer(272)@172.17.0.4:41747
I0627 05:23:06.070776  4950 slave.cpp:1149] Registered with master 
master@172.17.0.4:41747; given agent ID a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.070569  4946 master.cpp:5819] Registered agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32) with 
[{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0627 05:23:06.070894  4942 status_update_manager.cpp:184] Resuming sending 
status updates
I0627 05:23:06.070890  4949 hierarchical.cpp:587] Added agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 (c43debbe7e32) with cpus:2; mem:1024; 
disk:1024; ports:[31000-32000] (allocated: {})
I0627 05:23:06.071032  4946 master.cpp:5602] Received register agent message 
from slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.071089  4950 slave.cpp:1169] Checkpointing SlaveInfo to 
'/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta/slaves/a8778782-0da1-49a5-9cb8-9f6d11701733-S0/slave.info'
I0627 05:23:06.071142  4946 master.cpp:3742] Authorizing agent with principal 
'test-principal'
I0627 05:23:06.071179  4949 hierarchical.cpp:1938] No allocations performed
I0627 05:23:06.071238  4949 hierarchical.cpp:1493] Performed allocation for 1 
agents in 180904ns
I0627 05:23:06.071403  4950 slave.cpp:1207] Forwarding total oversubscribed 
resources {}
I0627 05:23:06.071549  4944 master.cpp:5662] Authorized registration of agent 
at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.071624  4952 status_update_manager.cpp:177] Pausing sending 
status updates
I0627 05:23:06.071624  4944 master.cpp:5719] Agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32) already registered, resending acknowledgement
I0627 05:23:06.071624  4950 slave.cpp:946] New master detected at 
master@172.17.0.4:41747
I0627 05:23:06.071753  4950 slave.cpp:981] Detecting new master
I0627 05:23:06.071769  4944 master.cpp:6508] Received update of agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32) with total oversubscribed resources {}
I0627 05:23:06.071905  4950 slave.cpp:1149] Registered with master 
master@172.17.0.4:41747; given agent ID a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.072108  4943 status_update_manager.cpp:184] Resuming sending 
status updates
I0627 05:23:06.072243  4950 slave.cpp:1169] Checkpointing SlaveInfo to 
'/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta/slaves/a8778782-0da1-49a5-9cb8-9f6d11701733-S0/slave.info'
I0627 05:23:06.072391  4948 hierarchical.cpp:1938] No allocations performed
I0627 05:23:06.072443  4948 hierarchical.cpp:1493] Performed allocation for 1 
agents in 158723ns
I0627 05:23:06.072525  4950 slave.cpp:1207] Forwarding total oversubscribed 
resources {}
I0627 05:23:06.072618  4950 slave.cpp:1008] Authenticating with master 
master@172.17.0.4:41747
I0627 05:23:06.072661  4950 slave.cpp:1019] Using default CRAM-MD5 authenticatee
I0627 05:23:06.072693  4959 master.cpp:6508] Received update of agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32) with total oversubscribed resources {}
I0627 05:23:06.072882  4946 authenticatee.cpp:121] Creating new client SASL 
connection
I0627 05:23:06.073169  4944 master.cpp:7640] Authenticating 
slave(269)@172.17.0.4:41747
I0627 05:23:06.073289  4950 authenticator.cpp:414] Starting authentication 
session for crammd5-authenticatee(578)@172.17.0.4:41747
I0627 05:23:06.073531  4952 authenticator.cpp:98] Creating new server SASL 
connection
I0627 05:23:06.073796  4949 authenticatee.cpp:213] Received SASL authentication 
mechanisms: CRAM-MD5
I0627 05:23:06.073827  4949 authenticatee.cpp:239] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0627 05:23:06.073998  4936 authenticator.cpp:204] Received SASL authentication 
start
I0627 05:23:06.074061  4936 authenticator.cpp:326] Authentication requires more 
steps
I0627 05:23:06.074195  4957 authenticatee.cpp:259] Received SASL authentication 
step
I0627 05:23:06.074335  4953 authenticator.cpp:232] Received SASL authentication 
step
I0627 05:23:06.074370  4953 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 
I0627 05:23:06.074388  4953 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'
I0627 05:23:06.074416  4953 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0627 05:23:06.074436  4953 auxprop.cpp:109] Request to lookup properties for 
user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 
I0627 05:23:06.074446  4953 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.074455  4953 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.074468  4953 authenticator.cpp:318] Authentication success
I0627 05:23:06.074553  4956 authenticatee.cpp:299] Authentication success
I0627 05:23:06.074594  4945 master.cpp:7670] Successfully authenticated 
principal 'test-principal' at slave(269)@172.17.0.4:41747
I0627 05:23:06.074642  4952 authenticator.cpp:432] Authentication session 
cleanup for crammd5-authenticatee(578)@172.17.0.4:41747
I0627 05:23:06.074981  4951 slave.cpp:1103] Successfully authenticated with 
master master@172.17.0.4:41747
../../src/tests/master_tests.cpp:7110: Failure
Failed to wait 15secs for reregisterSlaveMessage
I0627 05:23:21.082969  4953 slave.cpp:818] Agent terminating
I0627 05:23:21.083305  4936 master.cpp:1316] Agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32) disconnected
I0627 05:23:21.083349  4936 master.cpp:3240] Disconnecting agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32)
I0627 05:23:21.083443  4936 master.cpp:3259] Deactivating agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 
(c43debbe7e32)
I0627 05:23:21.083665  4936 hierarchical.cpp:715] Agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0 deactivated
I0627 05:23:21.089186  4917 master.cpp:1158] Master terminating
I0627 05:23:21.090092  4944 hierarchical.cpp:620] Removed agent 
a8778782-0da1-49a5-9cb8-9f6d11701733-S0
../../3rdparty/libprocess/include/process/gmock.hpp:441: Failure
Actual function call count doesn't match EXPECT_CALL(filter->mock, 
filter(testing::A<const MessageEvent&>()))...
    Expected args: message matcher (8-byte object <E8-AB 06-80 6C-2B 00-00>, 
1-byte object <AB>, 1-byte object <E8>)
         Expected: to be called once
           Actual: never called - unsatisfied and active
[  FAILED  ] MasterTest.IgnoreOldAgentReregistration (15070 ms)

{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to