[jira] [Updated] (MESOS-7216) Delayed executor termination leads to test failures
[ https://issues.apache.org/jira/browse/MESOS-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gastón Kleiman updated MESOS-7216:
--
Sprint: Mesosphere Sprint 52 (was: Mesosphere Sprint 52, Mesosphere Sprint 53)

> Delayed executor termination leads to test failures
> ---
>
> Key: MESOS-7216
> URL: https://issues.apache.org/jira/browse/MESOS-7216
> Project: Mesos
> Issue Type: Bug
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Labels: debugging, mesosphere
>
> This bug came up during the development of a test for the new COMMAND health
> checks that use nested containers. The test can be found here:
> https://reviews.apache.org/r/55901/.
> The test setup was explained in MESOS-7050:
> 1) Start the scheduler driver
> 2) Launch a task group with the default executor that includes a single long
> running task with a COMMAND health check
> 3) Wait for the task to return a status of HEALTHY one time
> 4) Stop the scheduler driver without explicitly waiting for any of the tasks
> to complete
> 5) Wait for all the containers to complete
> 6) Exit the test
> With this setup, all of the ASSERTS in the test itself passed, but the test
> failed because there were remaining processes once the test exited (after a
> timeout of 15 seconds):
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from HealthCheckTest
> [ RUN ] HealthCheckTest.DefaultExecutorCommandHealthCheck
> I0228 14:29:19.078368 3475919808 cluster.cpp:160] Creating default 'local'
> authorizer
> I0228 14:29:19.084883 238907392 master.cpp:383] Master
> 98c48dab-fd2b-404e-85dc-4ec5dd0d635c (172.18.8.139) started on
> 172.18.8.139:55836
> I0228 14:29:19.084915 238907392 master.cpp:385] Flags at startup: --acls=""
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate_agents="true" --authenticate_frameworks="true"
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true"
> --authenticate_http_readwrite="true" --authenticators="crammd5"
> --authorizers="local"
> --credentials="/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f8gn/T/xZZCGr/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --http_authenticators="basic" --http_framework_authenticators="basic"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
> --max_unreachable_tasks_per_framework="1000" --quiet="false"
> --recovery_agent_removal_limit="100%" --registry="in_memory"
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
> --registry_store_timeout="100secs" --registry_strict="false"
> --root_submissions="true" --user_sorter="drf" --version="false"
> --webui_dir="/Users/gaston/mesos/master/share/mesos/webui"
> --work_dir="/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f8gn/T/xZZCGr/master"
> --zk_session_timeout="10secs"
> I0228 14:29:19.086030 238907392 master.cpp:435] Master only allowing
> authenticated frameworks to register
> I0228 14:29:19.086041 238907392 master.cpp:449] Master only allowing
> authenticated agents to register
> I0228 14:29:19.086046 238907392 master.cpp:462] Master only allowing
> authenticated HTTP frameworks to register
> I0228 14:29:19.086050 238907392 credentials.hpp:37] Loading credentials for
> authentication from
> '/private/var/folders/jr/17y2w4ld019bsn9vhx0c13f8gn/T/xZZCGr/credentials'
> I0228 14:29:19.086334 238907392 master.cpp:507] Using default 'crammd5'
> authenticator
> I0228 14:29:19.086369 238907392 authenticator.cpp:519] Initializing server
> SASL
> I0228 14:29:19.100981 238907392 auxprop.cpp:73] Initialized in-memory
> auxiliary property plugin
> I0228 14:29:19.101080 238907392 http.cpp:933] Using default 'basic' HTTP
> authenticator for realm 'mesos-master-readonly'
> I0228 14:29:19.101274 238907392 http.cpp:933] Using default 'basic' HTTP
> authenticator for realm 'mesos-master-readwrite'
> I0228 14:29:19.101414 238907392 http.cpp:933] Using default 'basic' HTTP
> authenticator for realm 'mesos-master-scheduler'
> I0228 14:29:19.101528 238907392 master.cpp:587] Authorization enabled
> I0228 14:29:19.101702 240517120 hierarchical.cpp:161] Initialized
> hierarchical allocator process
> I0228 14:29:19.101740 239443968 whitelist_watcher.cpp:77] No whitelist given
> I0228 14:29:19.105717 240517120 master.cpp:2122] Elected as the leading
> master!
> I0228 14:29:19.105738 240517120 master.cpp:1646] Recovering from registrar
> I0228
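The failure mode in the report (processes still alive when the test exits, unless the test waits for them) can be modelled outside Mesos with plain POSIX calls. The following is a minimal C++ sketch under stated assumptions, not the Mesos test harness or its leftover-process check: a forked child stands in for an executor that keeps running for a while after the scheduler driver has been stopped, and a bounded waitpid() polling loop stands in for step 5 ("Wait for all the containers to complete"). The helper names spawnSlowExecutor, awaitTermination and killAndReap are hypothetical, and the delays are shortened from the 15-second timeout mentioned in the report.

```cpp
#include <cassert>     // assert(), for checking these helpers
#include <chrono>
#include <thread>

#include <signal.h>    // kill, SIGKILL
#include <sys/types.h> // pid_t
#include <sys/wait.h>  // waitpid, WNOHANG
#include <unistd.h>    // fork, _exit

// Hypothetical stand-in for an executor that, after the driver is stopped,
// keeps running for `delay` before it exits ("delayed executor termination").
pid_t spawnSlowExecutor(std::chrono::milliseconds delay) {
  pid_t pid = fork();
  if (pid == 0) {
    // Child: linger for the delay, then exit without running atexit handlers.
    std::this_thread::sleep_for(delay);
    _exit(0);
  }
  return pid;  // Child PID in the parent, or -1 if fork() failed.
}

// Hypothetical bounded version of step 5: poll waitpid() with WNOHANG until
// the child has been reaped or `timeout` expires.
bool awaitTermination(pid_t pid, std::chrono::milliseconds timeout) {
  if (pid <= 0) return false;  // Never call waitpid(-1, ...) by accident.
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    int status = 0;
    const pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) return true;  // Child exited and has been reaped.
    if (r == -1) return false;  // Not our child, or another error.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return false;  // Still running: the "remaining processes" case.
}

// Hypothetical cleanup so the sketch itself leaves no process behind.
void killAndReap(pid_t pid) {
  if (pid <= 0) return;
  kill(pid, SIGKILL);
  waitpid(pid, nullptr, 0);
}
```

For example, a child spawned with a 100 ms delay is reaped when awaitTermination() is given 2 s, whereas a child spawned with a 2 s delay is still running after a 100 ms wait, which corresponds to the leftover-process case in the report; killAndReap() then removes it.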
[jira] [Updated] (MESOS-7216) Delayed executor termination leads to test failures
[ https://issues.apache.org/jira/browse/MESOS-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Harutyunyan updated MESOS-7216:
-
Sprint: Mesosphere Sprint 52, Mesosphere Sprint 53 (was: Mesosphere Sprint 52)
[jira] [Updated] (MESOS-7216) Delayed executor termination leads to test failures
[ https://issues.apache.org/jira/browse/MESOS-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gastón Kleiman updated MESOS-7216:
--
Summary: Delayed executor termination leads to test failures (was: Termination of nested containers sometimes fails)

> Delayed executor termination leads to test failures
> ---
>
> Key: MESOS-7216
> URL: https://issues.apache.org/jira/browse/MESOS-7216
> Project: Mesos
> Issue Type: Bug
> Reporter: Gastón Kleiman
> Assignee: Kevin Klues
> Labels: debugging, mesosphere
>
> This bug came up during the development of a test for the new COMMAND health
> checks that use nested containers. The test can be found here:
> https://reviews.apache.org/r/55901/.
> The test setup was explained in MESOS-7050:
> 1) Start the scheduler driver
> 2) Launch a task group with the default executor that includes a single long
> running task with a COMMAND health check
> 3) Wait for the task to return a status of HEALTHY one time
> 4) Stop the scheduler driver without explicitly waiting for any of the tasks
> to complete
> 5) Exit the test
> With this setup, all of the ASSERTS in the test itself passed, but the test
> failed because there were remaining processes once the test exited (after a
> timeout of 15 seconds):
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.