[jira] [Commented] (MESOS-6135) ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky

2017-01-28 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844112#comment-15844112
 ] 

Greg Mann commented on MESOS-6135:
--

This same failure mode was observed again yesterday on ASF CI.

> ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky
> --
>
> Key: MESOS-6135
> URL: https://issues.apache.org/jira/browse/MESOS-6135
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Ubuntu 14, libev, non-SSL
>Reporter: Greg Mann
>  Labels: logging, mesosphere
>
> Observed in our internal CI:
> {code}
> [19:53:51] :   [Step 10/10] [ RUN  ] 
> ContainerLoggerTest.LOGROTATE_RotateInSandbox
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.460055 23729 cluster.cpp:157] 
> Creating default 'local' authorizer
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.468907 23729 leveldb.cpp:174] 
> Opened db in 8.730166ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472470 23729 leveldb.cpp:181] 
> Compacted db in 3.544028ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472491 23729 leveldb.cpp:196] 
> Created db iterator in 3678ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472496 23729 leveldb.cpp:202] 
> Seeked to beginning of db in 673ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472499 23729 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 256ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472510 23729 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472709 23744 recover.cpp:451] 
> Starting replica recovery
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472820 23748 recover.cpp:477] 
> Replica is in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473059 23748 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(177)@172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473146 23746 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473234 23745 recover.cpp:568] 
> Updating replica status to STARTING
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473629 23747 master.cpp:379] 
> Master 6d1b2727-f42d-446b-b2f8-a9f7e7667340 (ip-172-30-2-89.mesosphere.io) 
> started on 172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473644 23747 master.cpp:381] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ceLmd7/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/ceLmd7/master" --zk_session_timeout="10secs"
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473832 23747 master.cpp:431] 
> Master only allowing authenticated frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473844 23747 master.cpp:445] 
> Master only allowing authenticated agents to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473850 23747 master.cpp:458] 
> Master only allowing authenticated HTTP frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473856 23747 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/ceLmd7/credentials'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473975 23747 master.cpp:503] Using 
> default 'crammd5' authenticator
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474028 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474097 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474161 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474242 23747 master.cpp:583] 
> Authorization enabled
> [19:53:51]W:   [Step 10/10] I0906 

[jira] [Commented] (MESOS-6135) ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky

2016-09-20 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506780#comment-15506780
 ] 

Neil Conway commented on MESOS-6135:


I'd opt for silencing the executor's logging, because it is a bit more precise.

> ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky
> --
>
> Key: MESOS-6135
> URL: https://issues.apache.org/jira/browse/MESOS-6135
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Ubuntu 14, libev, non-SSL
>Reporter: Greg Mann
>  Labels: logging, mesosphere
>
> Observed in our internal CI:
> {code}
> [19:53:51] :   [Step 10/10] [ RUN  ] 
> ContainerLoggerTest.LOGROTATE_RotateInSandbox
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.460055 23729 cluster.cpp:157] 
> Creating default 'local' authorizer
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.468907 23729 leveldb.cpp:174] 
> Opened db in 8.730166ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472470 23729 leveldb.cpp:181] 
> Compacted db in 3.544028ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472491 23729 leveldb.cpp:196] 
> Created db iterator in 3678ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472496 23729 leveldb.cpp:202] 
> Seeked to beginning of db in 673ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472499 23729 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 256ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472510 23729 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472709 23744 recover.cpp:451] 
> Starting replica recovery
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472820 23748 recover.cpp:477] 
> Replica is in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473059 23748 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(177)@172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473146 23746 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473234 23745 recover.cpp:568] 
> Updating replica status to STARTING
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473629 23747 master.cpp:379] 
> Master 6d1b2727-f42d-446b-b2f8-a9f7e7667340 (ip-172-30-2-89.mesosphere.io) 
> started on 172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473644 23747 master.cpp:381] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ceLmd7/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/ceLmd7/master" --zk_session_timeout="10secs"
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473832 23747 master.cpp:431] 
> Master only allowing authenticated frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473844 23747 master.cpp:445] 
> Master only allowing authenticated agents to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473850 23747 master.cpp:458] 
> Master only allowing authenticated HTTP frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473856 23747 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/ceLmd7/credentials'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473975 23747 master.cpp:503] Using 
> default 'crammd5' authenticator
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474028 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474097 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474161 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474242 23747 master.cpp:583] 
> Authorization enabled
> [19:53:51]W:   [Step 

[jira] [Commented] (MESOS-6135) ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky

2016-09-07 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471542#comment-15471542
 ] 

Joseph Wu commented on MESOS-6135:
--

This test is flaky because it is reading the contents of the {{stdout}} file 
and expecting a certain size (with some amount of leeway).  It looks like it 
can fail when the executor logs slightly more (~1 KB) more text than expected.

We can fix this by either: 
1) silencing the executor's logging with an environment variable 
({{MESOS_LOGGING_LEVEL=ERROR}}); or
2) increasing the amount of leeway the test can tolerate 
(https://github.com/apache/mesos/blob/b2101157fd61bbe42c9536935ee9fda44a929ee9/src/tests/container_logger_tests.cpp#L505).

> ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky
> --
>
> Key: MESOS-6135
> URL: https://issues.apache.org/jira/browse/MESOS-6135
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Ubuntu 14, libev, non-SSL
>Reporter: Greg Mann
>  Labels: logging, mesosphere
>
> Observed in our internal CI:
> {code}
> [19:53:51] :   [Step 10/10] [ RUN  ] 
> ContainerLoggerTest.LOGROTATE_RotateInSandbox
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.460055 23729 cluster.cpp:157] 
> Creating default 'local' authorizer
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.468907 23729 leveldb.cpp:174] 
> Opened db in 8.730166ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472470 23729 leveldb.cpp:181] 
> Compacted db in 3.544028ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472491 23729 leveldb.cpp:196] 
> Created db iterator in 3678ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472496 23729 leveldb.cpp:202] 
> Seeked to beginning of db in 673ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472499 23729 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 256ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472510 23729 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472709 23744 recover.cpp:451] 
> Starting replica recovery
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472820 23748 recover.cpp:477] 
> Replica is in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473059 23748 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(177)@172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473146 23746 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473234 23745 recover.cpp:568] 
> Updating replica status to STARTING
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473629 23747 master.cpp:379] 
> Master 6d1b2727-f42d-446b-b2f8-a9f7e7667340 (ip-172-30-2-89.mesosphere.io) 
> started on 172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473644 23747 master.cpp:381] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ceLmd7/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/ceLmd7/master" --zk_session_timeout="10secs"
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473832 23747 master.cpp:431] 
> Master only allowing authenticated frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473844 23747 master.cpp:445] 
> Master only allowing authenticated agents to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473850 23747 master.cpp:458] 
> Master only allowing authenticated HTTP frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473856 23747 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/ceLmd7/credentials'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473975 23747 master.cpp:503] Using 
> default 'crammd5' authenticator
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474028 23747 http.cpp:883] Using 
> default 'basic' HTTP 

[jira] [Commented] (MESOS-6135) ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky

2016-09-07 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471271#comment-15471271
 ] 

Greg Mann commented on MESOS-6135:
--

[~kaysoky] FYI

> ContainerLoggerTest.LOGROTATE_RotateInSandbox is flaky
> --
>
> Key: MESOS-6135
> URL: https://issues.apache.org/jira/browse/MESOS-6135
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Ubuntu 14, libev, non-SSL
>Reporter: Greg Mann
>  Labels: logging, mesosphere
>
> Observed in our internal CI:
> {code}
> [19:53:51] :   [Step 10/10] [ RUN  ] 
> ContainerLoggerTest.LOGROTATE_RotateInSandbox
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.460055 23729 cluster.cpp:157] 
> Creating default 'local' authorizer
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.468907 23729 leveldb.cpp:174] 
> Opened db in 8.730166ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472470 23729 leveldb.cpp:181] 
> Compacted db in 3.544028ms
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472491 23729 leveldb.cpp:196] 
> Created db iterator in 3678ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472496 23729 leveldb.cpp:202] 
> Seeked to beginning of db in 673ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472499 23729 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 256ns
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472510 23729 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472709 23744 recover.cpp:451] 
> Starting replica recovery
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.472820 23748 recover.cpp:477] 
> Replica is in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473059 23748 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(177)@172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473146 23746 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473234 23745 recover.cpp:568] 
> Updating replica status to STARTING
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473629 23747 master.cpp:379] 
> Master 6d1b2727-f42d-446b-b2f8-a9f7e7667340 (ip-172-30-2-89.mesosphere.io) 
> started on 172.30.2.89:44578
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473644 23747 master.cpp:381] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ceLmd7/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/ceLmd7/master" --zk_session_timeout="10secs"
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473832 23747 master.cpp:431] 
> Master only allowing authenticated frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473844 23747 master.cpp:445] 
> Master only allowing authenticated agents to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473850 23747 master.cpp:458] 
> Master only allowing authenticated HTTP frameworks to register
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473856 23747 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/ceLmd7/credentials'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.473975 23747 master.cpp:503] Using 
> default 'crammd5' authenticator
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474028 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474097 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474161 23747 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474242 23747 master.cpp:583] 
> Authorization enabled
> [19:53:51]W:   [Step 10/10] I0906 19:53:51.474308 23744 hierarchical.cpp:149] 
>