[jira] [Commented] (MESOS-8723) ROOT_HealthCheckUsingPersistentVolume is flaky.

2018-11-06 Thread Joseph Wu (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677515#comment-16677515 ]

Joseph Wu commented on MESOS-8723:
--

Another bad run, on the 1.6.x branch (Ubuntu 16):
{code}
[ RUN  ] LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_HealthCheckUsingPersistentVolume/1
I1106 20:15:34.354775 32499 cluster.cpp:172] Creating default 'local' authorizer
I1106 20:15:34.355837 22262 master.cpp:463] Master 
ee3a72ac-f1ea-4572-ab7a-424ecc6e517c (ip-172-16-10-158.ec2.internal) started on 
172.16.10.158:46488
I1106 20:15:34.355865 22262 master.cpp:466] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/yQNdFw/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --memory_profiling="false" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/yQNdFw/master" --zk_session_timeout="10secs"
I1106 20:15:34.356046 22262 master.cpp:515] Master only allowing authenticated 
frameworks to register
I1106 20:15:34.356058 22262 master.cpp:521] Master only allowing authenticated 
agents to register
I1106 20:15:34.356146 22262 master.cpp:527] Master only allowing authenticated 
HTTP frameworks to register
I1106 20:15:34.356154 22262 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/yQNdFw/credentials'
I1106 20:15:34.356290 22262 master.cpp:571] Using default 'crammd5' 
authenticator
I1106 20:15:34.356390 22262 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I1106 20:15:34.356514 22262 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I1106 20:15:34.356560 22262 http.cpp:959] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I1106 20:15:34.356707 22262 master.cpp:652] Authorization enabled
I1106 20:15:34.356874 22263 hierarchical.cpp:177] Initialized hierarchical 
allocator process
I1106 20:15:34.356999 22263 whitelist_watcher.cpp:77] No whitelist given
I1106 20:15:34.357659 22263 master.cpp:2162] Elected as the leading master!
I1106 20:15:34.357681 22263 master.cpp:1717] Recovering from registrar
I1106 20:15:34.357723 22263 registrar.cpp:339] Recovering registrar
I1106 20:15:34.357962 22257 registrar.cpp:383] Successfully fetched the 
registry (0B) in 184832ns
I1106 20:15:34.358003 22257 registrar.cpp:487] Applied 1 operations in 7418ns; 
attempting to update the registry
I1106 20:15:34.358218 22262 registrar.cpp:544] Successfully updated the 
registry in 129792ns
I1106 20:15:34.358259 22262 registrar.cpp:416] Successfully recovered registrar
I1106 20:15:34.358475 22262 master.cpp:1831] Recovered 0 agents from the 
registry (176B); allowing 10mins for agents to reregister
I1106 20:15:34.358522 22262 hierarchical.cpp:215] Skipping recovery of 
hierarchical allocator: nothing to recover
I1106 20:15:34.359434 32499 containerizer.cpp:296] Using isolation { 
environment_secret, network/cni, filesystem/posix, volume/sandbox_path }
I1106 20:15:34.361624 32499 linux_launcher.cpp:147] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1106 20:15:34.362061 32499 provisioner.cpp:299] Using default backend 'overlay'
W1106 20:15:34.363476 32499 process.cpp:2829] Attempted to spawn already 
running process files@172.16.10.158:46488
I1106 20:15:34.363585 32499 cluster.cpp:460] Creating default 'local' authorizer
I1106 20:15:34.364131 22257 slave.cpp:265] Mesos agent started on 
(1090)@172.16.10.158:46488
I1106 20:15:34.364296 22257 slave.cpp:266] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/LauncherAndIsolationParam_PersistentVolumeDefaultExecutor_ROOT_HealthCheckUsingPersistentVolume_1_9fXuUo/store/appc"
 

[jira] [Commented] (MESOS-9285) DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithAbsolutePathVolume is flaky

2018-11-06 Thread Joseph Wu (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677505#comment-16677505 ]

Joseph Wu commented on MESOS-9285:
--

Observed on another internal CI run, albeit on CentOS 7 and on the 1.5.x branch 
(https://github.com/apache/mesos/tree/6008868c715733b7d798279e9b39ae3483f7d955):
{code}
[ RUN  ] DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithAbsolutePathVolume
I1106 20:21:29.367887 31384 cluster.cpp:172] Creating default 'local' authorizer
I1106 20:21:29.368988 21440 master.cpp:457] Master 
0d219c14-565e-46dd-b5c2-56bd9e97e4d1 (ip-172-16-10-72.ec2.internal) started on 
172.16.10.72:46670
I1106 20:21:29.369009 21440 master.cpp:459] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/zLJ6wA/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --role_sorter="drf" --root_submissions="true" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/zLJ6wA/master" --zk_session_timeout="10secs"
I1106 20:21:29.369148 21440 master.cpp:508] Master only allowing authenticated 
frameworks to register
I1106 20:21:29.369156 21440 master.cpp:514] Master only allowing authenticated 
agents to register
I1106 20:21:29.369163 21440 master.cpp:520] Master only allowing authenticated 
HTTP frameworks to register
I1106 20:21:29.369168 21440 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/zLJ6wA/credentials'
I1106 20:21:29.369271 21440 master.cpp:564] Using default 'crammd5' 
authenticator
I1106 20:21:29.369320 21440 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I1106 20:21:29.369357 21440 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I1106 20:21:29.369383 21440 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I1106 20:21:29.369403 21440 master.cpp:643] Authorization enabled
I1106 20:21:29.369576 21438 hierarchical.cpp:177] Initialized hierarchical 
allocator process
I1106 20:21:29.369608 21438 whitelist_watcher.cpp:77] No whitelist given
I1106 20:21:29.370110 21440 master.cpp:2247] Elected as the leading master!
I1106 20:21:29.370126 21440 master.cpp:1727] Recovering from registrar
I1106 20:21:29.370163 21440 registrar.cpp:347] Recovering registrar
I1106 20:21:29.370302 21437 registrar.cpp:391] Successfully fetched the 
registry (0B) in 114944ns
I1106 20:21:29.370345 21437 registrar.cpp:495] Applied 1 operations in 6941ns; 
attempting to update the registry
I1106 20:21:29.370510 21442 registrar.cpp:552] Successfully updated the 
registry in 143104ns
I1106 20:21:29.370553 21442 registrar.cpp:424] Successfully recovered registrar
I1106 20:21:29.370631 21442 master.cpp:1840] Recovered 0 agents from the 
registry (172B); allowing 10mins for agents to re-register
I1106 20:21:29.370667 21438 hierarchical.cpp:215] Skipping recovery of 
hierarchical allocator: nothing to recover
I1106 20:21:29.372022 31384 isolator.cpp:136] Initialized the docker volume 
information root directory at '/run/mesos/isolators/docker/volume'
I1106 20:21:29.379148 31384 linux_launcher.cpp:145] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
sh: hadoop: command not found
I1106 20:21:29.462009 31384 fetcher.cpp:69] Skipping URI fetcher plugin 
'hadoop' as it could not be created: Failed to create HDFS client: Hadoop 
client is not available, exit status: 32512
I1106 20:21:29.462098 31384 registry_puller.cpp:129] Creating registry puller 
with docker registry 'https://registry-1.docker.io'
I1106 20:21:29.462872 31384 provisioner.cpp:299] Using default backend 'copy'
W1106 20:21:29.464048 31384 process.cpp:2745] Attempted to spawn already 
running process files@172.16.10.72:46670
I1106 

[jira] [Commented] (MESOS-6949) SchedulerTest.MasterFailover is flaky

2018-11-06 Thread Joseph Wu (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677498#comment-16677498 ]

Joseph Wu commented on MESOS-6949:
--

Another fix: https://reviews.apache.org/r/69267/

> SchedulerTest.MasterFailover is flaky
> -
>
> Key: MESOS-6949
> URL: https://issues.apache.org/jira/browse/MESOS-6949
> Project: Mesos
> Issue Type: Bug
> Components: test
> Environment: Observed on:
> CentOS 7 VM, libevent and SSL enabled;
> Ubuntu 14.04, cmake/clang, without libevent/SSL, on ASF CI
> Reporter: Greg Mann
> Assignee: Alexander Rukletsov
> Priority: Major
> Labels: flaky-test, tests
> Fix For: 1.5.0
>
> Attachments: MasterFailover-badrun.txt, 
> SchedulerTest.MasterFailover-on-ASF-CI.txt, SchedulerTest.MasterFailover.txt, 
> SchedulerTest_MasterFailover_1_badrun.txt
>
>
> This was observed in a CentOS 7 VM, with libevent and SSL enabled:
> {code}
> W0118 22:38:33.789465  3407 scheduler.cpp:513] Dropping SUBSCRIBE: Scheduler 
> is in state DISCONNECTED
> I0118 22:38:33.811820  3408 scheduler.cpp:361] Connected with the master at 
> http://127.0.0.1:43211/master/api/v1/scheduler
> ../../src/tests/scheduler_tests.cpp:315: Failure
> Mock function called more times than expected - returning directly.
> Function call: connected(0x7fff97227550)
>      Expected: to be called once
>        Actual: called twice - over-saturated and active
> {code}
> Find attached the entire log from a failed run.
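
For readers unfamiliar with this class of failure, here is a rough analogue of the "called twice" error above. It is only an illustrative sketch in Python's `unittest.mock`, not Mesos code: the actual test uses Google Mock in C++, and the `connected` mock and the `"master-1"` argument below are hypothetical stand-ins.

```python
from unittest import mock

# Hypothetical stand-in for the scheduler's `connected` callback (not Mesos code).
connected = mock.Mock(name="connected")

# Simulate the callback firing twice, which is what the quoted log reports.
connected("master-1")
connected("master-1")

try:
    # Roughly analogous to a "to be called once" expectation on a mock.
    connected.assert_called_once()
    outcome = "called once"
except AssertionError:
    # The expectation is over-saturated: more calls arrived than were allowed.
    outcome = "called %d times" % connected.call_count

print(outcome)
```

In this sketch the second call is what breaks the expectation, so a test written this way passes or fails depending on whether a second connection event happens to arrive while the mock is still active.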



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky

2018-11-06 Thread Joseph Wu (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677306#comment-16677306 ]

Joseph Wu commented on MESOS-7971:
--

Slightly different logs were observed on an internal CI run (Ubuntu 16, no SSL). 
In this run, the test expects a 202 for one HTTP response but gets a 409 instead.
{code}
[ RUN  ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I1106 19:50:14.650254 19563 cluster.cpp:162] Creating default 'local' authorizer
I1106 19:50:14.651284 19588 master.cpp:442] Master 
d5905469-73fc-4219-b939-c6056f1f62a1 (ip-172-16-10-48.ec2.internal) started on 
172.16.10.48:39946
I1106 19:50:14.651309 19588 master.cpp:444] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="50ms" --allocator="hierarchical" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/fZatVl/credentials" --filter_gpu_resources="true" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" 
--min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fZatVl/master" 
--zk_session_timeout="10secs"
I1106 19:50:14.651437 19588 master.cpp:494] Master only allowing authenticated 
frameworks to register
I1106 19:50:14.651448 19588 master.cpp:508] Master only allowing authenticated 
agents to register
I1106 19:50:14.651453 19588 master.cpp:521] Master only allowing authenticated 
HTTP frameworks to register
I1106 19:50:14.651459 19588 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/fZatVl/credentials'
I1106 19:50:14.651548 19588 master.cpp:566] Using default 'crammd5' 
authenticator
I1106 19:50:14.651593 19588 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I1106 19:50:14.651643 19588 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I1106 19:50:14.651672 19588 http.cpp:1045] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I1106 19:50:14.651700 19588 master.cpp:646] Authorization enabled
W1106 19:50:14.651710 19588 master.cpp:709] The '--roles' flag is deprecated. 
This flag will be removed in the future. See the Mesos 0.27 upgrade notes for 
more information
I1106 19:50:14.651803 19590 hierarchical.cpp:173] Initialized hierarchical 
allocator process
I1106 19:50:14.651830 19590 whitelist_watcher.cpp:77] No whitelist given
I1106 19:50:14.652432 19590 master.cpp:2200] Elected as the leading master!
I1106 19:50:14.652454 19590 master.cpp:1739] Recovering from registrar
I1106 19:50:14.652506 19590 registrar.cpp:347] Recovering registrar
I1106 19:50:14.652595 19590 registrar.cpp:391] Successfully fetched the 
registry (0B) in 72960ns
I1106 19:50:14.652622 19590 registrar.cpp:495] Applied 1 operations in 5332ns; 
attempting to update the registry
I1106 19:50:14.656131 19586 registrar.cpp:552] Successfully updated the 
registry in 3.472128ms
I1106 19:50:14.656177 19586 registrar.cpp:424] Successfully recovered registrar
I1106 19:50:14.656266 19588 master.cpp:1838] Recovered 0 agents from the 
registry (168B); allowing 10mins for agents to re-register
I1106 19:50:14.656299 19588 hierarchical.cpp:211] Skipping recovery of 
hierarchical allocator: nothing to recover
W1106 19:50:14.657806 19563 process.cpp:3196] Attempted to spawn already 
running process files@172.16.10.48:39946
I1106 19:50:14.658203 19563 containerizer.cpp:246] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
I1106 19:50:14.661717 19563 linux_launcher.cpp:149] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1106 19:50:14.662039 19563 provisioner.cpp:255] Using default backend 'overlay'
I1106 19:50:14.662547 19563 cluster.cpp:448] Creating default 'local' authorizer
I1106 19:50:14.662969 19589 slave.cpp:249] Mesos agent started on 
(378)@172.16.10.48:39946
I1106 19:50:14.662987 19589 slave.cpp:250] Flags at startup: --acls=""