[jira] [Commented] (MESOS-8723) ROOT_HealthCheckUsingPersistentVolume is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677515#comment-16677515 ]

Joseph Wu commented on MESOS-8723:
----------------------------------

Another bad run, on the 1.6.x branch (Ubuntu 16)

{code}
[ RUN ] LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_HealthCheckUsingPersistentVolume/1
I1106 20:15:34.354775 32499 cluster.cpp:172] Creating default 'local' authorizer
I1106 20:15:34.355837 22262 master.cpp:463] Master ee3a72ac-f1ea-4572-ab7a-424ecc6e517c (ip-172-16-10-158.ec2.internal) started on 172.16.10.158:46488
I1106 20:15:34.355865 22262 master.cpp:466] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/yQNdFw/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui"
--work_dir="/tmp/yQNdFw/master" --zk_session_timeout="10secs"
I1106 20:15:34.356046 22262 master.cpp:515] Master only allowing authenticated frameworks to register
I1106 20:15:34.356058 22262 master.cpp:521] Master only allowing authenticated agents to register
I1106 20:15:34.356146 22262 master.cpp:527] Master only allowing authenticated HTTP frameworks to register
I1106 20:15:34.356154 22262 credentials.hpp:37] Loading credentials for authentication from '/tmp/yQNdFw/credentials'
I1106 20:15:34.356290 22262 master.cpp:571] Using default 'crammd5' authenticator
I1106 20:15:34.356390 22262 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1106 20:15:34.356514 22262 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1106 20:15:34.356560 22262 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1106 20:15:34.356707 22262 master.cpp:652] Authorization enabled
I1106 20:15:34.356874 22263 hierarchical.cpp:177] Initialized hierarchical allocator process
I1106 20:15:34.356999 22263 whitelist_watcher.cpp:77] No whitelist given
I1106 20:15:34.357659 22263 master.cpp:2162] Elected as the leading master!
I1106 20:15:34.357681 22263 master.cpp:1717] Recovering from registrar
I1106 20:15:34.357723 22263 registrar.cpp:339] Recovering registrar
I1106 20:15:34.357962 22257 registrar.cpp:383] Successfully fetched the registry (0B) in 184832ns
I1106 20:15:34.358003 22257 registrar.cpp:487] Applied 1 operations in 7418ns; attempting to update the registry
I1106 20:15:34.358218 22262 registrar.cpp:544] Successfully updated the registry in 129792ns
I1106 20:15:34.358259 22262 registrar.cpp:416] Successfully recovered registrar
I1106 20:15:34.358475 22262 master.cpp:1831] Recovered 0 agents from the registry (176B); allowing 10mins for agents to reregister
I1106 20:15:34.358522 22262 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
I1106 20:15:34.359434 32499 containerizer.cpp:296] Using isolation { environment_secret, network/cni, filesystem/posix, volume/sandbox_path }
I1106 20:15:34.361624 32499 linux_launcher.cpp:147] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1106 20:15:34.362061 32499 provisioner.cpp:299] Using default backend 'overlay'
W1106 20:15:34.363476 32499 process.cpp:2829] Attempted to spawn already running process files@172.16.10.158:46488
I1106 20:15:34.363585 32499 cluster.cpp:460] Creating default 'local' authorizer
I1106 20:15:34.364131 22257 slave.cpp:265] Mesos agent started on (1090)@172.16.10.158:46488
I1106 20:15:34.364296 22257 slave.cpp:266] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/LauncherAndIsolationParam_PersistentVolumeDefaultExecutor_ROOT_HealthCheckUsingPersistentVolume_1_9fXuUo/store/appc"
[jira] [Commented] (MESOS-9285) DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithAbsolutePathVolume is flaky
[ https://issues.apache.org/jira/browse/MESOS-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677505#comment-16677505 ]

Joseph Wu commented on MESOS-9285:
----------------------------------

Observed on another internal CI run, albeit on CentOS7 and on the 1.5.x branch (https://github.com/apache/mesos/tree/6008868c715733b7d798279e9b39ae3483f7d955)

{code}
[ RUN ] DockerVolumeIsolatorTest.ROOT_INTERNET_CURL_CommandTaskRootfsWithAbsolutePathVolume
I1106 20:21:29.367887 31384 cluster.cpp:172] Creating default 'local' authorizer
I1106 20:21:29.368988 21440 master.cpp:457] Master 0d219c14-565e-46dd-b5c2-56bd9e97e4d1 (ip-172-16-10-72.ec2.internal) started on 172.16.10.72:46670
I1106 20:21:29.369009 21440 master.cpp:459] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/zLJ6wA/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true"
--version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/zLJ6wA/master" --zk_session_timeout="10secs"
I1106 20:21:29.369148 21440 master.cpp:508] Master only allowing authenticated frameworks to register
I1106 20:21:29.369156 21440 master.cpp:514] Master only allowing authenticated agents to register
I1106 20:21:29.369163 21440 master.cpp:520] Master only allowing authenticated HTTP frameworks to register
I1106 20:21:29.369168 21440 credentials.hpp:37] Loading credentials for authentication from '/tmp/zLJ6wA/credentials'
I1106 20:21:29.369271 21440 master.cpp:564] Using default 'crammd5' authenticator
I1106 20:21:29.369320 21440 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1106 20:21:29.369357 21440 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1106 20:21:29.369383 21440 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1106 20:21:29.369403 21440 master.cpp:643] Authorization enabled
I1106 20:21:29.369576 21438 hierarchical.cpp:177] Initialized hierarchical allocator process
I1106 20:21:29.369608 21438 whitelist_watcher.cpp:77] No whitelist given
I1106 20:21:29.370110 21440 master.cpp:2247] Elected as the leading master!
I1106 20:21:29.370126 21440 master.cpp:1727] Recovering from registrar
I1106 20:21:29.370163 21440 registrar.cpp:347] Recovering registrar
I1106 20:21:29.370302 21437 registrar.cpp:391] Successfully fetched the registry (0B) in 114944ns
I1106 20:21:29.370345 21437 registrar.cpp:495] Applied 1 operations in 6941ns; attempting to update the registry
I1106 20:21:29.370510 21442 registrar.cpp:552] Successfully updated the registry in 143104ns
I1106 20:21:29.370553 21442 registrar.cpp:424] Successfully recovered registrar
I1106 20:21:29.370631 21442 master.cpp:1840] Recovered 0 agents from the registry (172B); allowing 10mins for agents to re-register
I1106 20:21:29.370667 21438 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
I1106 20:21:29.372022 31384 isolator.cpp:136] Initialized the docker volume information root directory at '/run/mesos/isolators/docker/volume'
I1106 20:21:29.379148 31384 linux_launcher.cpp:145] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
sh: hadoop: command not found
I1106 20:21:29.462009 31384 fetcher.cpp:69] Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to create HDFS client: Hadoop client is not available, exit status: 32512
I1106 20:21:29.462098 31384 registry_puller.cpp:129] Creating registry puller with docker registry 'https://registry-1.docker.io'
I1106 20:21:29.462872 31384 provisioner.cpp:299] Using default backend 'copy'
W1106 20:21:29.464048 31384 process.cpp:2745] Attempted to spawn already running process files@172.16.10.72:46670
I1106
[jira] [Commented] (MESOS-6949) SchedulerTest.MasterFailover is flaky
[ https://issues.apache.org/jira/browse/MESOS-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677498#comment-16677498 ]

Joseph Wu commented on MESOS-6949:
----------------------------------

Another fix: https://reviews.apache.org/r/69267/

> SchedulerTest.MasterFailover is flaky
> -------------------------------------
>
> Key: MESOS-6949
> URL: https://issues.apache.org/jira/browse/MESOS-6949
> Project: Mesos
> Issue Type: Bug
> Components: test
> Environment: Observed on:
>   CentOS 7 VM, libevent and SSL enabled;
>   Ubuntu 14.04, cmake/clang, without libevent/SSL, on ASF CI
> Reporter: Greg Mann
> Assignee: Alexander Rukletsov
> Priority: Major
> Labels: flaky-test, tests
> Fix For: 1.5.0
>
> Attachments: MasterFailover-badrun.txt, SchedulerTest.MasterFailover-on-ASF-CI.txt, SchedulerTest.MasterFailover.txt, SchedulerTest_MasterFailover_1_badrun.txt
>
>
> This was observed in a CentOS 7 VM, with libevent and SSL enabled:
> {code}
> W0118 22:38:33.789465 3407 scheduler.cpp:513] Dropping SUBSCRIBE: Scheduler is in state DISCONNECTED
> I0118 22:38:33.811820 3408 scheduler.cpp:361] Connected with the master at http://127.0.0.1:43211/master/api/v1/scheduler
> ../../src/tests/scheduler_tests.cpp:315: Failure
> Mock function called more times than expected - returning directly.
> Function call: connected(0x7fff97227550)
> Expected: to be called once
> Actual: called twice - over-saturated and active
> {code}
> Find attached the entire log from a failed run.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677306#comment-16677306 ]

Joseph Wu commented on MESOS-7971:
----------------------------------

Slightly different logs observed on an internal CI run (Ubuntu 16, no SSL). One HTTP response in this run expects a 202, but gets a 409 instead.

{code}
[ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
I1106 19:50:14.650254 19563 cluster.cpp:162] Creating default 'local' authorizer
I1106 19:50:14.651284 19588 master.cpp:442] Master d5905469-73fc-4219-b939-c6056f1f62a1 (ip-172-16-10-48.ec2.internal) started on 172.16.10.48:39946
I1106 19:50:14.651309 19588 master.cpp:444] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="50ms" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/fZatVl/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false"
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/fZatVl/master" --zk_session_timeout="10secs"
I1106 19:50:14.651437 19588 master.cpp:494] Master only allowing authenticated frameworks to register
I1106 19:50:14.651448 19588 master.cpp:508] Master only allowing authenticated agents to register
I1106 19:50:14.651453 19588 master.cpp:521] Master only allowing authenticated HTTP frameworks to register
I1106 19:50:14.651459 19588 credentials.hpp:37] Loading credentials for authentication from '/tmp/fZatVl/credentials'
I1106 19:50:14.651548 19588 master.cpp:566] Using default 'crammd5' authenticator
I1106 19:50:14.651593 19588 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1106 19:50:14.651643 19588 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1106 19:50:14.651672 19588 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1106 19:50:14.651700 19588 master.cpp:646] Authorization enabled
W1106 19:50:14.651710 19588 master.cpp:709] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
I1106 19:50:14.651803 19590 hierarchical.cpp:173] Initialized hierarchical allocator process
I1106 19:50:14.651830 19590 whitelist_watcher.cpp:77] No whitelist given
I1106 19:50:14.652432 19590 master.cpp:2200] Elected as the leading master!
I1106 19:50:14.652454 19590 master.cpp:1739] Recovering from registrar
I1106 19:50:14.652506 19590 registrar.cpp:347] Recovering registrar
I1106 19:50:14.652595 19590 registrar.cpp:391] Successfully fetched the registry (0B) in 72960ns
I1106 19:50:14.652622 19590 registrar.cpp:495] Applied 1 operations in 5332ns; attempting to update the registry
I1106 19:50:14.656131 19586 registrar.cpp:552] Successfully updated the registry in 3.472128ms
I1106 19:50:14.656177 19586 registrar.cpp:424] Successfully recovered registrar
I1106 19:50:14.656266 19588 master.cpp:1838] Recovered 0 agents from the registry (168B); allowing 10mins for agents to re-register
I1106 19:50:14.656299 19588 hierarchical.cpp:211] Skipping recovery of hierarchical allocator: nothing to recover
W1106 19:50:14.657806 19563 process.cpp:3196] Attempted to spawn already running process files@172.16.10.48:39946
I1106 19:50:14.658203 19563 containerizer.cpp:246] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
I1106 19:50:14.661717 19563 linux_launcher.cpp:149] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1106 19:50:14.662039 19563 provisioner.cpp:255] Using default backend 'overlay'
I1106 19:50:14.662547 19563 cluster.cpp:448] Creating default 'local' authorizer
I1106 19:50:14.662969 19589 slave.cpp:249] Mesos agent started on (378)@172.16.10.48:39946
I1106 19:50:14.662987 19589 slave.cpp:250] Flags at startup: --acls=""