Charles Natali created MESOS-10226:
--------------------------------------

             Summary: test suite hangs on ARM64
                 Key: MESOS-10226
                 URL: https://issues.apache.org/jira/browse/MESOS-10226
             Project: Mesos
          Issue Type: Bug
            Reporter: Charles Natali
            Assignee: Charles Natali

Reported by [~mgrigorov].

{noformat}
[ RUN      ] NestedMesosContainerizerTest.ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace
sh: 1: hadoop: not found
Marked '/' as rslave
I0726 11:59:17.812630 32 exec.cpp:164] Version: 1.12.0
I0726 11:59:17.827512 31 exec.cpp:237] Executor registered on agent 9076f44b-846d-4f00-a2dc-11f694cc1900-S0
I0726 11:59:17.830999 36 executor.cpp:190] Received SUBSCRIBED event
I0726 11:59:17.832351 36 executor.cpp:194] Subscribed executor on martin-arm64
I0726 11:59:17.832775 36 executor.cpp:190] Received LAUNCH event
I0726 11:59:17.834415 36 executor.cpp:722] Starting task d1bbb266-bee7-4c9d-929f-16aa41f4e9cf
I0726 11:59:17.839910 36 executor.cpp:740] Forked command at 38
Preparing rootfs at '/tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791'
Changing root to /tmp/NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_1bL0mz/provisioner/containers/e8553a7c-145d-47a4-afd6-3a6cf326cd48/backends/overlay/rootfses/6a62b0ce-df7b-4bab-bf7c-633d9f860791
Failed to execute 'sh': Exec format error
I0726 11:59:18.113488 33 executor.cpp:1041] Command exited with status 1 (pid: 38)
../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1111: Failure
Mock function called more times than expected - returning directly.
    Function call: statusUpdate(0xffffc28527f0, @0xffffa2cf3a60 136-byte object <08-05 6C-B6 FF-FF 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 A8-F6 C0-B6 FF-FF 00-00 D0-04 05-94 FF-FF 00-00 A0-E6 04-94 FF-FF 00-00 A0-F1 05-94 FF-FF 00-00 60-78 04-94 FF-FF 00-00 ... 00-00 00-00 00-00 00-00 20-BD 01-78 FF-FF 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 20-5D 87-61 A5-3F D8-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>)
         Expected: to be called twice
           Actual: called 3 times - over-saturated and active
I0726 11:59:19.117401 37 process.cpp:935] Stopped the socket accept loop
{noformat}

I asked him for a gdb backtrace, which shows the following:

{noformat}
Thread 1 (Thread 0xffffa3bc2c60 (LWP 173475)):
#0  0x0000ffffa518db20 in __libc_open64 (file=0xaaab00f342e0 "/tmp/7VXP3w/pipe", oflag=<optimized out>) at ../sysdeps/unix/sysv/linux/open64.c:48
#1  0x0000ffffa513adb0 in __GI__IO_file_open (fp=fp@entry=0xaaab00e439a0, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=8, is32not64=<optimized out>) at fileops.c:189
#2  0x0000ffffa513b0b0 in _IO_new_file_fopen (fp=fp@entry=0xaaab00e439a0, filename=filename@entry=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=<optimized out>, mode@entry=0xaaaad762f3c8 "r", is32not64=is32not64@entry=1) at fileops.c:281
#3  0x0000ffffa512e0dc in __fopen_internal (filename=0xaaab00f342e0 "/tmp/7VXP3w/pipe", mode=0xaaaad762f3c8 "r", is32=1) at iofopen.c:75
#4  0x0000aaaad54f5350 in os::read (path="/tmp/7VXP3w/pipe") at ../../3rdparty/stout/include/stout/os/read.hpp:136
#5  0x0000aaaad74f1c1c in mesos::internal::tests::NestedMesosContainerizerTest_ROOT_CGROUPS_INTERNET_CURL_LaunchNestedDebugCheckMntNamespace_Test::TestBody (this=0xaaab00f88f50) at ../../src/tests/containerizer/nested_mesos_containerizer_tests.cpp:1126
{noformat}

In short, the test uses a named pipe to synchronize with the task it starts. If the task fails to start (in this case because we're trying to launch an x86 container on an arm64 host, hence the "Exec format error" above), nothing ever opens the write end of the pipe, so the test blocks forever in os::read() while opening the pipe for reading.

I sent Martin a tentative fix to test, and I'll open an MR if it works.
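
To illustrate the hang described above, here is a minimal sketch of one way a reader can avoid blocking forever on a named pipe whose writer never shows up. It is not the tentative fix mentioned in this issue and it is not Mesos code: it is plain Python using POSIX calls, and the function name read_fifo_with_timeout and its parameters are hypothetical. The idea is to open the FIFO with O_NONBLOCK, so that open() returns even when no writer exists, and then poll for data until a deadline.

```python
import os
import time


def read_fifo_with_timeout(path, timeout_secs, poll_interval=0.05):
    """Return the first chunk read from the FIFO at `path`, or None on timeout.

    Hypothetical helper for illustration only; not part of Mesos or stout.
    """
    # A blocking open() of a FIFO for reading waits until some process opens
    # it for writing. With O_NONBLOCK the open returns immediately instead.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    deadline = time.monotonic() + timeout_secs
    try:
        while time.monotonic() < deadline:
            try:
                # Returns b"" when no writer has the FIFO open.
                data = os.read(fd, 4096)
            except BlockingIOError:
                # A writer is connected but has not written anything yet.
                data = b""
            if data:
                return data
            time.sleep(poll_interval)
        # No data arrived in time, e.g. because the writer never started.
        return None
    finally:
        os.close(fd)
```

With a helper along these lines, a test could fail with a clear message when the result is None instead of hanging until the whole test run is killed.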