Re: Running mesos-execute inside docker.
Hi Giulio,

Can you share your exact docker commands to start the mesos slave and master?

Thanks!
Tim

On Thu, May 21, 2015 at 12:17 PM, Giulio Eulisse giulio.euli...@cern.ch wrote:

Mmm, no this does not seem to work. The message is still there. Any other suggestions?

-- Ciao, Giulio

On 21 May 2015, at 17:43, Tyson Norris wrote:

You might try adding --pid=host. I found that when running a docker-based executor with the slave also running as a docker container, I had to do this so that the pids are visible between containers.

Tyson

On May 21, 2015, at 6:04 AM, Giulio Eulisse giulio.euli...@cern.ch wrote:

Hi, I've a problem which can be reduced to running:

mesos-execute --name=foo --command=uname -a hostname --master=leader.mesos:5050

inside a docker container. If I run without --net=host, it blocks completely (I guess the master / slave cannot communicate back to the framework). If I run with --net=host everything is fine, but I get:

May 21 14:59:13 cmsbuild30 mesos-slave[1514]: I0521 14:59:13.115659 1546 slave.cpp:1533] Asked to shut down framework 20150418-223037-3834547840-5050-6-2757 by master@128.142.142.228:5050
May 21 14:59:13 cmsbuild30 mesos-slave[1514]: W0521 14:59:13.117231 1546 slave.cpp:1548] Cannot shut down unknown framework 20150418-223037-3834547840-5050-6-2757

in my host machine logs, which is not ideal. Any idea on how to do this correctly? The actual problem I'm trying to solve is using the mesos plugin for a jenkins instance which runs inside docker.

-- Ciao Giulio
RE: Re:
Ah perfect! Thanks for the info!

From: Adam Bordelon [a...@mesosphere.io]
Sent: 01 June 2015 06:48
To: user@mesos.apache.org
Subject: Re:

FYI, Mesos will exclude 1GB from what it auto-detects, so that the mesos-slave process and other system processes can use some memory. See https://github.com/apache/mesos/blob/0.22.1/src/slave/containerizer/containerizer.cpp#L107

If you explicitly set the memory requirements as Ondrej suggests, you can override this. However, you run the risk of your tasks consuming all the memory in the system so that Mesos itself cannot run effectively.

On Thu, May 21, 2015 at 4:24 AM, Ondrej Smola ondrej.sm...@gmail.com wrote:

It is a little more complicated and it depends on your environment: you need to give some RAM to the OS and running processes (Mesos, Docker etc.). Quick test: on a VM with 3GB RAM, Mesos offers 1.9G, so there should be no problem related to your mesos setup (mesos in both cases offers around 63% of RAM). About manual setup: you can use some automation tool (Ansible, Puppet) if you plan to set up a large number of nodes.

2015-05-21 13:10 GMT+02:00 Aaron Carey aca...@ilm.com:

Thanks Ondrej, Do I have to do this? I was under the impression if you didn't specify the resources then mesos would just offer everything available?

Thanks, Aaron

From: Ondrej Smola [ondrej.sm...@gmail.com]
Sent: 21 May 2015 12:04
To: user@mesos.apache.org
Subject:

Hi Aaron, You can set memory in /etc/mesos-slave/resources, for example:

cpus(*):4;mem(*):16067;ports(*):[80-80,31000-32000]

With this configuration mesos offers 15.7GB RAM on one of our nodes.

2015-05-21 12:51 GMT+02:00 Aaron Carey aca...@ilm.com:

I've managed to increase the disk size by playing with some docker options. Anyone have any idea about the memory?

Thanks, Aaron

From: Aaron Carey [aca...@ilm.com]
Sent: 21 May 2015 11:19
To: user@mesos.apache.org
Subject: How slaves calculate resources

Hi, I was just trying to figure out how Mesos slaves report the amount of resources available to them on the host. We have some slaves running on AWS t2.medium machines (2 CPUs, 4GB RAM) with 32GB disks. The slaves are running inside docker containers. They report 2 cpus (correct), 2.5GB RAM and 4.9GB disk. Any ideas why this is different from what I can see on the machine (both on the host and within the slave docker container)?

Thanks, Aaron
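To make the two behaviours in this thread concrete, here is a minimal Python sketch. It is not Mesos code: the function names and the RESERVED_MB constant are hypothetical, and it models only what the thread states (an explicit mem entry in the resources string wins; otherwise 1GB is held back from the detected total). How Mesos treats very small hosts is not covered here.

```python
import re

# Assumption taken from the thread: auto-detection leaves 1GB for the system.
RESERVED_MB = 1024


def parse_resources(spec):
    """Parse a string such as 'cpus(*):4;mem(*):16067' into {name: value}."""
    resources = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        # Shape of each entry: name(role):value
        match = re.fullmatch(r"(\w+)\((.*?)\):(.+)", item)
        if match is None:
            raise ValueError("unrecognised resource entry: %r" % item)
        name, _role, value = match.groups()
        resources[name] = value.strip()
    return resources


def offered_mem_mb(detected_mb, spec=""):
    """Memory (MB) a slave would offer under the assumptions stated above."""
    explicit = parse_resources(spec).get("mem")
    if explicit is not None:
        return int(explicit)          # an explicit setting overrides detection
    return detected_mb - RESERVED_MB  # otherwise 1GB is held back


if __name__ == "__main__":
    spec = "cpus(*):4;mem(*):16067;ports(*):[80-80,31000-32000]"
    print(parse_resources(spec))
    print(offered_mem_mb(3072))        # auto-detected, 3GB host
    print(offered_mem_mb(3072, spec))  # explicit mem(*) entry
```

With Ondrej's example string the explicit value is 16067 MB, which is about 15.7GB (16067 / 1024), in line with the figure he reports.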
Failed to make check and run example framework
Hi, I followed the exact steps in http://mesos.apache.org/gettingstarted/ to try Mesos on a RHEL 6.5 x86_64 virtual machine. But make check failed:

[--] 1 test from PerfEventIsolatorTest
[ RUN ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
F0601 22:45:27.851017 12655 isolator_tests.cpp:710] CHECK_SOME(isolator): Perf is not supported
*** Check failure stack trace: ***
@ 0x7ffe4fa7dd60 google::LogMessage::Fail()
@ 0x7ffe4fa7dcb9 google::LogMessage::SendToLog()
@ 0x7ffe4fa7d697 google::LogMessage::Flush()
@ 0x7ffe4fa8061f google::LogMessageFatal::~LogMessageFatal()
@ 0x97fe94 _CheckFatal::~_CheckFatal()
@ 0xbf7829 mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
@ 0x10e5fa5 testing::internal::HandleSehExceptionsInMethodIfSupported()
@ 0x10e0abe testing::internal::HandleExceptionsInMethodIfSupported()
@ 0x10c783e testing::Test::Run()
@ 0x10c8078 testing::TestInfo::Run()
@ 0x10c86ac testing::TestCase::Run()
@ 0x10cda39 testing::internal::UnitTestImpl::RunAllTests()
@ 0x10e71f9 testing::internal::HandleSehExceptionsInMethodIfSupported()
@ 0x10e1899 testing::internal::HandleExceptionsInMethodIfSupported()
@ 0x10cc5c1 testing::UnitTest::Run()
@ 0xc9fe96 main
@ 0x7ffe4bb21d1d __libc_start_main
@ 0x85ba69 (unknown)
I0601 22:45:29.976155 14327 exec.cpp:450] Slave exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with slave 20150601-224223-2574952640-39385-12655-S0
make[3]: *** [check-local] Aborted (core dumped)
make[3]: Leaving directory `/root/mesos-0.22.1/build/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/root/mesos-0.22.1/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/root/mesos-0.22.1/build/src'
make: *** [check-recursive] Error 1
[root@mesos build]# echo $?
2

And the example C++ framework also failed:

# ./src/test-framework --master=127.0.0.1:5050
I0601 23:02:44.646636 14828 sched.cpp:157] Version: 0.22.1
I0601 23:02:44.662256 14849 sched.cpp:254] New master detected at master@127.0.0.1:5050
I0601 23:02:44.664237 14849 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0601 23:02:44.670964 14853 sched.cpp:448] Framework registered with 20150601-225015-16777343-5050-14668-
Registered!
Received offer 20150601-225015-16777343-5050-14668-O0 with cpus(*):4; mem(*):2806; disk(*):40810; ports(*):[31000-32000]
Launching task 0 using offer 20150601-225015-16777343-5050-14668-O0
Launching task 1 using offer 20150601-225015-16777343-5050-14668-O0
Launching task 2 using offer 20150601-225015-16777343-5050-14668-O0
Launching task 3 using offer 20150601-225015-16777343-5050-14668-O0
Task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason 1 from source 1 with message 'Executor terminated'
I0601 23:02:44.880982 14848 sched.cpp:1623] Asked to abort the driver
I0601 23:02:44.881239 14848 sched.cpp:856] Aborting framework '20150601-225015-16777343-5050-14668-'
I0601 23:02:44.881921 14828 sched.cpp:1589] Asked to stop the driver

Any help will be appreciated, thanks!
EXECUTOR_SIGNAL_ESCALATION_TIMEOUT vs EXECUTOR_SHUTDOWN_GRACE_PERIOD vs docker_stop_timeout
Hi,

EXECUTOR_SIGNAL_ESCALATION_TIMEOUT is set to 3 seconds, hard-coded.
EXECUTOR_SHUTDOWN_GRACE_PERIOD has a default of 5 and can be configured.
docker_stop_timeout has a default of 0 and is configurable as well.

I am running a jobsystem app that needs to clean up and write back some data before it dies. It's run by Mesos through docker and, preferably, it needs more than 3 seconds (15 would be safe).

For testing, I have set docker_stop_timeout = 20secs and executor_shutdown_grace_period = 30secs.

How do the above two play with EXECUTOR_SIGNAL_ESCALATION_TIMEOUT (which is 3 seconds)? Could someone explain the logic and order in which those params are enforced?

Maciej Strzelecki
Operations Engineer
Re: Failed to make check and run example framework
Hi, @Qian Zhang. I think ``` # ./src/test-framework --master=127.0.0.1:5050 ``` works as normal. As for `make check`, I think the test case failed because your machine does not have some test dependency installed; it should not affect your normal use.

-- Best Regards, Haosdent Huang
Re: Failed to make check and run example framework
Correct, this is because you don't have perf installed on your host. It is only needed for a particular isolator (perf_event), so you can install perf if you want to use it, or simply skip these tests using GTEST_FILTER=-Perf* make check if you don't need it. I've filed https://issues.apache.org/jira/browse/MESOS-2789 to automatically detect and skip these tests.
[DISCUSS] Renaming Mesos Slave
There has been much discussion about finding a less offensive name than Slave, and many of these thoughts have been captured in https://issues.apache.org/jira/browse/MESOS-1478

I would like to open up the discussion on this topic for one week, and if we cannot arrive at a lazy consensus, I will draft a proposal from the discussion and call for a VOTE.

Here are the questions I would like us to answer:
1. What should we call the Mesos Slave node/host/machine?
2. What should we call the mesos-slave process (could be the same)?
3. Do we need to rename Mesos Master too?

Another topic worth discussing is the deprecation process, but we don't necessarily need to decide on that at the same time as deciding the new name(s).
4. How will we phase in the new name and phase out the old name?

Please voice your thoughts and opinions below.

Thanks!
-Adam-

P.S. My personal thoughts:
1. Mesos Worker [Node]
2. Mesos Worker or Agent
3. No
4. Carefully
Re: [DISCUSS] Renaming Mesos Slave
+1
1. Mesos Worker [node/host/machine]
2. Mesos Worker [process]
3. No, master/worker seems to address the issue with fewer changes.
4. Begin using the new name ASAP, add a disambiguation to the docs, and change old references over time. Fixing the official name, even before changes are in place, would be a good first step.

-- Connor
Re: Failed to make check and run example framework
Thanks Haosdent and Ian. But on my machine, I already have perf installed, so I am not sure why those test cases still failed.

# rpm -qa | grep perf
perf-2.6.32-431.el6.x86_64
# which perf
/usr/bin/perf

And can you please let me know why you think my example C++ framework works as normal? I see the following message:

Task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason 1 from source 1 with message 'Executor terminated'

It seems the task is in an unexpected state, right? And after ./src/test-framework --master=127.0.0.1:5050 is executed, I ran echo $?, and its output is 1, which means something went wrong; otherwise echo $? should output 0, right?
Re: Failed to make check and run example framework
Perf is specific to the kernel version, and different versions have different flags and output formats. Specifically, the code requires a kernel release >= 2.6.39 but you're running a 2.6.32 kernel: your version of perf is not currently supported and you should skip those tests. The only effect of this is that you cannot use the optional perf_event isolator.
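The version requirement in the message above can be checked with a short Python sketch. This is an illustration only, not the check Mesos performs: the helper names are hypothetical, the minimum of 2.6.39 is taken from the message, and the release string is assumed to have the shape that uname -r prints.

```python
import platform
import re

# Minimum kernel release named in the message above.
MIN_PERF_KERNEL = (2, 6, 39)


def kernel_version(release):
    """Extract (major, minor, patch) from a release such as '2.6.32-431.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("cannot parse kernel release: %r" % release)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def perf_isolator_usable(release):
    """True if the kernel release meets the assumed minimum for perf_event."""
    return kernel_version(release) >= MIN_PERF_KERNEL


if __name__ == "__main__":
    print(perf_isolator_usable("2.6.32-431.el6.x86_64"))  # the RHEL 6.5 kernel in this thread
    print(perf_isolator_usable(platform.release()))       # the host running this script
```

Tuples compare element by element, so (2, 6, 32) sorts below (2, 6, 39) without any string handling of the version number.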
Re: Failed to make check and run example framework
I reran the check with GTEST_FILTER=-Perf* make check, but it failed again in another place:

[ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
-bash: /sys/fs/cgroup/cpu/mesos/container/cgroup.procs: No such file or directory
mkdir: cannot create directory `/sys/fs/cgroup/cpu/mesos/container/user': No such file or directory
../../src/tests/isolator_tests.cpp:1127: Failure
Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'mkdir + path::join(flags.cgroups_hierarchy, userCgroup) + ')
Actual: 256
Expected: 0
-bash: /sys/fs/cgroup/cpu/mesos/container/user/cgroup.procs: No such file or directory
../../src/tests/isolator_tests.cpp:1136: Failure
Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'echo $$ + path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ')
Actual: 256
Expected: 0
-bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory
mkdir: cannot create directory `/sys/fs/cgroup/cpuacct/mesos/container/user': No such file or directory
../../src/tests/isolator_tests.cpp:1127: Failure
Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'mkdir + path::join(flags.cgroups_hierarchy, userCgroup) + ')
Actual: 256
Expected: 0
-bash: /sys/fs/cgroup/cpuacct/mesos/container/user/cgroup.procs: No such file or directory
../../src/tests/isolator_tests.cpp:1136: Failure
Value of: os::system( su - + UNPRIVILEGED_USERNAME + -c 'echo $$ + path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ')
Actual: 256
Expected: 0
[ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess (443 ms)
[--] 1 test from UserCgroupIsolatorTest/1 (443 ms total)
[--] 1 test from UserCgroupIsolatorTest/2, where TypeParam = mesos::internal::slave::CgroupsPerfEventIsolatorProcess
userdel: user 'mesos.test.unprivileged.user' does not exist
[ RUN ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup
F0602 10:13:49.849755 4279 isolator_tests.cpp:1054] CHECK_SOME(isolator): Perf is not supported
*** Check failure stack trace: ***
2015-06-02 10:13:49,863:4279(0x7fd4cf5fe700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:45737] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
@ 0x7fd569fc2d60 google::LogMessage::Fail()
@ 0x7fd569fc2cb9 google::LogMessage::SendToLog()
@ 0x7fd569fc2697 google::LogMessage::Flush()
@ 0x7fd569fc561f google::LogMessageFatal::~LogMessageFatal()
@ 0x97fe94 _CheckFatal::~_CheckFatal()
@ 0xc10e55 mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test::TestBody()
@ 0x10e5fa5 testing::internal::HandleSehExceptionsInMethodIfSupported()
@ 0x10e0abe testing::internal::HandleExceptionsInMethodIfSupported()
@ 0x10c783e testing::Test::Run()
@ 0x10c8078 testing::TestInfo::Run()
@ 0x10c86ac testing::TestCase::Run()
@ 0x10cda39 testing::internal::UnitTestImpl::RunAllTests()
@ 0x10e71f9 testing::internal::HandleSehExceptionsInMethodIfSupported()
@ 0x10e1899 testing::internal::HandleExceptionsInMethodIfSupported()
@ 0x10cc5c1 testing::UnitTest::Run()
@ 0xc9fe96 main
@ 0x38d141ed1d (unknown)
@ 0x85ba69 (unknown)
I0602 10:13:52.680408 5947 exec.cpp:450] Slave exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with slave 20150602-101045-2574952640-43322-4279-S0
make[3]: *** [check-local] Aborted (core dumped)
make[3]: Leaving directory `/root/mesos-0.22.1/build/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/root/mesos-0.22.1/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/root/mesos-0.22.1/build/src'
make: *** [check-recursive] Error 1
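One possible reading of the second abort in the log above: a negative filter of -Perf* only excludes tests whose full name starts with Perf, so the perf-backed UserCgroupIsolatorTest/2 still runs and hits the same "Perf is not supported" check. The Python sketch below approximates the Google Test filter rule (positive patterns, then a dash, then negative patterns, each list separated by colons). It is a hypothetical helper for reasoning about filters, not gtest itself, and it uses fnmatch, which is slightly more permissive than gtest's own wildcard matching.

```python
from fnmatch import fnmatchcase


def gtest_selects(test_name, gtest_filter):
    """Approximate whether a gtest filter would run the given full test name."""
    # Everything after the first '-' is the negative pattern list.
    positive, _, negative = gtest_filter.partition("-")
    # An empty positive part means "run everything".
    positive_patterns = [p for p in positive.split(":") if p] or ["*"]
    negative_patterns = [p for p in negative.split(":") if p]
    included = any(fnmatchcase(test_name, p) for p in positive_patterns)
    excluded = any(fnmatchcase(test_name, p) for p in negative_patterns)
    return included and not excluded


if __name__ == "__main__":
    perf_test = "PerfEventIsolatorTest.ROOT_CGROUPS_Sample"
    cgroup_test = "UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup"
    for name in (perf_test, cgroup_test):
        print(name, gtest_selects(name, "-Perf*"))
```

Under this reading, a wider negative list such as -Perf*:UserCgroupIsolatorTest* would also exclude the second test; whether skipping all UserCgroupIsolatorTest variants is acceptable is a separate question that the thread does not answer.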
Re: EXECUTOR_SIGNAL_ESCALATION_TIMEOUT vs EXECUTOR_SHUTDOWN_GRACE_PERIOD vs docker_stop_timeout
+1, I'd like to know that also.