Re: Running mesos-execute inside docker.

2015-06-01 Thread Tim Chen
Hi Giulio,

Can you share your exact docker commands to start the mesos slave and
master?

Thanks!

Tim

On Thu, May 21, 2015 at 12:17 PM, Giulio Eulisse giulio.euli...@cern.ch
wrote:

 Mmm, no this does not seem to work. The message is still there. Any other
 suggestions?

 --
 Ciao,
 Giulio

 On 21 May 2015, at 17:43, Tyson Norris wrote:

  You might try adding --pid=host. When I ran a Docker-based executor with
 the slave itself also running as a Docker container, I had to do this so
 that the PIDs are visible between containers.

 Tyson

 On May 21, 2015, at 6:04 AM, Giulio Eulisse giulio.euli...@cern.ch wrote:


 Hi,

 I've a problem which can be reduced to running:

  mesos-execute --name=foo --command=uname -a  hostname
 --master=leader.mesos:5050



 inside a docker container. If I run without --net=host, it blocks
 completely (I guess the master / slave cannot communicate back to the
 framework); if I run with --net=host, everything is fine but I get:

 May 21 14:59:13 cmsbuild30 mesos-slave[1514]: I0521 14:59:13.115659  1546
 slave.cpp:1533] Asked to shut down framework
  20150418-223037-3834547840-5050-6-2757 by master@128.142.142.228:5050
 May 21 14:59:13 cmsbuild30 mesos-slave[1514]: W0521 14:59:13.117231  1546
 slave.cpp:1548] Cannot shut down unknown framework
 20150418-223037-3834547840-5050-6-2757


 in my host machine logs, which is not ideal. Any idea on how to do this
 correctly?

 The actual problem I'm trying to solve is using the mesos plugin for a
 jenkins instance which runs inside docker.

 --
 Ciao
 Giulio




RE: Re:

2015-06-01 Thread Aaron Carey
Ah perfect! Thanks for the info!


From: Adam Bordelon [a...@mesosphere.io]
Sent: 01 June 2015 06:48
To: user@mesos.apache.org
Subject: Re:

FYI, Mesos will exclude 1GB from what it auto-detects, so that the mesos-slave 
process and other system processes can use some memory. See 
https://github.com/apache/mesos/blob/0.22.1/src/slave/containerizer/containerizer.cpp#L107
If you explicitly set the memory requirements as Ondrej suggests, you can 
override this. However, you run the risk of your tasks consuming all the memory 
in the system so that Mesos itself cannot run effectively.
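
As a rough illustration of the rule described above, here is a small Python sketch that subtracts 1GB from the auto-detected total. The function name, the MB units and the clamp at zero for very small hosts are assumptions made for this example only; the authoritative logic is in the containerizer.cpp file linked above and may treat small hosts differently.

```python
# Sketch only: models "auto-detected memory minus 1GB" as described above.
# The function name and MB units are illustrative, not Mesos code.
RESERVED_MB = 1024  # the 1GB left for the mesos-slave and system processes


def default_offered_mem_mb(total_mb: int) -> int:
    """Return the memory (in MB) offered when no explicit resources are set."""
    # Clamping at zero is an assumption of this sketch, not documented behaviour.
    return max(total_mb - RESERVED_MB, 0)


if __name__ == "__main__":
    # A host reporting 3072 MB in total would offer 2048 MB under this rule.
    print(default_offered_mem_mb(3072))
```

Setting mem explicitly in the slave resources bypasses this calculation, with the risk noted above.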

On Thu, May 21, 2015 at 4:24 AM, Ondrej Smola ondrej.sm...@gmail.com wrote:
It is a little more complicated and it depends on your environment - you need to 
give some RAM to the OS and running processes (Mesos, Docker etc.). Quick test - a VM 
with 3GB RAM and Mesos offers 1.9G - so there should be no problem related to 
your Mesos setup (Mesos in both cases offers around 63% of RAM).

About manual setup: you can use some automation tool (Ansible, Puppet) if you 
plan to set up a large number of nodes.



2015-05-21 13:10 GMT+02:00 Aaron Carey aca...@ilm.com:
Thanks Ondrej,

Do I have to do this? I was under the impression if you didn't specify the 
resources then mesos would just offer everything available?

Thanks,
Aaron


From: Ondrej Smola [ondrej.sm...@gmail.com]
Sent: 21 May 2015 12:04
To: user@mesos.apache.org
Subject:

Hi Aaron,

You can set memory in /etc/mesos-slave/resources

example:

cpus(*):4;mem(*):16067;ports(*):[80-80,31000-32000]

With this configuration, Mesos offers 15.7GB RAM on one of our nodes.
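
To make the format of that resources line concrete, here is a small Python sketch that splits it into its parts and converts the mem value from MB to GB. The parser and the function name parse_resources are written only for this illustration and are not part of Mesos; they handle just the scalar and range forms shown in the example above.

```python
import re


def parse_resources(spec: str) -> dict:
    """Parse 'name(role):value;...' into a {name: value} dictionary."""
    resources = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"(\w+)\(([^)]*)\):(.+)", item)
        if match is None:
            raise ValueError(f"unrecognised resource: {item!r}")
        name, _role, value = match.groups()
        if value.startswith("["):
            # Ranges such as [80-80,31000-32000] become (low, high) tuples.
            ranges = re.findall(r"(\d+)-(\d+)", value)
            resources[name] = [(int(lo), int(hi)) for lo, hi in ranges]
        else:
            resources[name] = float(value)
    return resources


spec = "cpus(*):4;mem(*):16067;ports(*):[80-80,31000-32000]"
parsed = parse_resources(spec)
# mem is given in MB; 16067 MB is about 15.7 GB.
print(round(parsed["mem"] / 1024, 1))
```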







2015-05-21 12:51 GMT+02:00 Aaron Carey aca...@ilm.com:
I've managed to increase the disk size by playing with some docker options.

Anyone have any idea about the memory?

Thanks,
Aaron


From: Aaron Carey [aca...@ilm.com]
Sent: 21 May 2015 11:19
To: user@mesos.apache.org
Subject: How slaves calculate resources

Hi,

I am trying to figure out how Mesos slaves calculate the amount of resources 
available to them on the host.

We have some slaves running on AWS t2.medium machines (2 CPUs, 4GB RAM) with 32GB 
disks.

The slaves are running inside docker containers.

They report 2 cpus (correct), 2.5GB RAM and 4.9GB disk.

Any ideas why this is different from what I can see on the machine (both on 
the host and within the slave docker container)?

Thanks,
Aaron





Failed to make check and run example framework

2015-06-01 Thread Qian Zhang
Hi,

I followed the exact steps in http://mesos.apache.org/gettingstarted/ to
try Mesos on a RHEL 6.5 x86_64 virtual machine, but make check failed:
[--] 1 test from PerfEventIsolatorTest
[ RUN  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
F0601 22:45:27.851017 12655 isolator_tests.cpp:710] CHECK_SOME(isolator):
Perf is not supported
*** Check failure stack trace: ***
@ 0x7ffe4fa7dd60  google::LogMessage::Fail()
@ 0x7ffe4fa7dcb9  google::LogMessage::SendToLog()
@ 0x7ffe4fa7d697  google::LogMessage::Flush()
@ 0x7ffe4fa8061f  google::LogMessageFatal::~LogMessageFatal()
@   0x97fe94  _CheckFatal::~_CheckFatal()
@   0xbf7829  mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
@  0x10e5fa5  testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x10e0abe  testing::internal::HandleExceptionsInMethodIfSupported()
@  0x10c783e  testing::Test::Run()
@  0x10c8078  testing::TestInfo::Run()
@  0x10c86ac  testing::TestCase::Run()
@  0x10cda39  testing::internal::UnitTestImpl::RunAllTests()
@  0x10e71f9  testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x10e1899  testing::internal::HandleExceptionsInMethodIfSupported()
@  0x10cc5c1  testing::UnitTest::Run()
@   0xc9fe96  main
@ 0x7ffe4bb21d1d  __libc_start_main
@   0x85ba69  (unknown)
I0601 22:45:29.976155 14327 exec.cpp:450] Slave exited, but framework has
checkpointing enabled. Waiting 15mins to reconnect with slave
20150601-224223-2574952640-39385-12655-S0
make[3]: *** [check-local] Aborted (core dumped)
make[3]: Leaving directory `/root/mesos-0.22.1/build/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/root/mesos-0.22.1/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/root/mesos-0.22.1/build/src'
make: *** [check-recursive] Error 1
[root@mesos build]# echo $?
2


And the example C++ framework also failed:
# ./src/test-framework --master=127.0.0.1:5050
I0601 23:02:44.646636 14828 sched.cpp:157] Version: 0.22.1
I0601 23:02:44.662256 14849 sched.cpp:254] New master detected at
master@127.0.0.1:5050
I0601 23:02:44.664237 14849 sched.cpp:264] No credentials provided.
Attempting to register without authentication
I0601 23:02:44.670964 14853 sched.cpp:448] Framework registered with
20150601-225015-16777343-5050-14668-
Registered!
Received offer 20150601-225015-16777343-5050-14668-O0 with cpus(*):4;
mem(*):2806; disk(*):40810; ports(*):[31000-32000]
Launching task 0 using offer 20150601-225015-16777343-5050-14668-O0
Launching task 1 using offer 20150601-225015-16777343-5050-14668-O0
Launching task 2 using offer 20150601-225015-16777343-5050-14668-O0
Launching task 3 using offer 20150601-225015-16777343-5050-14668-O0
Task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason 1 from
source 1 with message 'Executor terminated'
I0601 23:02:44.880982 14848 sched.cpp:1623] Asked to abort the driver
I0601 23:02:44.881239 14848 sched.cpp:856] Aborting framework
'20150601-225015-16777343-5050-14668-'
I0601 23:02:44.881921 14828 sched.cpp:1589] Asked to stop the driver



Any help will be appreciated, thanks!


EXECUTOR_SIGNAL_ESCALATION_TIMEOUT vs EXECUTOR_SHUTDOWN_GRACE_PERIOD vs docker_stop_timeout

2015-06-01 Thread Maciej Strzelecki
Hi,


EXECUTOR_SIGNAL_ESCALATION_TIMEOUT is hard-coded to 3 seconds.

EXECUTOR_SHUTDOWN_GRACE_PERIOD defaults to 5 seconds and can be configured.

docker_stop_timeout defaults to 0 seconds and is configurable as well.


I am running a job-system app that needs to clean up and write back some data 
before it dies. It's run by Mesos through Docker and, preferably, it needs more 
than 3 seconds (15 would be safe).


For testing, I have set:


docker_stop_timeout = 20 secs


and


executor_shutdown_grace_period = 30secs


How do the above two interact with EXECUTOR_SIGNAL_ESCALATION_TIMEOUT (which is 3 
seconds)? Could someone explain the logic and order in which those params are 
enforced?




Maciej Strzelecki
Operations Engineer
Tel: +49 30 6098381-50
Fax: +49 851-213728-88
E-mail: mstrzele...@crealytics.de
www.crealytics.com
blog.crealytics.com

crealytics GmbH - Semantic PPC Advertising Technology

Brunngasse 1 - 94032 Passau - Germany
Oranienstraße 185 - 10999 Berlin - Germany

Managing directors: Andreas Reiffen, Christof König, Dr. Markus Kurch
Register court: Amtsgericht Passau, HRB 7466
Geschäftsführer: Andreas Reiffen, Christof König, Daniel Trost
Reg.-Gericht: Amtsgericht Passau, HRB 7466


Re: Failed to make check and run example framework

2015-06-01 Thread haosdent
Hi, @Qian Zhang. I think

```
# ./src/test-framework --master=127.0.0.1:5050
```
works as normal.

For the `make check`, I think the test case failed because your machine is
missing a test dependency; it should not affect normal use.

-- 
Best Regards,
Haosdent Huang


Re: Failed to make check and run example framework

2015-06-01 Thread Ian Downes
Correct, this is because you don't have perf installed on your host. It is
only needed for a particular isolator (perf_event), so you can install perf
if you want to use it, or simply skip these tests using
GTEST_FILTER=-Perf* make check if you don't need it.

I've filed https://issues.apache.org/jira/browse/MESOS-2789 to
automatically detect and skip these tests.
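
To show what a filter like -Perf* does and does not exclude, here is a small Python sketch that approximates googletest's filter matching: patterns before the first '-' are positive, patterns after it are negative, ':' separates patterns, and each pattern is matched against the full TestCase.TestName. The function is_selected is written for this illustration only, and it uses fnmatch, which also understands [...] character sets that googletest does not. The two test names are taken from this thread; the longer filter at the end is a hypothetical extension, not something that was suggested or tested here.

```python
from fnmatch import fnmatchcase


def is_selected(test_name: str, gtest_filter: str) -> bool:
    """Return True if test_name would run under the given filter string."""
    # Everything before the first '-' is positive, everything after is negative.
    positive, _, negative = gtest_filter.partition("-")
    positive_patterns = [p for p in positive.split(":") if p] or ["*"]
    negative_patterns = [p for p in negative.split(":") if p]
    included = any(fnmatchcase(test_name, p) for p in positive_patterns)
    excluded = any(fnmatchcase(test_name, p) for p in negative_patterns)
    return included and not excluded


if __name__ == "__main__":
    names = (
        "PerfEventIsolatorTest.ROOT_CGROUPS_Sample",
        "UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup",
    )
    for name in names:
        # Only names that start with "Perf" are excluded by "-Perf*".
        print(name, is_selected(name, "-Perf*"))
    # Hypothetical broader filter that also names the second test explicitly.
    print(is_selected(names[1], "-Perf*:UserCgroupIsolatorTest/2.*"))
```

Under this approximation, a test whose name does not begin with Perf is still run even if it depends on perf internally, so it would need its own negative pattern.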


[DISCUSS] Renaming Mesos Slave

2015-06-01 Thread Adam Bordelon
There has been much discussion about finding a less offensive name than
Slave, and many of these thoughts have been captured in
https://issues.apache.org/jira/browse/MESOS-1478

I would like to open up the discussion on this topic for one week, and if
we cannot arrive at a lazy consensus, I will draft a proposal from the
discussion and call for a VOTE.
Here are the questions I would like us to answer:
1. What should we call the Mesos Slave node/host/machine?
2. What should we call the mesos-slave process (could be the same)?
3. Do we need to rename Mesos Master too?

Another topic worth discussing is the deprecation process, but we don't
necessarily need to decide on that at the same time as deciding the new
name(s).
4. How will we phase in the new name and phase out the old name?

Please voice your thoughts and opinions below.

Thanks!
-Adam-

P.S. My personal thoughts:
1. Mesos Worker [Node]
2. Mesos Worker or Agent
3. No
4. Carefully


Re: [DISCUSS] Renaming Mesos Slave

2015-06-01 Thread Connor Doyle
+1

1. Mesos Worker [node/host/machine]
2. Mesos Worker [process]
3. No, master/worker seems to address the issue with fewer changes.
4. Begin using the new name ASAP, add a disambiguation to the docs, and change 
old references over time.  Fixing the official name, even before changes are 
in place, would be a good first step.

--
Connor




Re: Failed to make check and run example framework

2015-06-01 Thread Qian Zhang
Thanks Haosdent and Ian.

But on my machine, I already have perf installed, so I am not sure why
those test cases still failed.
# rpm -qa | grep perf
perf-2.6.32-431.el6.x86_64
# which perf
/usr/bin/perf

And can you please let me know why you think my example C++ framework works
as normal? I see the following message:
Task 0 is in state TASK_LOST
Aborting because task 0 is in unexpected state TASK_LOST with reason 1
from source 1 with message 'Executor terminated'
It seems the task is in an unexpected state, right? And after
./src/test-framework --master=127.0.0.1:5050 was executed, I ran echo
$?, and its output was 1, which means something went wrong; otherwise echo $?
should output 0, right?


Re: Failed to make check and run example framework

2015-06-01 Thread Ian Downes
Perf is specific to the kernel version and different versions have
different flags and output formats. Specifically, the code requires a
kernel release >= 2.6.39, but you're running a 2.6.32 kernel: your version
of perf is not currently supported and you should skip those tests. The
only effect of this is that you cannot use the optional perf_event isolator.
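
As a quick way to see how such a check plays out, here is a small Python sketch that compares a kernel release string against a minimum version. It assumes the requirement is "at least 2.6.39"; the function names are made up for this example, and the real check lives in the Mesos source, which may parse the release differently.

```python
import re

# Assumed minimum kernel release for the perf_event isolator (see above).
MIN_KERNEL = (2, 6, 39)


def kernel_release_tuple(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '2.6.32-431.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"cannot parse kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0 in this sketch.
    return (int(major), int(minor), int(patch or 0))


def perf_isolator_supported(release: str) -> bool:
    """Return True if the release meets the assumed minimum version."""
    return kernel_release_tuple(release) >= MIN_KERNEL


if __name__ == "__main__":
    # The RHEL 6.5 kernel from this thread is older than 2.6.39.
    print(perf_isolator_supported("2.6.32-431.el6.x86_64"))
```

On a live machine the release string could be taken from platform.release() or uname -r instead of being hard-coded.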


Re: Failed to make check and run example framework

2015-06-01 Thread Qian Zhang
I reran the check with GTEST_FILTER=-Perf* make check, but it failed
again in another place:

[ RUN  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
-bash: /sys/fs/cgroup/cpu/mesos/container/cgroup.procs: No such file or
directory
mkdir: cannot create directory `/sys/fs/cgroup/cpu/mesos/container/user':
No such file or directory
../../src/tests/isolator_tests.cpp:1127: Failure
Value of: os::system( su -  + UNPRIVILEGED_USERNAME +  -c 'mkdir  +
path::join(flags.cgroups_hierarchy, userCgroup) + ')
  Actual: 256
Expected: 0
-bash: /sys/fs/cgroup/cpu/mesos/container/user/cgroup.procs: No such file
or directory
../../src/tests/isolator_tests.cpp:1136: Failure
Value of: os::system( su -  + UNPRIVILEGED_USERNAME +  -c 'echo $$  +
path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ')
  Actual: 256
Expected: 0
-bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or
directory
mkdir: cannot create directory
`/sys/fs/cgroup/cpuacct/mesos/container/user': No such file or directory
../../src/tests/isolator_tests.cpp:1127: Failure
Value of: os::system( su -  + UNPRIVILEGED_USERNAME +  -c 'mkdir  +
path::join(flags.cgroups_hierarchy, userCgroup) + ')
  Actual: 256
Expected: 0
-bash: /sys/fs/cgroup/cpuacct/mesos/container/user/cgroup.procs: No such
file or directory
../../src/tests/isolator_tests.cpp:1136: Failure
Value of: os::system( su -  + UNPRIVILEGED_USERNAME +  -c 'echo $$  +
path::join(flags.cgroups_hierarchy, userCgroup, cgroup.procs) + ')
  Actual: 256
Expected: 0
[  FAILED  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where
TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess (443 ms)
[--] 1 test from UserCgroupIsolatorTest/1 (443 ms total)

[--] 1 test from UserCgroupIsolatorTest/2, where TypeParam =
mesos::internal::slave::CgroupsPerfEventIsolatorProcess
userdel: user 'mesos.test.unprivileged.user' does not exist
[ RUN  ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup
F0602 10:13:49.849755  4279 isolator_tests.cpp:1054] CHECK_SOME(isolator):
Perf is not supported
*** Check failure stack trace: ***
2015-06-02
10:13:49,863:4279(0x7fd4cf5fe700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:45737] zk retcode=-4, errno=111(Connection refused):
server refused to accept the client
@ 0x7fd569fc2d60  google::LogMessage::Fail()
@ 0x7fd569fc2cb9  google::LogMessage::SendToLog()
@ 0x7fd569fc2697  google::LogMessage::Flush()
@ 0x7fd569fc561f  google::LogMessageFatal::~LogMessageFatal()
@   0x97fe94  _CheckFatal::~_CheckFatal()
@   0xc10e55  mesos::internal::tests::UserCgroupIsolatorTest_ROOT_CGROUPS_UserCgroup_Test::TestBody()
@  0x10e5fa5  testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x10e0abe  testing::internal::HandleExceptionsInMethodIfSupported()
@  0x10c783e  testing::Test::Run()
@  0x10c8078  testing::TestInfo::Run()
@  0x10c86ac  testing::TestCase::Run()
@  0x10cda39  testing::internal::UnitTestImpl::RunAllTests()
@  0x10e71f9  testing::internal::HandleSehExceptionsInMethodIfSupported()
@  0x10e1899  testing::internal::HandleExceptionsInMethodIfSupported()
@  0x10cc5c1  testing::UnitTest::Run()
@   0xc9fe96  main
@   0x38d141ed1d  (unknown)
@   0x85ba69  (unknown)
I0602 10:13:52.680408  5947 exec.cpp:450] Slave exited, but framework has
checkpointing enabled. Waiting 15mins to reconnect with slave
20150602-101045-2574952640-43322-4279-S0
make[3]: *** [check-local] Aborted (core dumped)
make[3]: Leaving directory `/root/mesos-0.22.1/build/src'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/root/mesos-0.22.1/build/src'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/root/mesos-0.22.1/build/src'
make: *** [check-recursive] Error 1


Re: EXECUTOR_SIGNAL_ESCALATION_TIMEOUT vs EXECUTOR_SHUTDOWN_GRACE_PERIOD vs docker_stop_timeout

2015-06-01 Thread zhou weitao
+1, I'd like to know that also.
