Re: Compute event at Twitter HQ - 03/31

2016-03-30 Thread Ian Downes
streaming live but are working to get the talks recorded. > > On Wed, Mar 16, 2016 at 9:45 AM, haosdent <haosd...@gmail.com> wrote: > >> Would it have youtube live link? >> >> On Thu, Mar 17, 2016 at 12:38 AM, Ian Downes <ian.dow...@gmail.com> wrote: >> >

Compute event at Twitter HQ - 03/31

2016-03-19 Thread Ian Downes
Hello everyone, I'd like to call attention to an event the Compute group at Twitter is holding at the end of the month where there will be a few Aurora/Mesos-related talks: 1. David Robinson, one of our SREs, will talk about how our small team of SREs manages what is possibly the largest Mesos

Re: Selectively enable/disable oom killer on tasks

2015-12-11 Thread Ian Downes
e agents will be one way >> to go. >> >> Thanks again for your prompt response. >> >> On Fri, Dec 11, 2015 at 12:04 PM, Ian Downes <idow...@twitter.com> wrote: >> >>> The OOM killer is for memory isolation. The kernel will kill processes >>

Re: Selectively enable/disable oom killer on tasks

2015-12-11 Thread Ian Downes
mesos slaves with > "cgroups_enable_cfs" true and false. Is there a downside to doing this? > > On Fri, Dec 11, 2015 at 11:10 AM, Ian Downes <idow...@twitter.com> wrote: > >> No. Assuming you're using the MesosContainerizer and the cgroups/mem >> isolator then all co

Re: Is there a limit of the number of tasks that can be launched by Mesos on a slave?

2015-09-21 Thread Ian Downes
`man pthread_create` indicates either insufficient resources (memory) or a kernel imposed limit from either RLIMIT_NPROC or /proc/sys/kernel/threads-max. On my system: $ cat /proc/sys/kernel/threads-max 191114 which is a lot less than your `ulimit -u`. How many tasks are you able to launch on a
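
Both limits mentioned above can be read programmatically. The following is a minimal Python sketch, not anything from Mesos itself: the function names `thread_limits` and `upper_bound` are made up for illustration, and it assumes a Linux host where `/proc/sys/kernel/threads-max` exists.

```python
import resource
from pathlib import Path


def thread_limits(threads_max_path="/proc/sys/kernel/threads-max"):
    """Return (RLIMIT_NPROC soft limit, system-wide threads-max)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    threads_max = int(Path(threads_max_path).read_text().strip())
    return soft, threads_max


def upper_bound(nproc_soft, threads_max):
    """Rough upper bound on thread count: the smaller of the two limits.

    RLIM_INFINITY means there is no per-user limit, so only the
    system-wide value applies.
    """
    if nproc_soft == resource.RLIM_INFINITY:
        return threads_max
    return min(nproc_soft, threads_max)


if __name__ == "__main__":
    soft, threads_max = thread_limits()
    print("RLIMIT_NPROC (soft):", soft)
    print("threads-max:", threads_max)
    print("upper bound:", upper_bound(soft, threads_max))
```

This only compares the configured limits; how close a host is to either one depends on how many threads are already running.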

Re: [VOTE] Release Apache Mesos 0.23.0 (rc2)

2015-07-09 Thread Ian Downes
The ExamplesTest.PythonFramework test fails differently for me on CentOS5 with python 2.6.6. I presume we don't require 2.7? [idownes@hostname build]$ MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter=ExamplesTest.PythonFramework Source directory: /home/idownes/workspace/mesos Build directory:

Re: [VOTE] Release Apache Mesos 0.23.0 (rc2)

2015-07-09 Thread Ian Downes
://github.com/google/protobuf/issues/9), but that's probably another story. *Marco Massenzio* *Distributed Systems Engineer* On Thu, Jul 9, 2015 at 3:21 PM, Ian Downes idow...@twitter.com wrote: The ExamplesTest.PythonFramework test fails differently for me on CentOS5 with python 2.6.6. I presume we

Re: [VOTE] Release Apache Mesos 0.23.0 (rc1)

2015-07-07 Thread Ian Downes
-1 Failing tests: https://issues.apache.org/jira/browse/MESOS-2199 https://issues.apache.org/jira/browse/MESOS-3000 On Tue, Jul 7, 2015 at 8:52 AM, CCAAT cc...@tampabay.rr.com wrote: +1 Non binding. Gentoo works great on x64. Mostly working on arm8v. Besides the more frequently release

Re: Failed to make check and run example framework

2015-06-01 Thread Ian Downes
Correct, this is because you don't have perf installed on your host. It is only needed for a particular isolator (perf_event), so you can install perf if you want to use it or simply skip these tests using `GTEST_FILTER=-Perf* make check` if you don't need it. I've filed

Re: Failed to make check and run example framework

2015-06-01 Thread Ian Downes
Perf is specific to the kernel version and different versions have different flags and output formats. Specifically, the code requires a kernel release >= 2.6.39 but you're running a 2.6.32 kernel: your version of perf is not currently supported and you should skip those tests. The only effect of
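
A kernel release check of this kind can be sketched in a few lines of Python. This is an illustration only, not the check Mesos performs: the names `MIN_PERF_KERNEL`, `parse_kernel_release` and `perf_tests_supported` are hypothetical, and the requirement is read as "release at least 2.6.39".

```python
import platform
import re

# Assumption: the minimum supported release is 2.6.39, per the message above.
MIN_PERF_KERNEL = (2, 6, 39)


def parse_kernel_release(release):
    """Turn a string such as '2.6.32-431.el6.x86_64' into (2, 6, 32)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def perf_tests_supported(release=None):
    """Return True if the kernel release meets the assumed minimum."""
    if release is None:
        release = platform.release()  # e.g. the output of `uname -r`
    return parse_kernel_release(release) >= MIN_PERF_KERNEL


if __name__ == "__main__":
    print(platform.release(), perf_tests_supported())
```

Comparing tuples rather than strings avoids the usual pitfall where "2.6.9" sorts after "2.6.39".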

Re: How slaves calculate resources

2015-05-21 Thread Ian Downes
You can specify the resources a slave should offer using the --resources flag. If unspecified, the slave determines (guesses) appropriate values. For memory it will call os::memory(), as Alexander stated, and, assuming memory is at least 2 GB, it will leave 1 GB for the system and offer the
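
The memory rule described above can be written down as a small Python sketch. It covers only the case the message states (at least 2 GB of memory, 1 GB held back for the system) and assumes that what gets offered is everything else; the function name `default_offered_memory` is hypothetical and the behaviour for smaller hosts is deliberately left out.

```python
GB = 1024 ** 3


def default_offered_memory(total_bytes):
    """Memory offered when --resources does not specify it.

    Only the case described above is modelled: with at least 2 GB of
    memory, 1 GB is left for the system. Returns None for smaller hosts,
    which this sketch does not cover.
    """
    if total_bytes >= 2 * GB:
        # Assumption: the offer is the total minus the 1 GB held back.
        return total_bytes - 1 * GB
    return None


if __name__ == "__main__":
    print(default_offered_memory(8 * GB) // GB, "GB offered on an 8 GB host")
```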

Re: Problem using cgroups/mem isolator

2015-05-04 Thread Ian Downes
The link from Tim suggests that not all kernels enable this. Could you please file a bug report and we'll improve the handling of this in Mesos. ian On Sat, May 2, 2015 at 2:20 PM, CCAAT cc...@tampabay.rr.com wrote: On 05/02/2015 02:17 PM, Tim Chen wrote: Hi Arunabha, Which linux

Re: group memory limits are always 'soft' . how do I ensure info->pid.isNone() ?

2015-04-28 Thread Ian Downes
The line of code you cite is so the hard limit is not decreased on a running container because we can't (easily) reclaim anonymous memory from running processes. See the comment above the code. The info->pid.isNone() is for when the cgroup is being configured (see the update() call at the end of

Re: group memory limits are always 'soft' . how do I ensure info->pid.isNone() ?

2015-04-28 Thread Ian Downes
Ahh, my bad, I should have looked more closely at your version. This was a bug that was introduced when the memsw functionality came in and then fixed in 0.22.0. See: https://issues.apache.org/jira/browse/MESOS-2128 commit 24cb10a2d68 I suggest upgrading to >= 0.22.0 or, if that's not desirable,

Re: CPU resource allocation: ignore?

2015-03-12 Thread Ian Downes
care at all about accounting usage of that resource then you should be able to set it to 0.0. As Ian mentioned, this won't be enforced with the cpu isolator disabled. -- Connor On Mar 11, 2015, at 08:43, Ian Downes idow...@twitter.com wrote: The --isolation flag for the slave determines how

Re: CPU resource allocation: ignore?

2015-03-11 Thread Ian Downes
The --isolation flag for the slave determines how resources are *isolated*, i.e., by not specifying any cpu isolator there will be no isolation between executors for cpu usage; the Linux scheduler will try to balance their execution. Cpu and memory are considered required resources for executors

Re: Difficulties building mesos-21.1-rc2 {rc1, .1 , .0}. Linux

2015-02-04 Thread Ian Downes
What version of glog are you trying to use? It appears the CheckOpMessageBuilder class was introduced in 0.3.3: http://upstream.rosalinux.ru/compat_reports/glog/0.3.2_to_0.3.3/abi_compat_report.html On Wed, Feb 4, 2015 at 4:52 PM, Alexander Gallego agall...@rbonut.com wrote: I did a little more

Removing slave --cgroups_subsystems flag

2015-01-26 Thread Ian Downes
The flag was deprecated in 0.18.0; it was still accepted but ignored. The flag will be removed in 0.22 and it will no longer be accepted by the slave. Please remove this flag from any configuration. https://issues.apache.org/jira/browse/MESOS-2184

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread Ian Downes
The final component is the container_id. Take a look in src/slave/paths.hpp to see the directory layout. On Wed, Jan 21, 2015 at 8:50 AM, David Greenberg dsg123456...@gmail.com wrote: So, I've looked into this more, and the UUID in runs doesn't appear to be the task-id, executor-id, or

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread Ian Downes
executor_ids (which would mean a new container for each run). On Wed, Jan 21, 2015 at 11:52 AM, David Greenberg dsg123456...@gmail.com wrote: Is it possible to know the container_id prior to when you submit the TaskInfo? If not, how can you find it out? On Wed, Jan 21, 2015 at 1:17 PM, Ian Downes

Re: DockerContainerizer error on two slaves

2014-12-16 Thread Ian Downes
Can you also please post the output of these commands for a working and a non-working host? $ cat /proc/cgroups $ cat /proc/mounts Are you running inside a Docker or systemd container? On Tue, Dec 16, 2014 at 11:22 AM, Benjamin Mahler benjamin.mah...@gmail.com wrote: +Tim Chen (please chime

Re: Issue with cgroups_limit_swap

2014-11-19 Thread Ian Downes
This code was added a few months ago by Anton Lindström who I presume is using it? We don't use swap so it's never been tested in production here. I'll take a look at it shortly and get a fix out. Ian On Tue, Nov 18, 2014 at 6:08 PM, Bjoern Metzdorf bjo...@metzdorf.de wrote: Hi, we just ran

Fwd: Proposed changes to memory usage in ResourceStatistics [MESOS-2104]

2014-11-19 Thread Ian Downes
-- Forwarded message -- From: Ian Downes idow...@twitter.com Date: Tue, Nov 18, 2014 at 4:27 PM Subject: Proposed changes to memory usage in ResourceStatistics [MESOS-2104] To: d...@mesos.apache.org d...@mesos.apache.org I'd like to propose changes to the protobuf: 1) Correct

[RESULT][VOTE] Release Apache Mesos 0.21.0 (rc3)

2014-11-17 Thread Ian Downes
released to: https://repository.apache.org The website (http://mesos.apache.org) will be updated shortly to reflect this release. Thanks, Ian Downes

Re: OOM not always detected by Mesos Slave

2014-11-13 Thread Ian Downes
In reply to your original issue: It is possible to influence the kernel OOM killer in its decision on which process to kill to free memory. An OOM score is computed for each process and it depends on age (tends to kill shortest living) and usage (tends to kill larger memory users), i.e., this

Re: OOM not always detected by Mesos Slave

2014-11-13 Thread Ian Downes
in the reporting of TASK_FAILED when an OOM is involved. If any OOM happens I'd rather the entire process tree always be taken out and that it be reliably reported as such. On Thu, Nov 13, 2014 at 1:03 PM, Ian Downes ian.dow...@gmail.com wrote: In reply to your original issue: It is possible

[VOTE] Release Apache Mesos 0.21.0 (rc1)

2014-11-05 Thread Ian Downes
this package as Apache Mesos 0.21.0 [ ] -1 Do not release this package because ... Thanks, Ian Downes

Re: mesos isolation

2014-07-14 Thread Ian Downes
: Is there any reason preventing you from using the cgroups cpu and memory isolators? yes -- the reason was ignorance. I was not aware cgroups are not enabled by default. I will enable cgroups now. Thanks for your response! Regards, Asim On Fri, Jul 11, 2014 at 2:12 PM, Ian Downes ian.dow

Re: mesos isolation

2014-07-11 Thread Ian Downes
The posix/cpu isolator doesn't actually do any isolation; it is only useful for reporting cpu utilization. If you want to constrain the amount of cpu available to each container you must use the cgroups/cpu isolator. Is there any reason preventing you from using the cgroups cpu and memory isolators?

Re: cgroups OOM handler causing lockups?

2014-07-01 Thread Ian Downes
how this goes! Ian On Tue, Jul 1, 2014 at 10:17 AM, Vinod Kone vinodk...@gmail.com wrote: Hey Whitney, I'll let Ian Downes comment on the specific patches you linked, but at a high level the bug in MESOS-662 was due to Mesos trying to handle OOM situations in user space instead of letting

Re: cgroup cpu isolation policy

2014-01-22 Thread Ian Downes
Mesos supports all three methods; I don't really know which one is most frequently used but, if you don't have specific requirements, then I suggest cpu.shares (the default, provides fair proportional sharing of CPU) or cpu.cfs_quota_us (provides a hard upper bound on CPU). Use the
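
To make the difference between the two settings concrete, here is a small Python sketch of how a fractional CPU allocation could translate into cgroup values. It is an illustration under stated assumptions, not Mesos code: the function names are hypothetical, the mapping of one CPU to 1024 shares is assumed (1024 is the cgroup default weight), and 100000 microseconds is the kernel default for cpu.cfs_period_us.

```python
# Assumption: one CPU corresponds to the cgroup default weight of 1024.
SHARES_PER_CPU = 1024
# Kernel default for cpu.cfs_period_us (100 ms).
CFS_PERIOD_US = 100_000


def cpu_shares(cpus):
    """Relative weight for cpu.shares; only matters under contention."""
    return int(round(cpus * SHARES_PER_CPU))


def cfs_quota_us(cpus, period_us=CFS_PERIOD_US):
    """Hard cap for cpu.cfs_quota_us: runtime allowed per period."""
    return int(round(cpus * period_us))


if __name__ == "__main__":
    for cpus in (0.5, 1.5, 4):
        print(cpus, "cpus ->", cpu_shares(cpus), "shares,",
              cfs_quota_us(cpus), "us quota per", CFS_PERIOD_US, "us period")
```

With shares alone, a container may use idle CPU beyond its allocation; with a quota it is throttled once the quota for the current period is used up, even if the host is otherwise idle.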