Re: hostname in task

2019-08-03 Thread James Peach
> On Aug 3, 2019, at 10:59 PM, Marc Roos wrote: > > > I read you can add a hostname option to the container in this issue[0], > however I still have the uuid. Is this available in Mesos 1.8? Yep. > Can I > somewhere read all these options? Like here[1] The Mesos API is defined in the

Re: On adding a debug endpoint for Mesos containerizer

2019-06-05 Thread James Peach
I really like this proposal and I think that it would help operational teams a lot. Let’s make sure that it is well documented :) > On Jun 5, 2019, at 1:05 AM, Andrei Budnik wrote: > > Hi folks, > > We have been encountering container stuck issues for quite a long time. Some > of these issues

Re: ssl mesos-executor not using /etc/default/mesos

2019-02-18 Thread James Peach
> On Feb 16, 2019, at 9:46 AM, Marc Roos wrote: > > > > Looks like the mesos-executor is not using /etc/default/mesos > environment variables Depending on your configuration, the executor runs inside the container, which means that /etc/default/mesos is probably not available. > > If I

Re: How is running 1.7.0 in production?

2018-11-13 Thread James Peach
> On Nov 13, 2018, at 5:45 PM, Stuart Elston wrote: > > Hi everyone, > > We are contemplating an upgrade to Mesos 1.7.0 but are generally a little > wary of running .0 releases. Has anyone encountered any showstoppers while > running 1.7.0? We'd be curious to hear your experiences! I’ve b

[ANNOUNCE] mesos_exporter 1.1.1 released

2018-10-25 Thread James Peach
Hi all, Just a quick note to say that mesos_exporter 1.1.1 has been released. This is a bug fix release that fixes a regression I introduced to v1.1.0. Source code and binaries are available on GitHub. https://github.com/mesos/mesos_exporter/releases/tag/v1.1.1 Thanks to Chase Sillevis who cont

Re: Propose to run debug container as the same user of its parent container by default

2018-10-25 Thread James Peach
> On Oct 23, 2018, at 7:47 PM, Qian Zhang wrote: > > Hi all, > > Currently when launching a debug container (e.g., via `dcos task exec` or > command health check) to debug a task, by default Mesos agent will use the > executor's user as the debug container's user. There are actually 2 cases:

Re: [VOTE] Release Apache Mesos 1.7.0 (rc3)

2018-09-14 Thread James Peach
+1 (binding) make check on Fedora 28 > On Sep 11, 2018, at 11:09 AM, Gastón Kleiman wrote: > > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.7.0. > > > 1.7.0 includes the following: > ---

Re: Libevent bundling ahead.

2018-09-12 Thread James Peach
> On Sep 11, 2018, at 6:14 PM, Till Toenshoff wrote: > > Hey All, > > We are considering bundling/vendoring libevent 2.0.22 with upcoming releases > of Mesos. > > Let me explain the motivation and then go into some details. > > Due to https://issues.apache.org/jira/browse/MESOS-7076, SSL b

Re: make check failed, but mesos-tests.sh --gtest_filter="SVNTest.DiffPatch" tests passed

2018-09-04 Thread James Peach
This might be caused by inconsistent linking in Homebrew. Try forcing Homebrew to build svn from source, something like this: brew install --force --build-from-source subversion > On Sep 4, 2018, at 2:29 AM, Chang Shawn wrote: > > After 'make' successfully on my macOS 10.13.6, I run 'make chec

Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-29 Thread James Peach
+1 (binding) Built and tested on Fedora 28 (clang). > On Aug 24, 2018, at 4:42 PM, Chun-Hung Hsiao wrote: > > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.7.0. > > > 1.7.0 includes the following: >

[ANNOUNCE] mesos_exporter 1.1.0 released

2018-08-23 Thread James Peach
unity, and to the following contributors: Alan Bover, Eric Lubow, Hector Fernandez, Jack Thomasson, James Peach, Jonathan Sokolowski, Philip Norman, Stephan Erb, Trevor Wood and Vinod Kone. cheers, James

Re: Volume ownership and permission

2018-08-16 Thread James Peach
>> >>> I'd argue that the "rw" on the sandbox path is analogous to the "rw" >> mount option. That is, it is mounted writeable, but says nothing about >> which credentials can write to it. >> >> Can you please elaborate a bit on this? W

Re: [VOTE] Move the project repos to gitbox

2018-07-17 Thread James Peach
> On Jul 17, 2018, at 7:58 AM, Vinod Kone wrote: > > Hi, > > As discussed in another thread and in the committers sync, there seems to be > heavy interest in moving our project repos ("mesos", "mesos-site") from the > "git-wip" git server to the new "gitbox" server to better avail GitHub >

implicit mesos-local support in scheduler drivers

2018-07-03 Thread James Peach
Hi all, I recently found that the Mesos scheduler drivers will implicitly spin up a `mesos-local` cluster for testing if your scheduler uses the Mesos scheduler drivers, specifies “local” as the master, and exports “MESOS_” environment variables to configure the master. Do any scheduler author

Re: Support image and resource pre-fetching in Mesos

2018-06-20 Thread James Peach
> On Jun 20, 2018, at 4:02 PM, Zhitao Li wrote: > > Hi, > > We have been working on optimizing container launch latency in our Mesos > based stack, How are you measuring the launch latency? > and one of the optimizations we are considering is to pre-fetch docker image > and any necessary re

Re: narrowing task sandbox permissions

2018-06-15 Thread James Peach
> On Jun 15, 2018, at 11:06 AM, Zhitao Li wrote: > > Sorry for getting back to this really late, but we got bit by this behavior > change in our environment. > > The broken scenario we had: > > 1. We are using Aurora to launch docker containerizer based tasks on > Mesos; > 2. Most of o

Re: Deprecating the Python bindings

2018-06-06 Thread James Peach
> On May 9, 2018, at 11:51 AM, Andrew Schwartzmeyer > wrote: > > Hi all, > > There are two parallel efforts underway that would both benefit from > officially deprecating (and then removing) the Python bindings. The first > effort is the move to the CMake system: adding support to generate

Re: Volume ownership and permission

2018-04-26 Thread James Peach
t it, what is our recommended > solution in the doc? > > > > Regards, > Qian Zhang > > On Fri, Apr 27, 2018 at 1:16 AM, James Peach wrote: > >> I commented on the doc, but at least some of the issues raised there I >> would not regard as issues.

Re: Volume ownership and permission

2018-04-26 Thread James Peach
I commented on the doc, but at least some of the issues raised there I would not regard as issues. Rather, they are about setting expectations correctly and ensuring that we are documenting (and maybe enforcing) sensible behavior. I'm not that keen on Mesos automatically "fixing" filesystem per

Re: Update the *Minimum Linux Kernel version* supported on Mesos

2018-04-05 Thread James Peach
> On Apr 5, 2018, at 5:00 AM, Andrei Budnik wrote: > > Hi All, > > We would like to update minimum supported Linux kernel from 2.6.23 to > 2.6.28. > Linux kernel supports cgroups v1 starting from 2.6.24, but `freezer` cgroup > functionality was merged into 2.6.28, which supports nested contain

Re: Support deadline for tasks

2018-03-23 Thread James Peach
> On Mar 23, 2018, at 9:57 AM, Renan DelValle wrote: > > Hi Zhitao, > > Since this is something that could potentially be handled by the executor > and/or framework, I was wondering if you could speak to the advantages of > making this a TaskInfo primitive vs having the executor (or even the

Re: Support deadline for tasks

2018-03-22 Thread James Peach
> On Mar 22, 2018, at 10:06 AM, Zhitao Li wrote: > > In our environment, we run a lot of batch jobs, some of which have tight > timelines. If any task in the job runs longer than x hours, it does not make > sense to run it anymore. > > For instance, a team would submit a job which builds a

Re: Build Failure

2018-03-19 Thread James Peach
> On Mar 19, 2018, at 4:38 PM, Shiv Deepak wrote: > > Thanks. I installed unzip. That worked. FWIW the test suite was fixed for 1.6 in 0da7b6cc37786df94465ae98948fd7be669a843e. > > On Mon, Mar 19, 2018 at 3:48 PM, Tomek Janiszewski wrote: > Do you have unzip installed? Can you try unzippin

Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-07 Thread James Peach
+1 (binding) Tested on Fedora 27 > On Feb 1, 2018, at 5:36 PM, Gilbert Song wrote: > > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.5.0. > > 1.5.0 includes the following: > > *

Re: [VOTE] Release Apache Mesos 1.5.0 (rc1)

2018-01-24 Thread James Peach
+1 Verified on CentOS 6 and Fedora 27 > On Jan 22, 2018, at 7:15 PM, Gilbert Song wrote: > > Hi all, > > Please vote on releasing the following candidate as Apache Mesos 1.5.0. > > 1.5.0 includes the following: >

Re: Doc-a-thon - January 11th, 2018

2018-01-09 Thread James Peach
Just a reminder that the Docathon is this Thursday :) > On Nov 21, 2017, at 4:14 PM, Judith Malnick wrote: > > Hi all, > > I'm excited to announce the next Apache Mesos doc-a-thon! > > *Date:* January 11th, 2018 > > Location: > > Mesosphere HQ > > 88 Stevenson Street > > San Francisco, CA

Re: Container user '27' is not supported

2017-12-25 Thread James Peach
ec? I’m not familiar with the Marathon API, but it looks to me like you would specify the “user” field in application: https://docs.mesosphere.com/1.9/deploying-services/marathon-api/#/apps/V2Apps3 > > > Dec 25 23:15:40 m02 mesos-slave[18569]: W1225 23:15:40.251715 18595 > runtime.cpp:

Re: Container user '27' is not supported

2017-12-24 Thread James Peach
> On Dec 24, 2017, at 5:20 AM, Marc Roos wrote: > > > I am seeing this in the logs: > > Container user '27' is not supported yet for container > d823196a-4ec3-41e3-a4c0-6680ba5cc99 > > I guess this means that the container requests to run under a specific > user id, and this is not yet ava

narrowing task sandbox permissions

2017-12-14 Thread James Peach
Hi all, In https://issues.apache.org/jira/browse/MESOS-8332, I'm proposing a change to narrow the permissions used for the task sandbox directory from 0755 to 0750. Note that this change also makes failure to chown this directory into a hard failure. I expect this is a safe change for well-beh
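As a rough illustration of what the proposed mode change means in practice, the following Python sketch creates a scratch directory, narrows it from 0755 to 0750, and checks that the "other" permission bits are cleared. The directory and the helper name `narrow_sandbox_mode` are hypothetical stand-ins for a task sandbox; this is not the Mesos agent code.

```python
import os
import stat
import tempfile


def narrow_sandbox_mode(path: str) -> int:
    """Set a directory to 0750 and return the resulting permission bits.

    `path` is a hypothetical stand-in for a task sandbox directory.
    """
    os.chmod(path, 0o750)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        os.chmod(sandbox, 0o755)  # the old, wider mode
        mode = narrow_sandbox_mode(sandbox)
        print(oct(mode))
        # True would mean some read/write/execute bit is still set for "other".
        print(bool(mode & stat.S_IRWXO))
```

With 0750, a process that is neither the owner nor in the owning group can no longer list or enter the directory, which is why tasks that relied on world-readable sandboxes may notice the change.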

Re: Adding a new agent terminates existing executors?

2017-11-15 Thread James Peach
> On Nov 15, 2017, at 8:24 AM, Dan Leary wrote: > > Yes, as I said at the outset, the agents are on the same host, with different > ip's and hostname's and work_dir's. > If having separate work_dirs is not sufficient to keep containers separated > by agent, what additionally is required? You

Re: 1.4.1 release

2017-11-03 Thread James Peach
I think MESOS-8169 is a candidate, but I won't be able to get to it until next week > On Nov 3, 2017, at 1:48 AM, Qian Zhang wrote: > > And I will backport MESOS-8051 to 1.2.x, 1.3.x and 1.4.x. > > > Regards, > Qian Zhang > > On Fri, Nov 3, 2017 at 9:01 AM, Qian Zhang wrote: > We want to b

Re: clearing the executor authentication token from the task environment

2017-11-02 Thread James Peach
> On Nov 1, 2017, at 2:28 PM, James Peach wrote: > > Hi all, > > In https://issues.apache.org/jira/browse/MESOS-8140, I'm proposing that we > clear the MESOS_EXECUTOR_AUTHENTICATION_TOKEN environment variable > immediately after consuming it in the built-in e

clearing the executor authentication token from the task environment

2017-11-01 Thread James Peach
Hi all, In https://issues.apache.org/jira/browse/MESOS-8140, I'm proposing that we clear the MESOS_EXECUTOR_AUTHENTICATION_TOKEN environment variable immediately after consuming it in the built-in executors. This protects it from observation by other tasks in the same PID namespace, however I w
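The "consume, then clear" idea in this proposal can be sketched in a few lines of Python. The built-in executors are not written in Python, so this only illustrates the pattern: the variable name comes from the message above, while the helper `consume_token` and the placeholder value are assumptions made for the example.

```python
import os
from typing import MutableMapping, Optional

TOKEN_VAR = "MESOS_EXECUTOR_AUTHENTICATION_TOKEN"


def consume_token(env: MutableMapping[str, str]) -> Optional[str]:
    """Return the token (if any) and remove it from `env`.

    Hypothetical helper: anything launched later with this environment
    no longer sees the variable.
    """
    return env.pop(TOKEN_VAR, None)


if __name__ == "__main__":
    os.environ[TOKEN_VAR] = "example-token"  # placeholder value
    token = consume_token(os.environ)
    print(token is not None, TOKEN_VAR in os.environ)
```

Removing the entry from `os.environ` also unsets it in the process environment, so child processes started afterwards do not inherit it.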

Re: Adding the limited resource to TaskStatus messages

2017-10-10 Thread James Peach
ld. I'm not planning to add structured information to any other failure reasons, but I'd support doing it if you have a specific suggestion. > On Mon, Oct 9, 2017, 3:50 PM James Peach wrote: > > > On Oct 9, 2017, at 1:27 PM, Vinod Kone wrote: > > > >> In

Re: Adding the limited resource to TaskStatus messages

2017-10-09 Thread James Peach
> On Oct 9, 2017, at 1:27 PM, Vinod Kone wrote: > >> In the case that a task is killed because it violated a resource >> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION, >> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY), >> this field may be populated wit

Adding the limited resource to TaskStatus messages

2017-10-09 Thread James Peach
Hi all, In https://reviews.apache.org/r/62644/, I am proposing to add an optional Resources field to the TaskStatus message named `limited_resources`. In the case that a task is killed because it violated a resource constraint (ie. the reason field is REASON_CONTAINER_LIMITATION, REASON_CONTAI

Re: RFC: Partition Awareness

2017-10-05 Thread James Peach
> On Jun 21, 2017, at 10:16 AM, Megha Sharma wrote: > > Thank you all for the feedback. > To summarize, not killing tasks for non-Partition Aware frameworks will make > the schedulers see a higher volume of non terminal updates for tasks for > which they have already received a TASK_LOST but n

Re: Are there any supported systems without O_CLOEXEC?

2017-09-29 Thread James Peach
s 3.19. Do we support anything older than that? > > On Fri, Sep 29, 2017 at 9:15 AM, James Peach wrote: > >> >>> On Sep 27, 2017, at 5:03 PM, James Peach wrote: >>> >>> Hi all, >>> >>> In MESOS-8027 and https://reviews.apache.org

Re: Are there any supported systems without O_CLOEXEC?

2017-09-29 Thread James Peach
> On Sep 27, 2017, at 5:03 PM, James Peach wrote: > > Hi all, > > In MESOS-8027 and https://reviews.apache.org/r/62638/, I'm claiming that, in > practice, we do not have any supported platforms that don't implement > O_CLOEXEC to open. All current Linu

Re: Collect feedbacks on TASK_FINISHED

2017-09-22 Thread James Peach
> On Sep 21, 2017, at 10:12 PM, Vinod Kone wrote: > > I think it makes sense for `TASK_KILLED` to be sent in response to a KILL > call irrespective of the exit status. IIRC, that was the original intention. Those are the semantics we implement and expect in our scheduler and executor. The only

Re: TASK_FAILED - Mesos Container Images

2017-09-06 Thread James Peach
> On Sep 6, 2017, at 4:41 AM, Thodoris Zois wrote: > > Hello, > > I am using the Mesos Containerizer with Docker Images. The problem is that > whenever a container exits my task gets TASK_FAILED because the container > exits with ‘1’. > My docker file invokes a shell script via CMD /script.s

Re: Deprecating `--disable-zlib` in libprocess

2017-08-08 Thread James Peach
> On Aug 8, 2017, at 10:57 AM, Chun-Hung Hsiao wrote: > > Hi all, > > In libprocess, we have an optional `--disable-zlib` flag, but it's > currently not used > for conditional compilation and we always use zlib in libprocess, > and there's a requirement check in Mesos to make sure that zlib exi

Re: Command Executor

2017-08-07 Thread James Peach
> On Aug 5, 2017, at 3:03 AM, Oeg Bizz wrote: > > I have a framework that relies on information sent by a custom Java Command > Executor; think of some sort of heartbeat. I start getting heartbeats after I > send a task to that mesos-slave, but never before that. That makes me assume > that

Re: Mesos-docker-executor understanding

2017-07-21 Thread James Peach
> On Jul 19, 2017, at 10:05 AM, Thomas HUMMEL wrote: > > Hello, > > I've read some books about Mesos, installed one multi-master cluster (for POC > purposes) with some frameworks (Marathon, Spark for instance) and watched some > talks. > > Everything works and my understanding of Mesos is beco

Re: Format for attributes with no value

2017-07-14 Thread James Peach
ve it in the /etc/mesos- slave> directory. For instance, if you want to enable authentication and > want to pass the --authenticate attribute then create an empty file called > /etc/mesos-master/?authenticate. > > Not sure if that is what you meant with your question, > > O

Re: Format for attributes with no value

2017-07-10 Thread James Peach
> On Jul 7, 2017, at 4:46 PM, Jeff Kubina wrote: > > When setting an attribute with no value on a mesos-agent, is the colon needed, > optional, or must it be omitted? It's not clear from the documentation. For > example, which line or lines below are correct? > > att1:val1;att2;att3:val3 > >

Re: Dynamic reservations without a principal

2017-07-05 Thread James Peach
> On Jul 4, 2017, at 5:27 PM, Srikanth Viswanathan wrote: > > Hi folks, > > I am trying to have the Chronos framework consume dynamic reservations in > Mesos. However, it appears that Chronos is unable to do this because it does > not pass the framework principal to Mesos when launching tasks

Re: Framework change role

2017-07-05 Thread James Peach
> Framework without losing that TreeMap, and also how to set it with version > 1.3.0. > > Hope that everybody understands now…. > Thank you, and I am really sorry for the spam > > >> On 5 Jul 2017, at 12:24, James Peach wrote: >> >> >>> On Jul

Re: Framework change role

2017-07-05 Thread James Peach
> On Jul 5, 2017, at 12:54 AM, Thodoris Zois wrote: > > Hi, > > No, I would like my framework to be offered resources from an agent with a role > (e.g. thz) and after running the specific tasks change its role to (*) in > order to get offers from different agents, but it will run the same tasks

Re: ensuring a particular task is deployed to "all" Mesos Worker hosts

2017-07-01 Thread James Peach
> On Jul 1, 2017, at 11:14 AM, Erik Weathers wrote: > > Thanks for the info Kevin. Seems there's no JIRAs nor design docs floating > about yet for "admin tasks" or "daemon sets". > > Just FYI, this is the ticket in Storm for the problem I've been mentioning: > > https://issues.apache.org/jir

Re: Mesos-Metrics per task

2017-06-29 Thread James Peach
> On Jun 29, 2017, at 3:53 PM, Thodoris Zois wrote: > > Hello, I would like to get some metrics per task, e.g. memory/CPU usage. Is > there any way? > > Thank you! You can use the GET_CONTAINERS agent API call to get resource
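For readers unfamiliar with the agent operator API, a minimal Python sketch of such a call follows. It assumes the v1 operator API is reachable at `/api/v1` on the agent and accepts a JSON body; the host name, the port (5051 is the usual agent default) and the helper name `build_get_containers_request` are assumptions for illustration. The sketch only builds the request, since sending it needs a running agent.

```python
import json
import urllib.request


def build_get_containers_request(agent: str = "localhost:5051") -> urllib.request.Request:
    """Build (but do not send) a GET_CONTAINERS call for the agent v1 API.

    `agent` is an assumed host:port; adjust it to match the cluster.
    """
    body = json.dumps({"type": "GET_CONTAINERS"}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{agent}/api/v1",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_get_containers_request()
    # Sending requires a reachable agent, so it is left commented out:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
    print(request.get_method(), request.full_url)
```

The response, when the call is actually sent, should contain one entry per container; the exact fields depend on the Mesos version in use.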

Re: Agent Working Directory Best Practices

2017-06-26 Thread James Peach
> On Jun 26, 2017, at 4:05 PM, Steven Schlansker > wrote: > > >> On Jun 25, 2017, at 11:24 PM, Benjamin Mahler wrote: >> >> As a data point, as far as I'm aware, most users are using a local work >> directory, not an NFS mounted one. Would love to hear from anyone on the >> list if they ar

Re: Work group on Community

2017-06-16 Thread James Peach
> On Jun 15, 2017, at 10:57 AM, Vinod Kone wrote: > > Hi folks, > > Seeing that our first official containerizer WG is off to a good start, we > want to use that momentum to start new WGs. > > I'm proposing that we start a new work group on community. The mission of > this work group would be

Re: How to filter GET_TASKS api result

2017-04-19 Thread James Peach
> On Apr 19, 2017, at 5:00 PM, Benjamin Mahler wrote: > > We can add a Call.GetTasks message to allow you to specify which task ids you > would like to retrieve. But this isn't supported yet, the code needs to be > written. E.g. > > message Call { > enum Type { > GET_TASKS = 13;

Re: Structured logging for Mesos (or c++ glog)

2016-12-19 Thread James Peach
> On Dec 19, 2016, at 2:54 PM, Zhitao Li wrote: > > Hi James, > > Stitching events together is only one possible use case, and I'm not exactly > sure what you meant by directly event logging. > > Taking the hierarchical allocator for example. In a multi-framework cluster, > sometimes I want

Re: Structured logging for Mesos (or c++ glog)

2016-12-19 Thread James Peach
> On Dec 19, 2016, at 9:43 AM, Zhitao Li wrote: > > Hi, > > I'm looking at how to better utilize ElasticSearch to perform log analysis > for logs from Mesos. It seems like ElasticSearch would generally work better > for structured logging, but Mesos still uses glog thus all logs produced are

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-29 Thread James Peach
> On Nov 28, 2016, at 6:09 PM, Yan Xu wrote: > > So one thing that was brought up during offline conversations was that if the > host reboot is associated with hardware change (e.g., a new memory stick): > > • Currently: the agent would skip the recovery (and the chance of > running int

Re: Persistent volume ownership issue

2016-06-21 Thread James Peach
be to make the owner the creator of the volume, then use ACL inheritance to grant additional access to other users. You'd have to reflow the inheritance, but it could probably be done. -- James Peach | jor...@gmail.com

Re: Persistent volume ownership issue

2016-06-21 Thread James Peach
> chown, we do a non-recursive chown. That'll allow the new task to at least > create new files under the persistent volume, but do not change ownership of > files created by previous tasks. It should be a very simple fix which we can > ship in 1.0. We'll ship MESOS-4893 after 1.0. What do you guys think? > > Thanks, > - Jie -- James Peach | jor...@gmail.com

Re: How is the OS X environment created with Mesos

2016-05-18 Thread James Peach
e of the scripts >> normally run during login. This was a constant source of confusion with >> Jenkins. If one can state what exactly is done to create the user >> environment on each platform and how it is different from others, it will save >> countless hours of debugging IMO. I realize OSX is an odd system -- linux at >> times, Apple specific at times in areas that conflict with Linux but this >> will only get more complicated when Windows agents become available. >> >> >> >> Rinaldo > > > > > -- > Best Regards, > Haosdent Huang > > -- James Peach | jor...@gmail.com

Re: [Proposal] Remove the default value for agent work_dir

2016-04-12 Thread James Peach
> On Apr 12, 2016, at 3:58 PM, Greg Mann wrote: > > Hey folks! > A number of situations have arisen in which the default value of the Mesos > agent `--work_dir` flag (/tmp/mesos) has caused problems on systems in which > the automatic cleanup of '/tmp' deletes agent metadata. To resolve this,

Re: verbose logging with the docker executor

2016-03-19 Thread James Peach
> On Mar 17, 2016, at 10:09 AM, Clarke, Trevor wrote: > > Looking in the docker executor, the docker command line is logged with > VLOG(1) but I'm not sure how to generate that level of log output. Some > googling suggests it's used in the google logging library and verbose logging > would be

Re: OS X build

2015-09-27 Thread James Peach
T jpeach$ ./configure --help | grep apr --with-apr=[=DIR] specify where to locate the apr-1 library > > On Sat, Sep 26, 2015 at 9:26 PM, James Peach wrote: > > > On Sep 26, 2015, at 12:01 PM, Vaibhav Khanduja > > wrote: > > > > I am running into issu

Re: OS X build

2015-09-26 Thread James Peach
> On Sep 26, 2015, at 12:01 PM, Vaibhav Khanduja > wrote: > > I am running into issues with the build on my Mac - OS X … the configure script > complains about libapr-1 not being present. I was able to find a workaround by > passing the --with-apr option to configure. Looks like the script checks for >

Re: Building portable binaries

2015-09-17 Thread James Peach
> On Sep 17, 2015, at 4:33 PM, F21 wrote: > > Is there any way to build portable binaries for mesos? > > Currently, I have tried building my own libsvn, libsasl2, libcurl, libapr and > then built mesos using the following: > > ../configure CC=gcc-4.8 CXX=g++-4.8 > LD_LIBRARY_PATH=/tmp/mesos-b

Re: Recommended way to discover current master

2015-08-31 Thread James Peach
> On Aug 31, 2015, at 10:25 AM, Philip Weaver wrote: > > My framework knows the list of zookeeper hosts and the list of mesos master > hosts. > > I can think of a few ways for the framework to figure out which host is the > current master. What would be the best? Should I check in zookeeper d

Re: Build 0.23 gcc Version

2015-07-29 Thread James Peach
till does that try removing config.cache. > > > John > > On Mon, Jul 27, 2015 at 10:56 AM, James Peach wrote: > > > On Jul 24, 2015, at 3:57 PM, Michael Park wrote: > > > > Hi John, > > > > I would first suggest trying CC="gcc" CXX="g++"

Re: Build 0.23 gcc Version

2015-07-27 Thread James Peach
> On Jul 24, 2015, at 3:57 PM, Michael Park wrote: > > Hi John, > > I would first suggest trying CC="gcc" CXX="g++" ../configure, and if that > works, try to find out what which cc and which c++ return and find out what > they symlink to. > I believe autotools uses cc and c++ rather than gcc