Re: Let MesosContainerizer support ramdisk.

2017-02-10 Thread Joris Van Remoortere
This looks interesting.

I would recommend creating a JIRA and attaching it to the review.

One preliminary question: Can we not probe the filesystem to identify
whether it is a RAM_FS? Why do we need to add administrator flags for this?
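
To illustrate the probing idea: on Linux, the mount table already records a
filesystem type for every mount point, so the type backing a given path can
be looked up without a new flag. A minimal sketch, assuming
`/proc/mounts`-style text (device, mount point, type, ...); the helper names
and the `RAM_FS_TYPES` set are hypothetical and not taken from the patch
under review:

```python
import os

# Filesystem types treated as RAM-backed in this sketch (an assumption).
RAM_FS_TYPES = {"tmpfs", "ramfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the type of the longest mount point that contains `path`."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type in parse_mounts(mounts_text):
        mount_point = os.path.normpath(mount_point)
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


def is_ram_backed(path, mounts_text):
    """True if the filesystem containing `path` is in RAM_FS_TYPES."""
    return fs_type_for(path, mounts_text) in RAM_FS_TYPES
```

On a real host the text would come from reading `/proc/mounts`; escaped
characters in mount points (such as `\040` for a space) are not handled here.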



—
*Joris Van Remoortere*
Mesosphere

On Wed, Dec 28, 2016 at 6:32 AM, tommy xiao <xia...@gmail.com> wrote:

> Woo. Cool Feature.
>
> 2016-12-27 11:42 GMT+08:00 haosdent <haosd...@gmail.com>:
>
> > @bingqiang This patch looks like it may take some time to review. Could
> > you create an associated ticket at
> > https://issues.apache.org/jira/browse/MESOS ? Thank you!
> >
> > On Tue, Dec 27, 2016 at 10:51 AM, pangbingqiang <
> pangbingqi...@huawei.com>
> > wrote:
> >
> >> Hi All:
> >>
> >>   The MesosContainerizer does not currently support ramdisks, so we have
> >> implemented this feature. Please review it, and if you have any
> >> questions, please let me know. Thanks.
> >>
> >> https://reviews.apache.org/r/55042/
> >>
> >>
> >>
> >> Bingqiang Pang(庞兵强)
> >>
> >>
> >>
> >> Distributed and Parallel Software Lab
> >>
> >> Huawei Technologies Co., Ltd.
> >>
> >> Email:pangbingqi...@huawei.com <sut...@huawei.com>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>


Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-12-12 Thread Joris Van Remoortere
> So one thing that was brought up during offline conversations was that if
> the host reboot is associated with hardware change (e.g., a new memory
> stick):
>
>    - With the change: the agent could run into incompatible agent info
>      due to resource change and flap
>      <https://github.com/apache/mesos/blob/58f63747f185995d7f9cbfca9d240e2d60053184/src/slave/slave.cpp#L5280>
>      indefinitely until the operator intervenes.

Can you elaborate on this?

Would you run into this because you don't explicitly specify the memory
resource in the agent configuration? I think we highly recommend that you
do this in production to prevent accidental incompatibility of resources
even without an actual hardware change. Historically there were some issues
reported where the kernel reported a slightly different amount of memory
after reboot.
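
To make the failure mode concrete, here is a minimal sketch of how a small
drift in auto-detected memory shows up as a resource mismatch, while a pinned
value does not. It assumes scalar resources written in the
`name:value;name:value` form that the agent's `--resources` flag accepts. The
helper names are hypothetical, and the strict equality test is a
simplification of whatever compatibility check the agent actually performs:

```python
def parse_scalar_resources(text):
    """Parse 'name:value;name:value' scalar resources into a dict."""
    resources = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, value = item.split(":", 1)
        resources[name.strip()] = float(value)
    return resources


def changed_resources(checkpointed, current):
    """Return {name: (old, new)} for every resource whose value changed."""
    old = parse_scalar_resources(checkpointed)
    new = parse_scalar_resources(current)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

With an explicit `mem` value in the agent configuration, both sides of the
comparison come from the same string, so a kernel that reports slightly less
memory after a reboot cannot introduce a difference.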

—
*Joris Van Remoortere*
Mesosphere

On Mon, Nov 28, 2016 at 6:09 PM, Yan Xu <xuj...@apple.com> wrote:

> So one thing that was brought up during offline conversations was that if
> the host reboot is associated with hardware change (e.g., a new memory
> stick):
>
>
>    - Currently: the agent would skip the recovery (and the chance of
>      running into incompatible agent info) and register as a new agent.
>    - With the change: the agent could run into incompatible agent info
>      due to resource change and flap
>      <https://github.com/apache/mesos/blob/58f63747f185995d7f9cbfca9d240e2d60053184/src/slave/slave.cpp#L5280>
>      indefinitely until the operator intervenes.
>
>
> To mitigate this and maintain the current behavior, we can have the agent
> run `rm -f /meta/slaves/latest` automatically upon recovery
> failure but only after the host has rebooted. This way the agent can
> restart as a new agent without operator intervention.
>
> Any thoughts?
>
> BTW this speaks to the need for MESOS-1739.
>
> Yan
>
> On Tue, Nov 15, 2016 at 7:37 AM, Megha Sharma <mshar...@apple.com> wrote:
>
>> Hi All,
>>
>> We have been working on the design for restartable tasks (MESOS-3545),
>> and allowing agents to recover and re-register after a reboot is a
>> prerequisite for that.
>> Today the agent doesn't recover its state, which includes its SlaveID,
>> after a host reboot; it short-circuits the recovery upon discovering the
>> reboot and registers with the master as a new agent. With Partition
>> Awareness, the Mesos master even allows agents that have failed the
>> master's health check pings (unreachable agents) to re-register with it
>> and reconcile the tasks/executors. The executors on a rebooted host are
>> terminated anyway, so there is no harm in letting such an agent recover
>> and re-register with the master using its old SlaveID.
>> We would like to hear from the folks here if you see any operational
>> concerns with letting agents recover after a host reboot.
>>
>> MESOS JIRA: https://issues.apache.org/jira/browse/MESOS-6223
>>
>> Many Thanks
>> Megha Sharma
>>
>>
>>
>


Re: Duplicate task IDs

2016-12-12 Thread Joris Van Remoortere
It sounds like using a multi_hashmap for now allows you to clean up the
code and avoid some bugs, without changing the existing behavior.

I agree that we would want a deprecation period if we changed the behavior.
It would also be unfortunate if we said we were dis-allowing duplicate task
ids but only catch some of the manifestations.
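
For anyone following along, the behavioral difference under discussion can be
sketched as follows. This is plain Python rather than the master's C++ data
structures, and the task IDs are made up:

```python
from collections import defaultdict

completed = [
    ("task-1", "TASK_FINISHED"),
    ("task-2", "TASK_FAILED"),
    ("task-1", "TASK_KILLED"),  # same task ID reused by the framework
]

# Plain hashmap: a later entry silently replaces the earlier one.
by_id = {}
for task_id, state in completed:
    by_id[task_id] = state

# Multi-hashmap: every entry is kept, grouped under its task ID.
multi_by_id = defaultdict(list)
for task_id, state in completed:
    multi_by_id[task_id].append(state)
```

The multi-hashmap keeps both `task-1` entries, which matches what the
list-based storage exposes today, while still giving keyed lookup.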

—
*Joris Van Remoortere*
Mesosphere

On Mon, Dec 12, 2016 at 7:56 AM, Neil Conway <neil.con...@gmail.com> wrote:

> Hi Joris,
>
> Fair point: I didn't deliberately set out to change the behavior for
> duplicate task IDs. Rather, it was a consequence of switching from
> boost::circular_buffer to using a hashmap for managing completed
> tasks. Using a hashmap has a few minor advantages [1], but we can
> certainly continue using circular_buffer (or a multi-hashmap) if we
> want to keep the current behavior.
>
> I think we have the following options:
>
> (1) Keep the current behavior: reusing task IDs is discouraged but
> supported.
>
> (2) Per Alex's suggestion, we can say that frameworks are no longer
> allowed to reuse task IDs. Because the master only keeps a
> limited-size cache of completed tasks (which is not preserved across
> master restart or failover), we wouldn't be able to reject all
> situations in which frameworks attempt to reuse task IDs.
>
> If we pursue #2, we might need a deprecation period or master
> capability to give framework authors some time to migrate.
>
> For the moment, I'll avoid changing the behavior for duplicate task
> IDs; I've opened https://issues.apache.org/jira/browse/MESOS-6779 to
> track this issue. If you have an opinion on this change, please
> weigh in, either on this thread or on JIRA.
>
> Neil
>
> [1] Specifically, making the management of completed and unreachable
> tasks more symmetric and avoiding some bugs/UBI in
> boost::circular_buffer. O(1) lookup of completed tasks might be useful
> in the future but isn't used right now.
>
> On Fri, Dec 9, 2016 at 2:13 PM, Joris Van Remoortere
> <jo...@mesosphere.io> wrote:
> > Hey Neil,
> >
> > I concur that using duplicate task IDs is bad practice and asking for
> > trouble.
> >
> > Could you please clarify *why* you want to use a hashmap? Is your goal to
> > remove duplicate task IDs or is this just a side-effect and you have a
> > different reason (e.g. performance) for using a hashmap?
> >
> > I'm wondering why a multi-hashmap is not sufficient. This would be clear
> > if you were explicitly *trying* to get rid of duplicates of course :-)
> >
> > Thanks,
> > Joris
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Fri, Dec 9, 2016 at 7:08 AM, Neil Conway <neil.con...@gmail.com> wrote:
> >
> >> Folks,
> >>
> >> The master stores a cache of metadata about recently completed tasks;
> >> for example, this information can be accessed via the "/tasks" HTTP
> >> endpoint or the "GET_TASKS" call in the new Operator API.
> >>
> >> The master currently stores this metadata using a list; this means
> >> that duplicate task IDs are permitted. We're considering [1] changing
> >> this to use a hashmap instead. Using a hashmap would mean that
> >> duplicate task IDs would be discarded: if two completed tasks have the
> >> same task ID, only the metadata for the most recently completed task
> >> would be retained by the master.
> >>
> >> If this behavior change would cause problems for your framework or
> >> other software that relies on Mesos, please let me know.
> >>
> >> (Note that if you do have two completed tasks with the same ID, you'd
> >> need an unambiguous way to tell them apart. As a recommendation, I
> >> would strongly encourage framework authors to never reuse task IDs.)
> >>
> >> Neil
> >>
> >> [1] https://reviews.apache.org/r/54179/
> >>
>


Re: Duplicate task IDs

2016-12-09 Thread Joris Van Remoortere
Hey Neil,

I concur that using duplicate task IDs is bad practice and asking for
trouble.

Could you please clarify *why* you want to use a hashmap? Is your goal to
remove duplicate task IDs or is this just a side-effect and you have a
different reason (e.g. performance) for using a hashmap?

I'm wondering why a multi-hashmap is not sufficient. This would be clear if
you were explicitly *trying* to get rid of duplicates of course :-)

Thanks,
Joris

—
*Joris Van Remoortere*
Mesosphere

On Fri, Dec 9, 2016 at 7:08 AM, Neil Conway <neil.con...@gmail.com> wrote:

> Folks,
>
> The master stores a cache of metadata about recently completed tasks;
> for example, this information can be accessed via the "/tasks" HTTP
> endpoint or the "GET_TASKS" call in the new Operator API.
>
> The master currently stores this metadata using a list; this means
> that duplicate task IDs are permitted. We're considering [1] changing
> this to use a hashmap instead. Using a hashmap would mean that
> duplicate task IDs would be discarded: if two completed tasks have the
> same task ID, only the metadata for the most recently completed task
> would be retained by the master.
>
> If this behavior change would cause problems for your framework or
> other software that relies on Mesos, please let me know.
>
> (Note that if you do have two completed tasks with the same ID, you'd
> need an unambiguous way to tell them apart. As a recommendation, I
> would strongly encourage framework authors to never reuse task IDs.)
>
> Neil
>
> [1] https://reviews.apache.org/r/54179/
>


Re: [VOTE] Release Apache Mesos 1.0.2 (rc2)

2016-11-01 Thread Joris Van Remoortere
-1

Based on my message in the 1.1.0 vote:

My understanding after speaking with BenM is that the fix for
https://issues.apache.org/jira/browse/MESOS-6457 should be straightforward.

I don't think the cost of debugging and fixing production systems if they
run into this is worth skipping an easy fix. I have re-targeted the JIRA to
1.1.0, and 1.0.2.

Joris

On Mon, Oct 31, 2016 at 4:35 PM, Vinod Kone  wrote:

> Hi all,
>
>
> Please vote on releasing the following candidate as Apache Mesos 1.0.2.
>
>
> This is a bug fix release.
>
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.0.2-rc2
>
> 
> 
>
>
> The candidate for Mesos 1.0.2 release is available at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz
>
>
> The tag to be voted on is 1.0.2-rc2:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.0.2-rc2
>
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz.md5
>
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.0.2-rc2/mesos-1.0.2.tar.gz.asc
>
>
> The PGP key used to sign the release is here:
>
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
>
> The JAR is up in Maven in a staging repository here:
>
> https://repository.apache.org/content/repositories/orgapachemesos-1164
>
>
> Please vote on releasing this package as Apache Mesos 1.0.2!
>
>
> The vote is open until Thu Nov  3 16:34:20 PDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
>
> [ ] +1 Release this package as Apache Mesos 1.0.2
>
> [ ] -1 Do not release this package because ...
>
>
> Thanks,
>


Re: [VOTE] Release Apache Mesos 1.1.0 (rc2)

2016-11-01 Thread Joris Van Remoortere
-1

My understanding after speaking with BenM is that the fix for
https://issues.apache.org/jira/browse/MESOS-6457 should be straightforward.

I don't think the cost of debugging and fixing production systems if they
run into this is worth skipping an easy fix. I have re-targeted the JIRA to
1.1.0.

Joris

On Mon, Oct 31, 2016 at 7:00 AM, Till Toenshoff  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.1.0.
>
>
> 1.1.0 includes the following:
>
>   * [MESOS-2449] - **Experimental** support for launching a group of tasks
>     via a new `LAUNCH_GROUP` Offer operation. Mesos will guarantee that
>     either all tasks or none of the tasks in the group are delivered to the
>     executor. Executors receive the task group via a new `LAUNCH_GROUP`
>     event.
>
>   * [MESOS-2533] - **Experimental** support for HTTP and HTTPS health
>     checks. Executors may now use the updated `HealthCheck` protobuf to
>     implement HTTP(S) health checks. Both default executors (command and
>     docker) leverage the `curl` binary for sending HTTP(S) requests and
>     connect to `127.0.0.1`, hence a task must listen on all interfaces. On
>     Linux, for BRIDGE and USER modes, the docker executor enters the task's
>     network namespace.
>
>   * [MESOS-3421] - **Experimental** support for sharing of resources across
>     containers. Currently persistent volumes are the only resources allowed
>     to be shared.
>
>   * [MESOS-3567] - **Experimental** support for TCP health checks.
>     Executors may now use the updated `HealthCheck` protobuf to implement
>     TCP health checks. Both default executors (command and docker) connect
>     to `127.0.0.1`, hence a task must listen on all interfaces. On Linux,
>     for BRIDGE and USER modes, the docker executor enters the task's network
>     namespace.
>
>   * [MESOS-4324] - Allow access to persistent volumes as read-only or
>     read-write by tasks. Mesos doesn't allow persistent volumes to be
>     created as read-only, but in 1.1 it starts allowing tasks to use the
>     volumes as read-only. This is mainly motivated by shared persistent
>     volumes but applies to regular persistent volumes as well.
>
>   * [MESOS-5275] - **Experimental** support for Linux capabilities.
>     Frameworks or operators now have fine-grained control over the
>     capabilities that a container may have. This allows a container to run
>     as root, but not have all the privileges associated with the root user
>     (e.g., CAP_SYS_ADMIN).
>
>   * [MESOS-5344] - **Experimental** support for partition-aware Mesos
>     frameworks. In previous Mesos releases, when an agent is partitioned
>     from the master and then reregisters with the cluster, all tasks
>     running on the agent are terminated and the agent is shut down. In
>     Mesos 1.1, partitioned agents will no longer be shut down when they
>     reregister with the master. By default, tasks running on such agents
>     will still be killed (for backward compatibility); however, frameworks
>     can opt in to the new PARTITION_AWARE capability. If they do this,
>     their tasks will not be killed when a partition is healed. This allows
>     frameworks to define their own policies for how to handle partitioned
>     tasks. Enabling the PARTITION_AWARE capability also introduces a new
>     set of task states: TASK_UNREACHABLE, TASK_DROPPED, TASK_GONE,
>     TASK_GONE_BY_OPERATOR, and TASK_UNKNOWN. These new states are intended
>     to eventually replace the TASK_LOST state.
>
>   * [MESOS-6077] - **Experimental** A new default executor is introduced
>     which frameworks can use to launch task groups as nested containers.
>     All the nested containers share resources like cpu, memory, network
>     and volumes.
>
>   * [MESOS-6014] - **Experimental** A new port-mapper CNI plugin, the
>     `mesos-cni-port-mapper`, has been introduced. For Mesos containers,
>     with the CNI port-mapper plugin, users can now expose container ports
>     through host ports using DNAT. This is especially useful when Mesos
>     containers are attached to isolated CNI networks such as private
>     bridge networks, and the services running in the container need to be
>     exposed outside these isolated networks.
>
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.0-rc2
> 
> 
>
> The candidate for Mesos 1.1.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/mesos-1.1.0.tar.gz
>
> The tag to be voted on is 1.1.0-rc2:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.0-rc2
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc2/
> 

Re: Non-checkpointing frameworks

2016-10-15 Thread Joris Van Remoortere
I'm in favor of A & B. I find it provides a better "first experience" to
users.
From my experience you usually have to have an explicit reason to not want
to checkpoint. Most people assume the semantics provided by the checkpoint
behavior are the default, and it can be a frustrating experience for them to
find out that is not the case.
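
For reference, opting in today looks roughly like the sketch below. It
assumes the `checkpoint` and `capabilities` fields of `FrameworkInfo` and the
shape of the v1 scheduler API `SUBSCRIBE` call as I understand them; the
framework name, user, and helper names are made up for illustration. The
`violates_option_a` helper shows what option (a) from the quoted message
would reject:

```python
import json


def make_framework_info(name, user, checkpoint=True, partition_aware=False):
    """Build a FrameworkInfo-shaped dict (illustrative values only)."""
    info = {"name": name, "user": user, "checkpoint": checkpoint}
    if partition_aware:
        info["capabilities"] = [{"type": "PARTITION_AWARE"}]
    return info


def violates_option_a(info):
    """True if the framework is partition-aware but does not checkpoint."""
    capabilities = {c["type"] for c in info.get("capabilities", [])}
    return "PARTITION_AWARE" in capabilities and not info.get("checkpoint", False)


subscribe_call = {
    "type": "SUBSCRIBE",
    "subscribe": {
        "framework_info": make_framework_info(
            "example-framework", "nobody", checkpoint=True, partition_aware=True
        )
    },
}
payload = json.dumps(subscribe_call)
```

Under option (b), the `checkpoint=True` default in this sketch would simply
become the behavior frameworks get without asking.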

—
*Joris Van Remoortere*
Mesosphere

On Fri, Oct 14, 2016 at 3:11 PM, Neil Conway <neil.con...@gmail.com> wrote:

> Hi folks,
>
> I'd like input from individuals who currently use frameworks but do
> not enable checkpointing.
>
> Background: "checkpointing" is a parameter that can be enabled in
> FrameworkInfo; if enabled, the agent will write the framework pid,
> executor PIDs, and status updates to disk for any tasks started by
> that framework. This checkpointed information means that these tasks
> can survive an agent crash: if the agent exits (whether due to
> crashing or as part of an upgrade procedure), a restarted agent can
> use this information to reconnect to executors started by the previous
> instance of the agent. The downside is that checkpointing requires
> some additional disk I/O at the agent.
>
> Checkpointing is not currently the default, but in my experience it is
> often enabled for production frameworks. As part of the work on
> supporting partition-aware Mesos frameworks (see MESOS-4049), we are
> considering:
>
> (a) requiring that partition-aware frameworks must also enable
> checkpointing, and/or
> (b) enabling checkpointing by default
>
> If you have intentionally decided to disable checkpointing for your
> Mesos framework, I'd be curious to hear more about your use-case and
> why you haven't enabled it.
>
> Thanks!
>
> Neil
>


Re: Persistent volume ownership issue

2016-06-21 Thread Joris Van Remoortere
For the case where a container drops down in privileges and still wants to
create a new file, this will result in an error if it is at the root of the
persistent volume right?

Is the recommended pattern then to always create a stub directory at the
root of the persistent volume, and then launch any lower privileged apps
underneath that? For example:

/ <- Root of persistent volume (Owned by framework user / root)
/Database/ <- Stub directory (Owned by lower privileged user)

All new files by the lower privileged app must be created under /Database/*?
It would result in an error if the app tried to create /Database-backups/?
Only the framework as its original user would be able to do that?
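
Spelled out as a toy model, the pattern I am asking about would behave like
this. The sketch only considers directory ownership (group/other write bits
and ACLs are ignored), paths are relative to the volume root, and the uids
are hypothetical:

```python
import posixpath


def can_create(path, uid, dir_owner):
    """Simplified check: only a directory's owner (or root) may add entries.

    `dir_owner` maps directory paths to owning uids. Group/other write
    bits and ACLs are deliberately ignored in this sketch.
    """
    parent = posixpath.dirname(posixpath.normpath(path)) or "/"
    owner = dir_owner.get(parent)
    return uid == 0 or (owner is not None and owner == uid)


FRAMEWORK_UID = 1000  # hypothetical uid of the framework user
DATABASE_UID = 2000   # hypothetical uid of the lower privileged user

dir_owner = {
    "/": FRAMEWORK_UID,         # root of the persistent volume
    "/Database": DATABASE_UID,  # stub directory
}
```

In this model the database user can create `/Database/table.db` but not
`/Database-backups`, which only the framework user (or root) could add.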

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 21, 2016 at 8:25 AM, Jie Yu <yujie@gmail.com> wrote:

> Hi folks,
>
> Currently, the ownership of the persistent volumes is set to be the same
> as the sandbox. In the implementation, we call `chown -R` on the persistent
> volume to match that of the sandbox each time before we mount it into the
> container.
>
> Recently, we realized that this behavior is not ideal, especially if a
> task created some files in the persistent volume and the owner of those
> files is different from the task's user. For instance, a task is
> running under root and it creates some database files under user 'database'
> and launches the database process under user 'database'. When the database
> process is restarted by the scheduler, the current behavior is that
> we'll do a 'chown -R root.root' on the persistent volume, which causes the
> database files to be chowned to 'root'.
>
> The true fix for this problem is to allow frameworks to explicitly specify
> the owner of persistent volumes during creation. This is captured in this
> ticket:
> https://issues.apache.org/jira/browse/MESOS-4893
>
> In the short-term (for 1.0), I propose that, instead of doing a recursive
> chown, we do a non-recursive chown. That'll allow the new task to at least
> create new files under the persistent volume, but would not change ownership
> of files created by previous tasks. It should be a very simple fix which we
> can ship in 1.0. We'll ship MESOS-4893 after 1.0. What do you guys think?
>
> Thanks,
> - Jie
>
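
The difference between the current recursive chown and the proposed
non-recursive one can be sketched as below. This is an illustration in
Python, not the containerizer code; `chown_volume` is a hypothetical helper,
and its `chown` argument is injectable only so the sketch can be exercised
without privileges:

```python
import os


def chown_volume(root, uid, gid, recursive, chown=os.chown):
    """Change ownership of `root`, optionally descending into it.

    `chown` is injectable so the sketch can be exercised without
    privileges; by default it is os.chown.
    """
    chown(root, uid, gid)
    if not recursive:
        return
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            chown(os.path.join(dirpath, name), uid, gid)
```

With `recursive=False` only the volume root changes owner, so files created
by a previous task under another user keep their ownership.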


Re: Failed to shutdown socket with fd xxx

2016-06-20 Thread Joris Van Remoortere
>
> For "That indicates a transition from the old systemd lack of support to
> the new support."
> >> Lack of what support? Could you explain in more detail, and how to fix
> this? Or might there be another cause?


There were a few versions of Mesos where we were not yet aware of some of
the issues with running under systemd. There was a fix for the
LinuxLauncher in 0.25 (https://issues.apache.org/jira/browse/MESOS-3425)
and further fixes for the posix launcher and docker containerizer in 0.28
and some backports. See the systemd documentation at the bottom of this
page: http://mesos.apache.org/documentation/latest/agent-recovery/

It's possible that you have tasks left over from before we had this
support, which means they are not running under the executor slice. These
technically could lose their isolation (as mentioned in the warning). If
you care about the isolation (you likely do in production), then the only
remedy is to restart them.
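
If you want to check which executors are affected, one option is to look at
the systemd hierarchy entry in `/proc/<pid>/cgroup` for each executor pid. A
minimal sketch, assuming the cgroup v1 `hierarchy-ID:controller-list:path`
line format with a `name=systemd` entry (as on CentOS 7); the helper name is
hypothetical and this is not how the LinuxLauncher itself performs the check:

```python
def in_systemd_slice(cgroup_text, slice_name="mesos_executors.slice"):
    """Return True if the name=systemd cgroup path contains `slice_name`.

    `cgroup_text` is the content of /proc/<pid>/cgroup for one process.
    """
    for line in cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        if controllers == "name=systemd" and slice_name in path.split("/"):
            return True
    return False
```

Any executor pid for which this returns False was most likely started before
the fix and is a candidate for a restart.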

—
*Joris Van Remoortere*
Mesosphere

On Mon, Jun 20, 2016 at 4:45 AM, Qiang Chen <qzsc...@gmail.com> wrote:

> Thanks @Haosdent for the link explaining the shutdown errors. So I can
> ignore those...
>
> @Joris,
>
> 1. I upgraded from 0.25.0 to 0.28.2 on CentOS 7, which has systemd support.
> 2. I didn't make any OS / init system changes.
>
> For "That indicates a transition from the old systemd lack of support to
> the new support."
> >> Lack of what support? Could you explain in more detail, and how to fix
> this? Or might there be another cause?
>
> Thanks a lot again!
>
>
> On 2016-06-17 21:31, Joris Van Remoortere wrote:
>
> The shutdown errors are not the issue.
> The concerning part is this warning:
>
>> W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
>> '42322' in 'mesos_executors.slice'. This can lead to lack of proper
>> resource isolation
>
> That indicates a transition from the old systemd lack of support to the
> new support.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Jun 17, 2016 at 2:35 PM, haosdent <haosd...@gmail.com> wrote:
>
>> Hi, @Qiang.
>>
>> @Joseph has a nice explanation of the "Shutdown failed on fd" messages:
>>
>> http://search-hadoop.com/m/0Vlr6pe7qb2MJX8B1=Re+Benign+Shutdown+failed+on+fd+error+messages
>> Those errors can be ignored.
>>
>> For
>> ```
>> I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events
>> for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>> ```
>>
>> These are normal info logs; they appear when the Mesos CgroupMemIsolator
>> registers OOM hooks for your containers.
>>
>> On Fri, Jun 17, 2016 at 8:22 PM, Joris Van Remoortere <jo...@mesosphere.io>
>> wrote:
>>
>> > Can you provide:
>> > 1. The version that you are upgrading from.
>> > 2. Whether you made any OS / init system changes alongside this upgrade
>> > (just to narrow the scope).
>> >
>> > It is possible that you are upgrading from a version that did not have
>> > systemd support to one that does. If so, the upgrade may require
>> > restarting the tasks (either by themselves, or just starting a fresh
>> > agent). Please check out some of the work in MESOS-3007 to get a better
>> > understanding of what the issue I am referring to is.
>> >
>> > If you can verify that you are making one of these transitions from a
>> > bad world to a good world, then you can devise a plan for your upgrade.
>> >
>> > Joris
>> >
>> > —
>> > *Joris Van Remoortere*
>> > Mesosphere
>> >
>> > On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen <qzsc...@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I met an issue when upgrading mesos-slave to 0.28.2.
>> > >
>> > > At the process of recoveri

Re: Failed to shutdown socket with fd xxx

2016-06-17 Thread Joris Van Remoortere
The shutdown errors are not the issue.
The concerning part is this warning:

> W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
> '42322' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation

That indicates a transition from the old systemd lack of support to the new
support.
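
To collect the affected pids from an agent log, something along these lines
should work. It is a minimal sketch that assumes the warning text shown
above; the regular expression and the helper name are mine, not part of
Mesos:

```python
import re

# Matches the warning text even when the log line has been wrapped.
WARNING = re.compile(r"Couldn't find pid\s+'(\d+)'\s+in\s+'([^']+)'")


def pids_outside_slice(log_text):
    """Return the pids named in 'Couldn't find pid' warnings, in order."""
    return [int(pid) for pid, _slice in WARNING.findall(log_text)]
```

The "Failed to shutdown socket" lines do not match the pattern, so they are
skipped automatically.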

—
*Joris Van Remoortere*
Mesosphere

On Fri, Jun 17, 2016 at 2:35 PM, haosdent <haosd...@gmail.com> wrote:

> Hi, @Qiang.
>
> @Joseph has a nice explanation of the "Shutdown failed on fd" messages:
>
> http://search-hadoop.com/m/0Vlr6pe7qb2MJX8B1=Re+Benign+Shutdown+failed+on+fd+error+messages
> Those errors can be ignored.
>
> For
> ```
> I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events
> for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> ```
>
> These are normal info logs; they appear when the Mesos CgroupMemIsolator
> registers OOM hooks for your containers.
>
> On Fri, Jun 17, 2016 at 8:22 PM, Joris Van Remoortere <jo...@mesosphere.io>
> wrote:
>
> > Can you provide:
> > 1. The version that you are upgrading from.
> > 2. Whether you made any OS / init system changes alongside this upgrade
> > (just to narrow the scope).
> >
> > It is possible that you are upgrading from a version that did not have
> > systemd support to one that does. If so, the upgrade may require
> > restarting the tasks (either by themselves, or just starting a fresh
> > agent). Please check out some of the work in MESOS-3007 to get a better
> > understanding of what the issue I am referring to is.
> >
> > If you can verify that you are making one of these transitions from a bad
> > world to a good world, then you can devise a plan for your upgrade.
> >
> > Joris
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen <qzsc...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I met an issue when upgrading mesos-slave to 0.28.2.
> > >
> > > While recovering the mesos-slave / framework containers, it
> > > produced the following errors.
> > >
> > >
> > > ```
> > > Log file created at: 2016/06/15 15:01:43
> > > Running on machine: mesos-slave-online005-xxx.cloud.xxx.domain
> > > Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> > > W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
> > > '42322' in 'mesos_executors.slice'. This can lead to lack of proper
> > > resource isolation
> > > W0615 15:01:43.286182  4182 linux_launcher.cpp:197] Couldn't find pid
> > > '42312' in 'mesos_executors.slice'. This can lead to lack of proper
> > > resource isolation
> > > W0615 15:01:43.286669  4182 linux_launcher.cpp:197] Couldn't find pid
> > > '42309' in 'mesos_executors.slice'. This can lead to lack of proper
> > > resource isolation
> > > W0615 15:01:43.287144  4182 linux_launcher.cpp:197] Couldn't find pid
> > > '42304' in 'mesos_executors.slice'. This can lead to lack of proper
> > > resource isolation
> > > W0615 15:01:43.287636  4182 linux_launcher.cpp:197] Couldn't find pid
> > > '42300' in 'mesos_executors.slice'. This can lead to lack of proper
> > > resource isolation
> > > W0615 15:01:43.288120  4182 linux_launcher.cpp:197] Couldn't find pid
> > > '42317' in 'mesos_executors.slice'. This can lead to lack of proper
> > > resource isolation
> > > E0615 15:01:43.471676  4201 process.cpp:1958] Failed to shutdown socket
> > > with fd 24: Transport endpoint is not connected
> > > E0615 15:01:43.476007  4201 process.cpp:1958] Failed to shutdown socket
> > > with fd 24: Transport endpoint is not connected
> > > E0615 15:01:43.476143  4201 process.cpp:1958] Failed to shutdown socket
> > > with fd 24: Transport endpoint is not connected
> > > E0615 15:01:43.476272  4201 process.cpp:1958] Failed to shutdown socket
> > > with fd 24: Transport endpoint is not connected
> > > E0615 15:01:43.476483  4201 process.cpp:1958] Failed to shutdown socket
> > > with fd 24: Transport endpoint is not connected
> > > E0615 15:01:43.476618  4201 process.cpp:1958] Failed to shutdown socket
> > > with fd 24: Transport endpoint is not connected
> > >
> > > ```
> > >
> > > And it will also cause the OOM errors, such as:
> > >
> > > ```
> > > I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM
> > > events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > > I0615 15:01:43.325469 4172 mem.cpp:722] Started listening on low memory
> > > pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > > I0615 15:01:43.326004  4172 mem.cpp:722] Started listening on medium
> > > memory pressure events for container
> > > f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > > I0615 15:01:43.326539  4172 mem.cpp:722] Started listening on critical
> > > memory pressure events for container
> > > f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> > >
> > > ```
> > >
> > > Has anyone else run into this? Thanks.
> > >
> > > --
> > > Best Regards,
> > > Chen, Qiang
> > >
> > >
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Failed to shutdown socket with fd xxx

2016-06-17 Thread Joris Van Remoortere
Can you provide:
1. The version that you are upgrading from.
2. Whether you made any OS / init system changes alongside this upgrade
(just to narrow the scope).

It is possible that you are upgrading from a version that did not have
systemd support to one that does. If so, the upgrade may require restarting
the tasks (either by themselves, or just starting a fresh agent). Please
check out some of the work in MESOS-3007 to get a better understanding of
what the issue I am referring to is.

If you can verify that you are making one of these transitions from a bad
world to a good world, then you can devise a plan for your upgrade.

Joris

—
*Joris Van Remoortere*
Mesosphere

On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen <qzsc...@gmail.com> wrote:

> Hi all,
>
> I met an issue when upgrading mesos-slave to 0.28.2.
>
> While recovering the mesos-slave / framework containers, it
> produced the following errors.
>
>
> ```
> Log file created at: 2016/06/15 15:01:43
> Running on machine: mesos-slave-online005-xxx.cloud.xxx.domain
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
> '42322' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.286182  4182 linux_launcher.cpp:197] Couldn't find pid
> '42312' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.286669  4182 linux_launcher.cpp:197] Couldn't find pid
> '42309' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.287144  4182 linux_launcher.cpp:197] Couldn't find pid
> '42304' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.287636  4182 linux_launcher.cpp:197] Couldn't find pid
> '42300' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> W0615 15:01:43.288120  4182 linux_launcher.cpp:197] Couldn't find pid
> '42317' in 'mesos_executors.slice'. This can lead to lack of proper
> resource isolation
> E0615 15:01:43.471676  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476007  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476143  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476272  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476483  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
> E0615 15:01:43.476618  4201 process.cpp:1958] Failed to shutdown socket
> with fd 24: Transport endpoint is not connected
>
> ```
>
> It also leads to OOM-related messages, such as:
>
> ```
> I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events
> for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> I0615 15:01:43.325469 4172 mem.cpp:722] Started listening on low memory
> pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> I0615 15:01:43.326004  4172 mem.cpp:722] Started listening on medium
> memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
> I0615 15:01:43.326539  4172 mem.cpp:722] Started listening on critical
> memory pressure events for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>
> ```
>
> Has anyone else run into this? Thanks.
>
> --
> Best Regards,
> Chen, Qiang
>
>


Re: Rack awareness support for Mesos

2016-06-16 Thread Joris Van Remoortere
@Fan,

In the community meeting a question was raised around which frameworks
might be ready to use this.
Can you provide some more context for immediate use cases on the framework
side?

—
*Joris Van Remoortere*
Mesosphere

On Wed, Jun 15, 2016 at 5:04 PM, james <gar...@verizon.net> wrote:

> @Joris,
>
>
> OK. Now I understand where you are coming from. As soon as I get some
> time, I'll join that design discussion. Thanks for the clarifications.
>
> James
>
>
>
>
>
> On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:
>
>> Since your interest is in the determination of the values, as
>> opposed to their propagation, I would just urge that you keep in
>> mind that we may (as a project) not want to support this
>> information as the current string attributes.
>>
>> Huh? Why not? If the attributes change, why can't this sub-project
>> just change with those changing string attributes? Maybe some
>> elaboration how this might not naturally be able to evolve is a
>> warranted detail of discussion?
>>
>>
>> Sorry, I should clarify what I meant by support. By support I mean that
>> we may not want to promise that those values will be there (support as a
>> feature), and what schemas are mangled into the random strings that we
>> currently call attributes. I did not mean that we wouldn't allow users
>> to inject their own values if they wanted to. We just wouldn't control
>> the standard or schema as a project and therefore couldn't support it.
>>
>> Any random collection of strings that has previously had no reserved
>> keywords is notoriously difficult to build new schemas in.
>> This is why we may want to instead introduce a typed structure that is
>> dedicated to fault domain information. This:
>>
>>   * Prevents us from colliding with current users' attributes.
>>   * Allows us to have more control over the types (YAY) and ranges of
>> values.
>>   * Allows us to introduce explicit structure such as dependency or
>> hierarchy.
>>
>> The fact that users have already encoded information in attributes is
>> not a reason for us to limit ourselves to that scope when better
>> structures may be available. This is why we shouldn't assume that the
>> project will *provide support for* (as opposed to allow users to) using
>> attributes.
>>
>> As you said, it is their prerogative to join the design discussion to
>> ensure that any formalized structure or schema we introduce is one that
>> they are agreeable with.
>>
>>
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 6:31 PM, james <gar...@verizon.net
>> <mailto:gar...@verizon.net>> wrote:
>>
>> On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>>
>> On the condition of compatible with existing framework which
>> already rely on parsing attributes for rack information.
>>
>> There is currently nothing in Mesos that specifies the format or
>> structure for rack information in attributes.
>> The fact that operators / frameworks have decided to add this
>> information out of band is their problem to solve.
>> We don't need to be backwards compatible with something we never
>> published to begin with. This is why it's ok for us to consider
>> adding a
>> typed form of failure domain information that is separate from the
>> typeless string attributes.
>>
>>
>> True. But you have to start somewhere, know that the schema and
>> codes will morph over time to maintain relevance  and usefulness. In
>> that vein, if folks have established interesting and useful
>> parameters for this work, then it is most beneficial that those
>> methods and codes are considered carefully.  AKA:: speak up now.
>> Diversity and inclusion are keenly beneficial, where practical.
>>
>>
>> Since your interest is in the determination of the values, as
>> opposed to
>> their propagation, I would just urge that you keep in mind that
>> we may
>> (as a project) not want to support this information as the current
>> string attributes.
>>
>>
>> Huh? Why not? If the attributes change, why can't this sub-project
>> just change with those changing string attributes? Maybe some
>> elaboration how this might not naturally 

Re: Rack awareness support for Mesos

2016-06-15 Thread Joris Van Remoortere
>> Since your interest is in the determination of the values, as opposed to
>> their propagation, I would just urge that you keep in mind that we may
>> (as a project) not want to support this information as the current
>> string attributes.
>
> Huh? Why not? If the attributes change, why can't this sub-project just
> change with those changing string attributes? Maybe some elaboration how
> this might not naturally be able to evolve is a warranted detail of
> discussion?


Sorry, I should clarify what I meant by support. By support I mean that we
may not want to promise that those values will be there (support as a
feature), or to define which schemas get encoded in the free-form strings
that we currently call attributes. I did not mean that we wouldn't allow
users to inject their own values if they wanted to. We just wouldn't control
the standard or schema as a project and therefore couldn't support it.

Any random collection of strings that has previously had no reserved
keywords is notoriously difficult to build new schemas in.
This is why we may want to instead introduce a typed structure that is
dedicated to fault domain information. This:

   - Prevents us from colliding with current users' attributes.
   - Allows us to have more control over the types (YAY) and ranges of
   values.
   - Allows us to introduce explicit structure such as dependency or
   hierarchy.
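
A rough sketch of what such a typed structure could buy over string
attributes. The `FaultDomain` name and its `region` / `zone` / `rack` fields
are placeholders chosen for illustration, not a proposed schema:

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("region", "zone", "rack")  # hypothetical hierarchy, outermost first


@dataclass(frozen=True)
class FaultDomain:
    region: str
    zone: str
    rack: str

    def __post_init__(self) -> None:
        # Types and ranges are checked once, at construction time.
        for name in LEVELS:
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")


def shared_level(a: FaultDomain, b: FaultDomain) -> Optional[str]:
    """Return the deepest level two domains have in common, or None."""
    shared = None
    for name in LEVELS:
        if getattr(a, name) != getattr(b, name):
            break
        shared = name
    return shared
```

Because the hierarchy is explicit, a question like "are these two agents in
the same zone?" no longer depends on how each operator happened to spell
their attributes.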

The fact that users have already encoded information in attributes is not a
reason for us to limit ourselves to that scope when better structures may
be available. This is why we shouldn't assume that the project will
*provide support for* (as opposed to allow users to) using attributes.

As you said, it is their prerogative to join the design discussion to
ensure that any formalized structure or schema we introduce is one that
they are agreeable with.



—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 6:31 PM, james <gar...@verizon.net> wrote:

> On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>
>> On the condition of compatible with existing framework which already rely
>>> on parsing attributes for rack information.
>>>
>> There is currently nothing in Mesos that specifies the format or
>> structure for rack information in attributes.
>> The fact that operators / frameworks have decided to add this
>> information out of band is their problem to solve.
>> We don't need to be backwards compatible with something we never
>> published to begin with. This is why it's ok for us to consider adding a
>> typed form of failure domain information that is separate from the
>> typeless string attributes.
>>
>
> True. But you have to start somewhere, know that the schema and codes will
> morph over time to maintain relevance  and usefulness. In that vein, if
> folks have established interesting and useful parameters for this work,
> then it is most beneficial that those methods and codes are considered
> carefully.  AKA:: speak up now. Diversity and inclusion are keenly
> beneficial, where practical.
>
>
> Since your interest is in the determination of the values, as opposed to
>> their propagation, I would just urge that you keep in mind that we may
>> (as a project) not want to support this information as the current
>> string attributes.
>>
>
> Huh? Why not? If the attributes change, why can't this sub-project just
> change with those changing string attributes? Maybe some elaboration how
> this might not naturally be able to evolve is a warranted detail of
> discussion?
>
>
> I would venture that both 'determination of the values and propagation
> (delays)' are inherently important in a cluster of many things:: hardware,
> resources, frameworks, security codes, etc etc. The author
> and others seem to be keenly aware that a tight focus is not going to
> work, at this stage, so a broad appeal to a multitude of needs is best.
> And in fact, until some idea is proven to be useless or too difficult to
> implement, the bigger the tent, the more useful the codes that define this
> project/idea become.  Personally, I'm very excited that someone has stepped
> up in this area; hoping they keep an open mind and flexibility geared
> toward multiplicative usage, in the future. Most mature hardware folks who
> build ideas into robust systems do exactly that, to motivate a
> multiplicative usage for organizing hardware, performance and state
> metrics, and timing signals, gregariously. All of this is routine semantics
> from a hardware perspective.
>
> At some point, folks will realize that kernel configuration, testing and
> tweaks are critical to cluster performance, regardless of the codes
> running on top of the cluster. So this project could easily use cgroups
> and such to achieve robustness in many areas of need.
>
>
>

Re: Rack awareness support for Mesos

2016-06-14 Thread Joris Van Remoortere
> On the condition that it is compatible with existing frameworks that
> already rely on parsing attributes for rack information.

There is currently nothing in Mesos that specifies the format or structure
for rack information in attributes.
The fact that operators / frameworks have decided to add this information
out of band is their problem to solve.
We don't need to be backwards compatible with something we never published
to begin with. This is why it's ok for us to consider adding a typed form
of failure domain information that is separate from the typeless string
attributes.
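
To make the out-of-band problem concrete, here is a sketch of two operators
encoding rack information differently in free-form attributes. The
`key:value;key:value` text form follows the agent `--attributes` flag as
commonly documented; the key names, values, and both helper functions are
invented for illustration:

```python
from typing import Dict, Optional


def parse_attributes(text: str) -> Dict[str, str]:
    """Split 'k1:v1;k2:v2' into a dict; items without a ':' are skipped."""
    result = {}
    for item in text.split(";"):
        key, sep, value = item.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def rack_of(attrs: Dict[str, str]) -> Optional[str]:
    """What a framework that expects a 'rack' key would do."""
    return attrs.get("rack")


cluster_a = parse_attributes("rack:r12;zone:us-east-1a")
cluster_b = parse_attributes("location:dc1/row4/rack12")
```

The same framework finds a rack on cluster A and nothing on cluster B, even
though both operators recorded it. Nothing in Mesos tells either side which
spelling is right.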

Since your interest is in the determination of the values, as opposed to
their propagation, I would just urge that you keep in mind that we may (as
a project) not want to support this information as the current string
attributes.



—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:

>
>
> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>> #1. Stick with attributes for rack awareness
>>
>> I don't think this is the right approach; however, there seem to be 2
>> components to this discussion:
>>
>> 1. How the values are presented (Attributes vs. a new type-aware
>> structure)
>> 2. How the values are determined (scripts vs. automation vs. modules)
>>
>> It seems you are more interested in working on #2. If that's the case,
>> please make sure that you don't assume anything about #1, as not
>> everyone agrees that we will use the existing attributes in the future.
>>
>
> On the condition that it is compatible with existing frameworks that
> already rely on parsing attributes for rack information.
>
> Quotes from my original statements:
> > For compatibility with existing framework, I tend to be ok with using
> > attributes to convey the rack information
>
> By all means, no matter what internal structures are used, current
> behavior should be honored. Btw, I'm also thinking about #1, but it's too
> early to bring up the details before the ticket is ACCEPTED.
>
> Anyway, I'm always open to all kinds of discussion. Thanks for your
> comments, Joris!
>
> For #2, you should focus on an API (module or script results) that will
>> support all the different methods the community wants to use to generate
>> this data.
>>
>> As you mentioned, updating the values for a running agent is not
>> straightforward. A lot of design work will need to go into how these
>> values are propagated to frameworks that have made assumptions about
>> them, and which values are allowed to change vs. not.
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com
>> <mailto:aca...@ilm.com>> wrote:
>>
>> #3 would be very helpful for us. Also related:
>>
>> https://issues.apache.org/jira/browse/MESOS-3059
>>
>> --
>>
>> Aaron Carey
>> Production Engineer - Cloud Pipeline
>> Industrial Light & Magic
>> London
>> 020 3751 9150
>>
>> 
>> From: Du, Fan [fan...@intel.com <mailto:fan...@intel.com>]
>> Sent: 14 June 2016 07:24
>> To: user@mesos.apache.org <mailto:user@mesos.apache.org>;
>> d...@mesos.apache.org <mailto:d...@mesos.apache.org>
>> Cc: Joris Van Remoortere; vinodk...@apache.org
>> <mailto:vinodk...@apache.org>
>>
>> Subject: Re: Rack awareness support for Mesos
>>
>> Hi everyone
>>
>> Let me summarize the discussion about Rack awareness in the community
>> so
>> far. First thanks for all the comments, advices or challenges! :)
>>
>> #1. Stick with attributes for rack awareness
>>
>> For compatibility with existing framework, I tend to be ok with using
>> attributes to convey the rack information, but with the goal to do it
>> automatically, easy to maintain and with good attributes schema. This
>> will bring up below question where the controversy starts.
>>
>> #2. Scripts vs programmatic way
>>
>> Both can be used to set attributes, I've made my arguments in the Jira
>> and the Design doc, I'm not gonna to argue more here. But please take
>> a
>> look discussion at MESOS-3366 before, which allow resources/attributes
>> discovery.
>>
>> A module to implement *slaveAttributesDecorator* hook will works like
>> a charm here in a static way. And need to justify attributes updating.
>>
>> #3. Allow updating attributes
>> 

Re: Rack awareness support for Mesos

2016-06-14 Thread Joris Van Remoortere
>
> #1. Stick with attributes for rack awareness

I don't think this is the right approach; however, there seem to be 2
components to this discussion:

1. How the values are presented (Attributes vs. a new type-aware structure)
2. How the values are determined (scripts vs. automation vs. modules)

It seems you are more interested in working on #2. If that's the case,
please make sure that you don't assume anything about #1, as not
everyone agrees that we will use the existing attributes in the future.

For #2, you should focus on an API (module or script results) that will
support all the different methods the community wants to use to generate
this data.
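
As a sketch of what "one API, many methods" could mean: every source of rack
data is reduced to a callable that returns string key/value pairs, whether it
is a fixed operator setting or the output of a script. All names here
(`Provider`, `static_provider`, `script_provider`, `lldp_stub`) are
hypothetical and do not correspond to any existing Mesos module interface:

```python
import json
import subprocess
import sys
from typing import Callable, Dict, List

# Hypothetical contract: a provider is any callable returning key/value strings.
Provider = Callable[[], Dict[str, str]]


def static_provider(values: Dict[str, str]) -> Provider:
    """Operator-supplied values, fixed when the agent starts."""
    return lambda: dict(values)


def script_provider(argv: List[str]) -> Provider:
    """Wrap a command that prints a flat JSON object on stdout."""
    def run() -> Dict[str, str]:
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
        data = json.loads(result.stdout)
        if not isinstance(data, dict):
            raise ValueError("script must print a JSON object")
        return {str(k): str(v) for k, v in data.items()}
    return run


# Stand-in for an LLDP query script: any executable that prints JSON would do.
lldp_stub = script_provider(
    [sys.executable, "-c", "import json; print(json.dumps({'rack': 'r7'}))"]
)
```

The consumer only ever calls the provider, so swapping a script for a module
(or an automated discovery daemon) does not change anything downstream.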

As you mentioned, updating the values for a running agent is not
straightforward. A lot of design work will need to go into how these values
are propagated to frameworks that have made assumptions about them, and
which values are allowed to change vs. not.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com> wrote:

> #3 would be very helpful for us. Also related:
>
> https://issues.apache.org/jira/browse/MESOS-3059
>
> --
>
> Aaron Carey
> Production Engineer - Cloud Pipeline
> Industrial Light & Magic
> London
> 020 3751 9150
>
> 
> From: Du, Fan [fan...@intel.com]
> Sent: 14 June 2016 07:24
> To: user@mesos.apache.org; d...@mesos.apache.org
> Cc: Joris Van Remoortere; vinodk...@apache.org
> Subject: Re: Rack awareness support for Mesos
>
> Hi everyone
>
> Let me summarize the discussion about rack awareness in the community so
> far. First, thanks for all the comments, advice, and challenges! :)
>
> #1. Stick with attributes for rack awareness
>
> For compatibility with existing frameworks, I tend to be OK with using
> attributes to convey the rack information, but with the goal of doing it
> automatically, keeping it easy to maintain, and using a good attribute
> schema. This raises the question below, where the controversy starts.
>
> #2. Scripts vs programmatic way
>
> Both can be used to set attributes. I've made my arguments in the Jira
> and the design doc, so I'm not going to argue more here. But please take a
> look at the earlier discussion in MESOS-3366, which allows
> resources/attributes discovery.
>
> A module implementing the *slaveAttributesDecorator* hook will work like
> a charm here in a static way, but we still need to justify attribute
> updating.
>
> #3. Allow updating attributes
> Several cases need to be covered here:
>
> a). Mesos runs inside VMs or containers, where live migration happens, so
> rack information needs to be updated.
>
> b). LLDP packets are broadcast at an interval of 10s~30s (this is vendor
> specific), and rack information is usually stored in the LLDP daemon to
> be queried. In the worst case (a freshly rebooted node, or a daemon
> restart), the Mesos slave has to wait 10s~30s for valid rack information
> before registering with the master. Allowing attribute updates will
> mitigate this problem.
>
> c). Framework affinity
>
> Framework X prefers to run on the same nodes as another framework Y.
> For example, it's desirable for Shark or Spark-SQL to reside on the
> *worker* nodes where Alluxio (formerly Tachyon) runs to gain performance,
> as in the SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>
> If a framework could advertise agent attributes in the ResourcesOffer
> process, awesome!
>
>
> #4. Rearrange agents in a more scalable manner, like per rack basis
>
> Randomly offering agent resources to frameworks does not improve data
> locality; imagine the likelihood of a framework getting resources
> underneath the same rack, at the scale of +3 nodes. Moreover, the time to
> randomly shuffle the agents also grows.
>
> How about arranging the agents on a per-rack basis? A minor change
> to the way resources are allocated would fix this.
>
>
> I might not see the whole picture here, so comments are welcome!
>
>
> On 2016/6/6 17:17, Du, Fan wrote:
> > Hi, Mesos folks
> >
> > I’ve been thinking about Mesos rack awareness support for a while.
> > It’s a common interest for lots of data center applications to provide
> > data locality, fault tolerance, and better task placement. I created
> > MESOS-5545 to track the story, and here is the initial design doc [1]
> > to support rack awareness in Mesos.
> >
> > Looking forward to hearing any comments from end users and other
> > developers.
> >
> > Thanks!
> >
> > [1]:
> >
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
> >
>


Re: Mesos 0.24.1 on Raspberry Pi 3

2016-06-07 Thread Joris Van Remoortere
All versions of mesos *should* work without systemd. The intent was to add
*support* for systemd, not make it a requirement.
If specific versions of Mesos *don't* work without systemd then that is a
bug, and it would be awesome if you could share specific issues (we can
make JIRAs).

The purpose of the `systemd_enable_support` flag was to prevent Mesos from
thinking it should use systemd utilities when systemd was available on the
system (and therefore Mesos assumes it's being launched as a systemd unit).
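
The decision the flag influences can be sketched roughly as follows. This is
an illustration only: the presence check mirrors the `sd_booted(3)`
convention (a `/run/systemd/system` directory exists), which is an
assumption here and not necessarily how Mesos probes for systemd, and both
function names are hypothetical:

```python
import os


def systemd_present(run_dir: str = "/run/systemd/system") -> bool:
    """sd_booted-style probe: the directory exists only when systemd is init."""
    return os.path.isdir(run_dir)


def use_systemd_utilities(enable_support: bool, present: bool) -> bool:
    """Use systemd helpers only if the operator allows it AND systemd is there."""
    return enable_support and present
```

With the flag off, the second function returns False regardless of what is
installed on the host, which is the "systemd is available but please ignore
it" case described above.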

I want to make it very clear that there is no intent to make systemd a
requirement :-) We would need to have a significant conversation in the
community first if that were the case.

I really enjoyed hearing this progress, so please do ping me on any JIRAs
where systemd made this project more difficult!

Joris

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 7, 2016 at 1:01 PM, james <gar...@verizon.net> wrote:

> Just the opposite, I'm mostly interested in mesos without systemd
> on bare metal, minimized linux systems. So with that temporal requirement,
> what is the latest version of mesos that one can run
> without systemd?
>
> James
>
>
> On 06/07/2016 10:35 AM, Joris Van Remoortere wrote:
>
>> It should be straightforward to apply the patch that adds the
>> `systemd_enable_support` flag to older releases.
>> Let me know if you need help!
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 7, 2016 at 11:28 AM, haosdent <haosd...@gmail.com
>> <mailto:haosd...@gmail.com>> wrote:
>>
>> No, it is mandatory in 0.25. `systemd_enable_support` is added since
>> 0.27 https://issues.apache.org/jira/browse/MESOS-4675
>>
>> On Tue, Jun 7, 2016 at 11:21 PM, Jan Schlicht <j...@mesosphere.io
>> <mailto:j...@mesosphere.io>> wrote:
>>
>> It's not mandatory. There's the `systemd_enable_support` flag to
>> enable some systemd related features on an agent but it can be
>> disabled.
>>
>> Cheers,
>> Jan
>>
>> On Tue, Jun 7, 2016 at 3:55 PM, james <gar...@verizon.net
>> <mailto:gar...@verizon.net>> wrote:
>>
>>
>> I thought systemd was not mandatory in version 0.25 and later?
>>
>> James
>>
>>
>> On 06/07/2016 07:42 AM, tommy xiao wrote:
>>
>> only 0.24 can work on it. 0.25 use systemd and can't
>> ignore it.
>>
>> 2016-06-07 7:50 GMT+08:00 Benjamin Mahler
>> <bmah...@apache.org <mailto:bmah...@apache.org>
>> <mailto:bmah...@apache.org <mailto:bmah...@apache.org>>>:
>>
>>  Cool stuff Andrew, thanks for sharing!
>>
>>  On Thu, Jun 2, 2016 at 11:50 AM, Andrew Spyker
>>  <aspy...@netflix.com.invalid>
>>  wrote:
>>
>>   > FYI, based on the work others have done in the
>> past, Netflix was
>>  able to
>>   > get Mesos agent building and running on
>> Raspberry Pi natively and
>>  under
>>   > Docker containers.  Please see this blog for the
>> information:
>>   >
>>   > bit.ly/TitusOnPi <http://bit.ly/TitusOnPi>
>> <http://bit.ly/TitusOnPi>
>>   >
>>   > --
>>   > Andrew Spyker (aspy...@netflix.com
>> <mailto:aspy...@netflix.com> <mailto:aspy...@netflix.com
>> <mailto:aspy...@netflix.com>>)
>>   > Twitter:  @aspyker  Blog: ispyker.blogspot.com
>> <http://ispyker.blogspot.com>
>>  <http://ispyker.blogspot.com>
>>   >
>>
>>
>>
>>
>> --
>> Deshi Xiao
>> Twitter: xds2000
>> E-mail: xiaods(AT)gmail.com <http://gmail.com>
>> <http://gmail.com>
>>
>>
>>
>>
>>
>> --
>> *Jan Schlicht*
>> Distributed Systems Engineer, Mesosphere
>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>>
>>
>


Re: Rack awareness support for Mesos

2016-06-07 Thread Joris Van Remoortere
+dev.

@Fan, I responded on the JIRA with some next steps.
Thanks for bringing this up!

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 7, 2016 at 12:58 PM, james <gar...@verizon.net> wrote:

> On 06/07/2016 09:57 AM, Du, Fan wrote:
>
>>
>>
>> On 2016/6/6 21:27, james wrote:
>>
>>> Hello,
>>>
>>>
>>> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
>>> gets access to the information, particularly cluster/cloud devops,
>>> customers or interlopers?
>>>
>>
>> ACLs should play in this part to address security concern.
>>
>
> YES, and so much more! I know folks whose primary (in-house cluster)
> usage is deep packet inspection on the cluster.
> With a cluster (inside) there is no limit to new tools that can be
> judiciously altered to benefit from cluster codes.
>
>
>>
>>> @Fan:: As a consultant, most of my customers either have  or are
>>> planning hybrid installations, where some codes run on a local cluster
>>> or using 'the cloud' for dynamic load requirements. I would think your
>>> proposed scheme needs to be very flexible, both in application to a
>>> campus or Metropolitan Area Network, if not massively distributed around
>>> the globe. What about different resource types (racks of arm64, gpu
>>> centric hardware, DSPs, FPGAs, etc.)? Hardware diversity brings many
>>> benefits to the cluster/cloud capabilities.
>>>
>>>
>>> This also begs the question of hardware management (boot/config/online)
>>> of the various hardware, such as is built into coreOS. Are several
>>> applications going to be supported? Standards track? Just Mesos DC/OS
>>> centric?
>>>
>>
>> It depends whether this proposal is accepted by Mesos, if you think
>> this feature is useful, let's discuss detailed requirement under
>> MESOS-5545.
>>
>
> OK. Take a look at 'Rackview' on sourceforge::
> 'http://rackview.sourceforge.net/'
>
>
> Do I have access to the jira system by default joining this list,
> or do I have to request permission somewhere? (sorry jira is new to me
> so recommendations on jira, per mesos, in a document, would be keen.)
>
>
>> btw, I have limited knowledge of CoreOS, will look into it.
>>
>
> CoreOS has some great ideas. But many of their codes are not current
> (when compared to the gentoo portage tree) and thus many are suspect
> for security/function.
>
> I thought the purpose was to get more folks involved here in discussions
> and then better formulated ideas  can migrate to the ticket (5545)  and
> repos.
>
>
>>
>>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>>> in resources' you need to add timing (latency) data to encourage robust
>>> and diversified use of this data. For HPC, this could be very
>>> valuable for rDMA abusive algorithms where memory constrained workloads
>>> not only need the knowledge of additional nearby memory resources, but
>>> the approximated (based on previous data collected) latency and
>>> bandwidth constraints to use those additional resources.
>>>
>>
>> Out of curiosity, which open sourced Mesos framework do you/your
>> customer run MPI?
>>
>
> Easy, dude. Most of this work is tightly held and there is nothing to
> publish or open up yet. It's a mess (my professional opinion) right now and
> I'm testing a variety of tools just to be able to have better instrumentation
> on these codes. Still, rDMA is very attractive so it does warrant much
> attention and extreme, internal, excitement.
>
>
>
>
>> Mesos can support an MPI framework, but AFAIK, it's immature [1][2].
>>
>
> YEP.
>
> I think this part of work should be investigated in future.
>>
>> [1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring
>> version
>> [2]:https://github.com/mesosphere/mesos-hydra <- hydra version
>>
>
> Many codes floating around. Much excitement on new compiler features. Lots
> of hard work and testing going on. That said, the point I was trying to make
> is "Vectoring in" resources, with a variety of parameters as a companion to
> your idea, is warranted for these aforementioned use cases
> and other opportunities.
>
>
>>
>>> Great idea. I do like it very much.
>>>
>>> hth,
>>> James
>>>
>>>
>>> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>>>
>>>> Hi,
>>>>
>>>> This looks potentially interesting.  How does it work in a public c

Re: Mesos 0.24.1 on Raspberry Pi 3

2016-06-07 Thread Joris Van Remoortere
It should be straightforward to apply the patch that adds the
`systemd_enable_support` flag to older releases.
Let me know if you need help!

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 7, 2016 at 11:28 AM, haosdent <haosd...@gmail.com> wrote:

> No, it is mandatory in 0.25. `systemd_enable_support` was added in 0.27
> https://issues.apache.org/jira/browse/MESOS-4675
>
> On Tue, Jun 7, 2016 at 11:21 PM, Jan Schlicht <j...@mesosphere.io> wrote:
>
>> It's not mandatory. There's the `systemd_enable_support` flag to enable
>> some systemd related features on an agent but it can be disabled.
>>
>> Cheers,
>> Jan
>>
>> On Tue, Jun 7, 2016 at 3:55 PM, james <gar...@verizon.net> wrote:
>>
>>>
>>> I thought systemd was not mandatory in version 0.25 and later?
>>>
>>> James
>>>
>>>
>>> On 06/07/2016 07:42 AM, tommy xiao wrote:
>>>
>>>> Only 0.24 works on it. 0.25 uses systemd and can't ignore it.
>>>>
>>>> 2016-06-07 7:50 GMT+08:00 Benjamin Mahler <bmah...@apache.org
>>>> <mailto:bmah...@apache.org>>:
>>>>
>>>> Cool stuff Andrew, thanks for sharing!
>>>>
>>>> On Thu, Jun 2, 2016 at 11:50 AM, Andrew Spyker
>>>> <aspy...@netflix.com.invalid>
>>>> wrote:
>>>>
>>>>  > FYI, based on the work others have done in the past, Netflix was
>>>> able to
>>>>  > get Mesos agent building and running on Raspberry Pi natively and
>>>> under
>>>>  > Docker containers.  Please see this blog for the information:
>>>>  >
>>>>  > bit.ly/TitusOnPi <http://bit.ly/TitusOnPi>
>>>>  >
>>>>  > --
>>>>  > Andrew Spyker (aspy...@netflix.com <mailto:aspy...@netflix.com>)
>>>>  > Twitter:  @aspyker  Blog: ispyker.blogspot.com
>>>> <http://ispyker.blogspot.com>
>>>>  >
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Deshi Xiao
>>>> Twitter: xds2000
>>>> E-mail: xiaods(AT)gmail.com <http://gmail.com>
>>>>
>>>
>>>
>>
>>
>> --
>> *Jan Schlicht*
>> Distributed Systems Engineer, Mesosphere
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: [VOTE] Release Apache Mesos 0.24.2 (rc1)

2016-03-04 Thread Joris Van Remoortere
+1 (binding)

On Mon, Feb 29, 2016 at 3:24 PM, Greg Mann  wrote:

> +1 (non-binding)
>
> `sudo make check` on Ubuntu 14.04, using gcc with libevent and SSL enabled.
> All tests pass except MemoryPressureMesosTest.CGROUPS_ROOT_Statistics,
> which is a known failure in 0.24.
>
> Cheers,
> Greg
>
>
> On Mon, Feb 29, 2016 at 11:20 AM, Kapil Arya  wrote:
>
> > +1 (binding)
> >
> > Successful CI builds for the following distros:
> >
> > amd64/centos/6
> > amd64/centos/7
> > amd64/debian/jessie
> > amd64/ubuntu/precise
> > amd64/ubuntu/trusty
> > amd64/ubuntu/vivid
> >
> > Kapil
> >
> > On Sat, Feb 27, 2016 at 1:12 AM, Michael Park  wrote:
> >
> >> Hi all,
> >>
> >> Please vote on releasing the following candidate as Apache Mesos 0.24.2.
> >>
> >>
> >> 0.24.2 includes the following:
> >>
> >>
> 
> >>
> >>- Improvements
> >>   - Allocator filter performance
> >>   - Port Ranges performance
> >>   - UUID performance
> >>   - `/state` endpoint performance
> >>   - GLOG performance
> >>   - Configurable task/framework history
> >>   - Offer filter timeout fix for backlogged allocator
> >>
> >>
> >>- Bugs
> >>   - SSL
> >>   - Libevent
> >>   - Fixed point resources math
> >>   - HDFS
> >>   - Agent upgrade compatibility
> >>   - Health checks
> >>
> >> The CHANGELOG for the release is available at:
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.2-rc1
> >>
> >>
> 
> >>
> >> The candidate for Mesos 0.24.2 release is available at:
> >>
> >>
> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc1/mesos-0.24.2.tar.gz
> >>
> >> The tag to be voted on is 0.24.2-rc1:
> >>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.24.2-rc1
> >>
> >> The MD5 checksum of the tarball can be found at:
> >>
> >>
> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc1/mesos-0.24.2.tar.gz.md5
> >>
> >> The signature of the tarball can be found at:
> >>
> >>
> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc1/mesos-0.24.2.tar.gz.asc
> >>
> >> The PGP key used to sign the release is here:
> >> https://dist.apache.org/repos/dist/release/mesos/KEYS
> >>
> >> The JAR is up in Maven in a staging repository here:
> >> https://repository.apache.org/content/repositories/orgapachemesos-1110
> >>
> >> Please vote on releasing this package as Apache Mesos 0.24.2!
> >>
> >> The vote is open until Wed Mar 2 23:59:59 PST 2016 and passes if a
> >> majority of at least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Mesos 0.24.2
> >> [ ] -1 Do not release this package because ...
> >>
> >> Thanks,
> >>
> >> Joris, Kapil, MPark
> >>
> >
> >
>


Re: [VOTE] Release Apache Mesos 0.26.1 (rc1)

2016-03-04 Thread Joris Van Remoortere
+1 (binding)
Greg's upgrade scripts & CI results

On Fri, Mar 4, 2016 at 11:30 AM, Vinod Kone  wrote:

> +1 (binding)
>
> On Tue, Mar 1, 2016 at 5:03 PM, Kevin Klues  wrote:
>
> > I committed a fix for this in:
> >
> >
> https://github.com/apache/mesos/commit/42f746937233349660c687ea7a66cc0a78871663
> >
> > Looks like that's post 0.26 though, so maybe it should be included in the
> > .1 rc
> >
> > On Mon, Feb 29, 2016 at 2:27 PM, Vinod Kone 
> wrote:
> >
> >> Looks like the ASF CI builds for CentOS7 are failing because they are
> >> unable to find JAVA_HOME. Couldn't tell if it's an issue with the docker
> >> build script or something in the configure script.
> >>
> >>
> >> checking for svn_txdelta in -lsvn_delta-1... yes
> >> checking for sasl_done in -lsasl2... yes
> >> checking SASL CRAM-MD5 support... yes
> >> checking for javac... /usr/bin/javac
> >> checking for java... /usr/bin/java
> >> checking value of Java system property 'java.home'...
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-2.b15.el7_2.x86_64/jre
> >> configure: error: could not guess JAVA_HOME
> >>
> >>
> >>
> >> *Revision*: a05261dbed1c2577676b11235380de95d586aeeb
> >>
> >>- refs/tags/0.26.1-rc1
> >>
> >> Configuration Matrix (result per compiler)
> >>
> >> centos:7, --verbose --enable-libevent --enable-ssl
> >>   gcc: Failed
> >>   <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/8/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> >>   clang: Not run
> >> centos:7, --verbose
> >>   gcc: Failed
> >>   <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/8/COMPILER=gcc,CONFIGURATION=--verbose,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> >>   clang: Not run
> >> ubuntu:14.04, --verbose --enable-libevent --enable-ssl
> >>   gcc: Success
> >>   <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/8/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> >>   clang: Success
> >>   <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/8/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> >> ubuntu:14.04, --verbose
> >>   gcc: Success
> >>   <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/8/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> >>   clang: Success
> >>   <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/8/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/>
> >>
> >> On Mon, Feb 29, 2016 at 11:21 AM, Kapil Arya 
> wrote:
> >>
> >>> +1 (binding)
> >>>
> >>> Successful CI builds for the following distros:
> >>>
> >>> amd64/centos/6
> >>> amd64/centos/7
> >>> amd64/debian/jessie
> >>> amd64/ubuntu/precise
> >>> amd64/ubuntu/trusty
> >>> amd64/ubuntu/vivid
> >>>
> >>> Kapil
> >>>
> >>> On Sat, Feb 27, 2016 at 12:26 AM, Michael Park 
> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > Please vote on releasing the following candidate as Apache Mesos
> >>> 0.26.1.
> >>> >
> >>> >
> >>> > 0.26.1 includes the following:
> >>> >
> >>> >
> >>>
> 
> >>> >
> >>> >- Improvements
> >>> >   - `/state` endpoint performance
> >>> >   - systemd integration
> >>> >   - GLOG performance
> >>> >   - Configurable task/framework history
> >>> >   - Offer filter timeout fix for backlogged allocator
> >>> >
> >>> >
> >>> >- Bugs
> >>> >   - SSL
> >>> >   - Libevent
> >>> >   - Fixed point resources math
> >>> >   - HDFS
> >>> >   - Agent upgrade compatibility
> >>> >
> >>> > The CHANGELOG for the release is available at:
> >>> >
> >>> >
> >>>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.1-rc1
> >>> >
> >>> >
> >>>
> 
> >>> >
> >>> > The candidate for Mesos 0.26.1 release is available at:
> >>> >
> >>>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc1/mesos-0.26.1.tar.gz
> >>> >
> >>> > The tag to be voted on is 0.26.1-rc1:
> >>> >
> >>>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.1-rc1
> >>> >
> >>> > The MD5 checksum of the tarball can be found at:
> >>> >
> >>> >
> >>>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc1/mesos-0.26.1.tar.gz.md5
> >>> >
> >>> > The signature of the tarball can be found at:
> >>> >
> >>> >
> >>>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc1/mesos-0.26.1.tar.gz.asc
> >>> >
> >>> > The PGP key used to sign the release is here:
> >>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> 

Re: [VOTE] Release Apache Mesos 0.27.2 (rc1)

2016-03-01 Thread Joris Van Remoortere
@Michael Browning:
>
> MasterTest.MaxCompletedTasksPerFrameworkFlag [flaky, tracked in
> MESOS-4518]

This is supposed to be fixed in this release. It is concerning that this
came up.
Can you verify this and provide logs to Kevin Klues?


—
*Joris Van Remoortere*
Mesosphere

On Tue, Mar 1, 2016 at 2:00 PM, Michael Browning <invitapri...@gmail.com>
wrote:

> +1 (non-binding)
>
> Fedora 23: `make check` non-root OK
> OS X: `make check` non-root OK
> Ubuntu 14.04: `make check` non-root, three failures:
> ContainerLoggerTest.DefaultToSandbox [flaky, tracked in MESOS-4615]
> MasterQuotaTest.AvailableResourcesAfterRescinding [flaky, tracked in
> MESOS-4542]
> MasterTest.MaxCompletedTasksPerFrameworkFlag [flaky, tracked in MESOS-4518]
>
> On Mon, Feb 29, 2016 at 10:40 PM, Greg Mann <g...@mesosphere.io> wrote:
>
> > +1 (non-binding)
> >
> > `sudo make check` on Ubuntu 14.04 using gcc, with libevent and SSL
> enabled.
> >
> > All tests pass except MemoryPressureMesosTest.CGROUPS_ROOT_Statistics,
> > which seems to be due to the issue found here:
> > https://issues.apache.org/jira/browse/MESOS-4053
> >
> >
> > On Mon, Feb 29, 2016 at 2:17 PM, Michael Park <mp...@apache.org> wrote:
> >
> > > Vinod, we've only committed the CHANGELOGs to the specific tags. I
> didn't
> > > realize that I should commit those to master as well, but it makes
> total
> > > sense to do so. I'll do that. Thanks.
> > >
> > > On 29 February 2016 at 13:50, Vinod Kone <vinodk...@apache.org> wrote:
> > >
> > >> I don't see CHANGELOGs for these versions on the master branch?
> > >>
> > >> On Mon, Feb 29, 2016 at 1:39 PM, Neil Conway <neil.con...@gmail.com>
> > >> wrote:
> > >>
> > >> > As described (briefly) in the release emails, 0.27.2, 0.26.1, 0.25.1,
> > >> > and 0.24.2 contain a new feature: "reliable floating point for scalar
> > >> > resources" (MESOS-4687).
> > >> >
> > >> > To elaborate on that slightly, Mesos now only supports scalar
> resource
> > >> > values with three decimal digits of precision (e.g., reserving
> "5.001
> > >> > CPUs" for a task). As a result of this change, frameworks that do
> > >> > their own resource math may see slightly different results;
> > >> > furthermore, if any frameworks were trying to manage extremely
> > >> > fine-grained resource values (> 3 decimal digits of precision), that
> > >> > will no longer be supported.
> > >> >
> > >> > For more information, please see:
> > >> >
> > >> >
> > >> >
> > >>
> >
> https://mail-archives.apache.org/mod_mbox/mesos-user/201602.mbox/%3CCAOW5sYZJn5caBOwZyPV008JgL1F2FYFxL_bM5CtYA2PF2OG7Bw%40mail.gmail.com%3E
> > >> >
> > >> >
> > >>
> >
> https://docs.google.com/document/d/14qLxjZsfIpfynbx0USLJR0GELSq8hdZJUWw6kaY_DXc/edit?usp=sharing
> > >> > https://issues.apache.org/jira/browse/MESOS-4687
> > >> >
> > >> > Neil
> > >> >
> > >> >
> > >> > On Fri, Feb 26, 2016 at 8:54 PM, Michael Park <mcyp...@gmail.com>
> > >> wrote:
> > >> > > Hi all,
> > >> > >
> > >> > > Please vote on releasing the following candidate as Apache Mesos
> > >> 0.27.2.
> > >> > >
> > >> > >
> > >> > > 0.27.2 includes the following:
> > >> > >
> > >> >
> > >>
> >
> 
> > >> > >
> > >> > > MESOS-4693 - Variable shadowing in
> > >> HookManager::slavePreLaunchDockerHook.
> > >> > > MESOS-4711 - Race condition in libevent poll implementation causes
> > >> crash.
> > >> > > MESOS-4754 - The "executors" field is exposed under a backwards
> > >> > incompatible
> > >> > > schema.
> > >> > > MESOS-4687 - Implement reliable floating point for scalar
> resources.
> > >> > >
> > >> > >
> > >> > > The CHANGELOG for the release is available at:
> > >> > >
> > >> >
> > >>
> >
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.27.2-rc1
> > >>

Re: Precision of scalar resources

2016-02-14 Thread Joris Van Remoortere
+1
Thanks for taking this on Neil!

—
*Joris Van Remoortere*
Mesosphere

On Fri, Feb 12, 2016 at 11:25 PM, Neil Conway <neil.con...@gmail.com> wrote:

> tl;dr:
>
> If you use resource values with more than three decimal digits of
> precision (e.g., you are launching a task that uses 2.5001 CPUs),
> please speak up!
>
> 
>
> Mesos uses floating point to represent scalar resource values, such as
> the number of CPUs in a resource offer or dynamic reservation. The
> master does resource math in floating point, which leads to a few
> problems:
>
> * due to roundoff error, frameworks can receive offers that have
> unexpected resource values (e.g., MESOS-3990)
> * various internal assertions in the master can fail due to roundoff
> error (e.g., MESOS-3552).
>
> In the long term, we can solve these problems by switching to a
> fixed-point representation for scalar values. However, that will
> require a long deprecation cycle.
>
> In the short term, we should make floating point behavior more
> reliable. To do that, I propose:
>
> (1) Resource values will support AT MOST three decimal digits of
> precision. Additional precision in resource values will be discarded
> (via rounding).
>
> (2) The master will internally use a fixed-point representation to
> avoid unpredictable roundoff behavior.
>
> For more details, please see the design doc here:
>
> https://docs.google.com/document/d/14qLxjZsfIpfynbx0USLJR0GELSq8hdZJUWw6kaY_DXc
> -- comments welcome!
>
> Thanks,
> Neil
>
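
A rough illustration of proposals (1) and (2) above, as a Python sketch rather than the actual Mesos implementation (which is C++ and is described in the linked design doc). The names SCALE, to_fixed, from_fixed, add and subtract are hypothetical. The sketch assumes round-to-nearest when discarding extra precision; Python's round() sends exact ties to the even neighbour, which may differ from what Mesos does.

```python
# Hypothetical sketch: scalar resource math done in integer thousandths.
SCALE = 1000  # three decimal digits of precision


def to_fixed(value):
    """Convert a scalar resource value to an integer count of thousandths."""
    # Extra precision is discarded here by rounding to the nearest thousandth.
    return int(round(value * SCALE))


def from_fixed(units):
    """Convert integer thousandths back to a floating point value."""
    return units / SCALE


def add(a, b):
    """Add two resource values without accumulating roundoff error."""
    return from_fixed(to_fixed(a) + to_fixed(b))


def subtract(a, b):
    """Subtract two resource values without accumulating roundoff error."""
    return from_fixed(to_fixed(a) - to_fixed(b))


if __name__ == "__main__":
    print(0.1 + 0.2 == 0.3)                # plain floating point comparison
    print(add(0.1, 0.2) == 0.3)            # same sum via integer thousandths
    print(from_fixed(to_fixed(2.5001)))    # the fourth decimal digit is dropped
```

Because the intermediate sums are integers, repeated additions and subtractions of offers cannot drift the way repeated floating point operations can; the cost is that a request such as 2.5001 CPUs is treated as 2.5.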


Fwd: [VOTE] Release Apache Mesos 0.26.0 (rc5)

2015-12-15 Thread Joris Van Remoortere
+1 (binding)

From: Till Toenshoff 
Date: Thu, Dec 10, 2015 at 2:55 PM
Subject: [VOTE] Release Apache Mesos 0.26.0 (rc5)
To: user@mesos.apache.org, dev 


Hi friends,

Unfortunately, we once again ran into an issue that needed immediate
attention (see vote on rc4), hence we have to ask for another round of
testing and voting on this newest release candidate.

The issue leading to this new release candidate was
https://issues.apache.org/jira/browse/MESOS-4106 . Apart from that, we also
pulled in a fix for https://issues.apache.org/jira/browse/MESOS-4015 , as we
believe it carries minimal additional risk while being very useful for some
of us.

Please vote on releasing the following candidate as Apache Mesos 0.26.0.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.0-rc5


The candidate for Mesos 0.26.0 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc5/mesos-0.26.0.tar.gz

The tag to be voted on is 0.26.0-rc5:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.0-rc5

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc5/mesos-0.26.0.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.0-rc5/mesos-0.26.0.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1095

Please vote on releasing this package as Apache Mesos 0.26.0!

The vote is open until Tue Dec 15 22:35:22 CET 2015 and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.26.0
[ ] -1 Do not release this package because ...

Thanks,
Till & Bernd


Re: Mesos and Zookeeper TCP keepalive

2015-11-10 Thread Joris Van Remoortere
Hi Jeremy,

Can you read the description of these
<https://github.com/apache/mesos/blob/249bc26306574d9db0527c04b7a83a1f1e75f71b/src/master/flags.cpp#L393-L422>
parameters on the master, and possibly share your values for these flags?

It seems from the re-registration attempt on the agent, that the master has
already treated the agent as "failed", and so will tell it to shut down on
any re-registration attempt.

I'm curious whether the timeouts in your environment conflict, or leave too
narrow a time gap for the agent to re-register after it notices that it
needs to re-establish the connection.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Nov 10, 2015 at 5:02 AM, Jeremy Olexa <jol...@spscommerce.com>
wrote:

> Hi Tommy, Erik, all,
>
>
> You are correct in your assumption that I'm trying to work around a one-hour
> session expiry time on a firewall. For some more background: our master
> cluster is in datacenter X, and the slaves in X stay "up" for days and
> days. The slaves in a different datacenter, Y, connected to that master
> cluster stay "up" for a few days and then restart. The master cluster
> is healthy, with a stable leader for months (no flapping); the same goes for
> the ZK "leader". There are about 35 slaves in datacenter Y. Maybe the
> firewall session timer is a red herring, because the slave restarts are
> seemingly random (the slave with the highest uptime is at 6 days, but a
> handful only have an uptime of a day).
>
> I started debugging this a while ago, and the gist of the logs is here:
> https://gist.github.com/jolexa/1a80e26a4b017846d083
> I posted this back in October seeking help, and Benjamin suggested network
> issues in both directions, so I suspected the firewall.
>
>
> Thanks for any hints,
>
> Jeremy
>
> --
> *From:* tommy xiao <xia...@gmail.com>
> *Sent:* Tuesday, November 10, 2015 3:07 AM
>
> *To:* user@mesos.apache.org
> *Subject:* Re: Mesos and Zookeeper TCP keepalive
>
> Same here; I have the same question as Erik. Could you please provide more
> background info? Thanks.
>
> 2015-11-10 15:56 GMT+08:00 Erik Weathers <eweath...@groupon.com>:
>
>> It would really help if you (Jeremy) explained the *actual* problem you
>> are facing.  I'm *guessing* that it's a firewall timing out the sessions
>> because there isn't activity on them for whatever the timeout of the
>> firewall is?   It seems likely to be unreasonably short, given that mesos
>> has constant activity between master and
>> slave/agent/whatever-it-is-being-called-nowadays-but-not-really-yet-maybe-someday-for-reals.
>>
>> - Erik
>>
>> On Mon, Nov 9, 2015 at 10:00 PM, Jojy Varghese <j...@mesosphere.io>
>> wrote:
>>
>>> Hi Jeremy
>>>  It's great that you are making progress, but I doubt this is what you
>>> intend to achieve, since network failures are a valid state in distributed
>>> systems. If you think there is a special case you are trying to solve, I
>>> suggest proposing a design document for review.
>>>   For the ZK client code, I would suggest asking the ZooKeeper mailing list.
>>>
>>> thanks
>>> -Jojy
>>>
>>> On Nov 9, 2015, at 7:56 PM, Jeremy Olexa <jol...@spscommerce.com> wrote:
>>>
>>> Alright, great, I'm making some progress,
>>>
>>> I did a simple copy/paste modification and recompiled mesos. The
>>> keepalive timer is now set from slave to master, so this is an improvement
>>> for me. I haven't tested the other direction yet:
>>> https://gist.github.com/jolexa/ee9e152aa7045c558e02
>>> I'd like to file an enhancement request for this after some real-world
>>> testing, since it seems like an improvement for other people as well.
>>>
>>> I'm having a harder time figuring out the zk client code. I started
>>> by modifying build/3rdparty/zookeeper-3.4.5/src/c/zookeeper.c, but either
>>> a) my change wasn't correct or b) I'm modifying the wrong file, since I
>>> just assumed the C client is used. Is this the correct place?
>>>
>>> Thanks much,
>>> Jeremy
>>>
>>>
>>> --
>>> *From:* Jojy Varghese <j...@mesosphere.io>
>>> *Sent:* Monday, November 9, 2015 2:09 PM
>>> *To:* user@mesos.apache.org
>>> *Subject:* Re: Mesos and Zookeeper TCP keepalive
>>>
>>> Hi Jeremy
>>>  The "network" code is at
>>> "3rdparty/libprocess/include/process/network.hpp" and
>>> "3rdparty/libprocess/src/poll_socket.hpp/cpp".
>>>
>>> thanks
>>> joj
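
For readers following the keepalive discussion above: the sketch below shows, in Python, the socket options that TCP keepalive involves. It is not Jeremy's patch (that lives in the linked gist and in C++ libprocess code); the function name enable_keepalive and the default timings are assumptions chosen for illustration. SO_KEEPALIVE is portable, while TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are only set when the platform exposes them (they are available on Linux).

```python
import socket


def enable_keepalive(sock, idle_secs=600, interval_secs=60, probes=5):
    """Turn on TCP keepalive for sock and, where supported, tune its timing.

    idle_secs:     idle time before the first probe is sent
    interval_secs: time between unanswered probes
    probes:        unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket tuning knobs are not available on every platform,
    # so each one is guarded.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_secs)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_secs)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle_secs=600)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With a firewall that expires idle sessions after one hour, an idle time well below that hour would keep probes flowing before the session entry is dropped; whether that is the actual cause of the restarts is still an open question in this thread.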

Re: unable to start mesos-slave as non-root user after 0.25 upgrade

2015-10-28 Thread Joris Van Remoortere
This may be related to the systemd support we added in 0.25.
If the agent detects it is running on systemd it will try to launch a
systemd slice under which to run the executors. If your non-root user does
not have sufficient permissions to perform these operations that will be a
problem.
Can you share the agent logs to verify this? You should be able to access
them using journalctl.

Joris

—
*Joris Van Remoortere*
Mesosphere

On Wed, Oct 28, 2015 at 12:33 PM, haosdent <haosd...@gmail.com> wrote:

> Does the mesos slave produce any logs?
>
> On Wed, Oct 28, 2015 at 11:42 PM, Rodrick Brown <rodr...@orchard-app.com>
> wrote:
>
>> After I upgraded, the first thing I noticed was the permissions on the
>> wrapper script:
>>
>> # ls -al /usr/bin/mesos-init-wrapper
>> -rwxr-x---. 1 root root 5202 Oct 12 21:08 /usr/bin/mesos-init-wrapper
>>
>> So systemd was unable to EXEC this script
>>
>> So I changed the perms on this wrapper
>> # chmod a+x  /usr/bin/mesos-init-wrapper
>>
>>
>> However I’m still unable to bring up the process via systemd
>>
>> Oct 28 15:39:27 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> Started Mesos Slave.
>> Oct 28 15:39:27 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> Starting Mesos Slave...
>> Oct 28 15:39:27 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> mesos-slave.service: main process exited, code=exited, status=126/n/a
>> Oct 28 15:39:27 prod-mesos-s-1.aws.orchardplatform.com systemd[1]: Unit
>> mesos-slave.service entered failed state.
>> Oct 28 15:39:27 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> mesos-slave.service failed.
>> Oct 28 15:39:47 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> mesos-slave.service holdoff time over, scheduling restart.
>> Oct 28 15:39:47 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> Started Mesos Slave.
>> Oct 28 15:39:47 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> Starting Mesos Slave...
>> Oct 28 15:39:47 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> mesos-slave.service: main process exited, code=exited, status=126/n/a
>> Oct 28 15:39:47 prod-mesos-s-1.aws.orchardplatform.com systemd[1]: Unit
>> mesos-slave.service entered failed state.
>> Oct 28 15:39:47 prod-mesos-s-1.aws.orchardplatform.com systemd[1]:
>> mesos-slave.service failed.
>>
>> # cat /usr/lib/systemd/system/mesos-slave.service
>> [Unit]
>> Description=Mesos Slave
>> After=network.target
>> Wants=network.target
>>
>> [Service]
>> User=mesos
>> ExecStart=/usr/bin/mesos-init-wrapper slave
>> KillMode=process
>> Restart=always
>> RestartSec=20
>> LimitNOFILE=16384
>> CPUAccounting=true
>> MemoryAccounting=true
>>
>> [Install]
>> WantedBy=multi-user.target
>>
>> The only change I made to the unit file was to add User=mesos; this worked
>> in previous versions of mesos.
>>
>> If I remove User=mesos and have systemd bring the process up as root, the
>> slave joins the cluster and everything works as designed.
>> Was something changed in 0.24.1 or 0.25?
>>
>> Thanks.
>>
>>
>> --
>>
>> [image: Orchard Platform] <http://www.orchardplatform.com/>
>>
>> Rodrick Brown / DevOPs Engineer
>> +1 917 445 6839 / rodr...@orchardplatform.com
>> <char...@orchardplatform.com>
>>
>> Orchard Platform
>> 101 5th Avenue, 4th Floor, New York, NY 10003
>> http://www.orchardplatform.com
>>
>> Orchard Blog <http://www.orchardplatform.com/blog/> | Marketplace
>> Lending Meetup <http://www.meetup.com/Peer-to-Peer-Lending-P2P/>
>>
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: [VOTE] Release Apache Mesos 0.25.0 (rc2)

2015-10-07 Thread Joris Van Remoortere
+1 (binding)

On Mon, Oct 5, 2015 at 11:12 PM, Niklas Nielsen 
wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.25.0.
>
>
>
> 0.25.0 includes the following:
>
>
> 
>
>  * [MESOS-1474] - Experimental support for maintenance primitives.
>
>  * [MESOS-2600] - Added master endpoints /reserve and /unreserve for
> dynamic reservations.
>
>  * [MESOS-2044] - Extended Module APIs to enable IP per container
> assignment, isolation and resolution.
>
>
> ** Bug fixes
>
>   * [MESOS-2635] - Web UI Display Bug when starting lots of tasks with
> small cpu value.
>
>   * [MESOS-2986] - Docker version output is not compatible with Mesos.
>
>   * [MESOS-3046] - Stout's UUID re-seeds a new random generator during
> each call to UUID::random.
>
>   * [MESOS-3051] - performance issues with port ranges comparison.
>
>   * [MESOS-3052] - Allocator performance issue when using a large number
> of filters.
>
>   * [MESOS-3136] - COMMAND health checks with Marathon 0.10.0 are broken.
>
>   * [MESOS-3169] - FrameworkInfo should only be updated if the
> re-registration is valid.
>
>   * [MESOS-3185] - Refactor Subprocess logic in linux/perf.cpp to use
> common subroutine.
>
>   * [MESOS-3239] - Refactor master HTTP endpoints help messages such that
> they cannot be out of sync.
>
>   * [MESOS-3245] - The comments of DRFSorter::dirty is not correct.
>
>   * [MESOS-3254] - Cgroup CHECK fails test harness.
>
>   * [MESOS-3258] - Remove Frameworkinfo capabilities on re-registration.
>
>   * [MESOS-3261] - Move QoS plug-ins to a specified folder like
> resource_estimator.
>
>   * [MESOS-3269] - The comments of Master::updateSlave() is not correct.
>
>   * [MESOS-3282] - Web UI no longer shows Tasks information.
>
>   * [MESOS-3344] - Add more comments for strings::internal::fmt.
>
>   * [MESOS-3351] - duplicated slave id in master after master failover.
>
>   * [MESOS-3387] - Refactor MesosContainerizer to accept namespace
> dynamically.
>
>   * [MESOS-3408] - Labels field of FrameworkInfo should be added into v1
> mesos.proto.
>
>   * [MESOS-3411] - ReservationEndpointsTest.AvailableResources appears to
> be faulty.
>
>   * [MESOS-3423] - Perf event isolator stops performing sampling if a
> single timeout occurs.
>
>   * [MESOS-3426] - process::collect and process::await do not perform
> discard propagation.
>
>   * [MESOS-3430] -
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> fails on CentOS 7.1.
>
>   * [MESOS-3450] - Update Mesos C++ Style Guide for namespace usage.
>
>   * [MESOS-3451] - Failing tests after changes to
> Isolator/MesosContainerizer API.
>
>   * [MESOS-3458] - Segfault when accepting or declining inverse offers.
>
>   * [MESOS-3474] - ExamplesTest.{TestFramework, JavaFramework,
> PythonFramework} failed on CentOS 6.
>
>   * [MESOS-3489] - Add support for exposing Accept/Decline responses for
> inverse offers.
>
>   * [MESOS-3490] - Mesos UI fails to represent JSON entities.
>
>   * [MESOS-3512] - Don't retry close() on EINTR.
>
>   * [MESOS-3513] - Cgroups Test Filters aborts tests on Centos 6.6.
>
>   * [MESOS-3519] - Fix file descriptor leakage / double close in the code
> base.
>
>   * [MESOS-3538] -
> CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy test
> is flaky.
>
>   * [MESOS-3575] - V1 API java/python protos are not generated.
>
>
> ** Improvements
>
>   * [MESOS-2719] - Deprecating '.json' extension in master endpoints urls.
>
>   * [MESOS-2757] - Add -> operator for Option<T>, Try<T>, Result<T>,
> Future<T>.
>
>   * [MESOS-2875] - Add containerId to ResourceUsage to enable QoS
> controller to target a container.
>
>   * [MESOS-2964] - libprocess io does not support peek().
>
>   * [MESOS-2983] - Deprecating '.json' extension in slave endpoints url.
>
>   * [MESOS-2984] - Deprecating '.json' extension in files endpoints url.
>
>   * [MESOS-3037] - Add a SUPPRESS call to the scheduler.
>
>   * [MESOS-3187] - Docker cli option support.
>
>   * [MESOS-3304] - Remove remnants of LIBPROCESS_STATISTICS_WINDOW.
>
>   * [MESOS-3312] - Factor out JSON to repeated protobuf conversion.
>
>   * [MESOS-3340] - Command-line flags should take precedence over OS Env
> variables.
>
>   * [MESOS-3347] - Remove dead code in src/linux/perf.cpp.
>
>   * [MESOS-3377] - mesos docker container with container_name as ENV
> variable.
>
>   * [MESOS-3457] - Add flag to disable hostname lookup.
>
>
> The full CHANGELOG for the release is available at:
>
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.25.0-rc2
>
>
> 
>
>
> The candidate for Mesos 0.25.0 release is available at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.25.0-rc2/mesos-0.25.0.tar.gz
>
>
> The tag to be voted on is 0.25.0-rc2:
>
> 

Re: mesos-containerizer: error while loading shared libraries: libmesos-0.24.0.so

2015-09-23 Thread Joris Van Remoortere
Can you run the slave and executor with GLOG_v=1 set on the environment and
try to provide some more context for this error:

> mesos-containerizer: error while loading shared libraries:
> libmesos-0.24.0.so: cannot open shared object file: No such file or
> directory

Are there any logs on the slave to provide more context?

On Fri, Sep 18, 2015 at 12:02 AM, F21  wrote:

> Hey haosdent, I was the one that opened the issue :)
>
>
> On 18/09/2015 1:52 PM, haosdent wrote:
>
> Hi @F21, is your problem similar to
> https://issues.apache.org/jira/browse/MESOS-3462? I will test it tonight
> and give you feedback later.
>
> On Fri, Sep 18, 2015 at 5:59 AM, F21  wrote:
>
>> I have that set using the environment variable:
>> MESOS_EXECUTOR_ENVIRONMENT_VARIABLES={"LD_LIBRARY_PATH":
>> "/path/to/mesos/lib"}
>>
>> However, it doesn't seem to have any effect.
>>
>> On 18/09/2015 12:27 AM, haosdent wrote:
>>
>>>
>>> MESOS_EXECUTOR_ENVIRONMENT_VARIABLES={"LD_LIBRARY_PATH":
>>> "/path/to/mesos/lib"}
>>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>
>


Re: Changing mesos slave configuration

2015-09-23 Thread Joris Van Remoortere
We are adding better support for systemd in 0.25. The ticket is MESOS-3425.
Naturally this is still somewhat experimental, but we would love your
feedback.
We will add some documentation on recommended setups on systemd.

With the changes going into 0.25 you should be able to launch your slave
with

> KillMode=control-group
> Delegate=true


The slave should be able to be restarted while the executors remain running
un-interrupted.

This requires systemd version 218+, or a patched build of an earlier
version. See the related ticket for more information about the patched
packages provided by RHEL, etc.: MESOS-3352

Joris
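
To make the two settings above easy to verify on a host, here is a small Python sketch that reads a unit file's text and reports whether the [Service] section carries them. The helper name missing_recommendations and the sample unit text are hypothetical; the check simply encodes the recommendation in this message and assumes systemd's spelling of the value, `control-group`.

```python
import configparser

# Directives recommended in the message above for Mesos 0.25 on systemd.
RECOMMENDED = {"KillMode": "control-group", "Delegate": "true"}


def missing_recommendations(unit_text):
    """Return a list of problems found in the [Service] section of a unit."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep directive names case-sensitive
    parser.read_string(unit_text)

    service = parser["Service"] if parser.has_section("Service") else {}
    problems = []
    for directive, wanted in RECOMMENDED.items():
        found = service.get(directive)
        if found is None:
            problems.append(f"{directive} is not set (expected {wanted})")
        elif found.strip().lower() != wanted:
            problems.append(f"{directive}={found} (expected {wanted})")
    return problems


# Hypothetical unit text used only to exercise the check.
EXAMPLE_UNIT = """\
[Unit]
Description=Mesos Slave

[Service]
ExecStart=/usr/bin/mesos-init-wrapper slave
KillMode=process
Restart=always
"""

if __name__ == "__main__":
    for problem in missing_recommendations(EXAMPLE_UNIT):
        print(problem)
```

This only inspects the text of a unit file; it does not confirm the systemd version or whether a distribution package carries the patch mentioned above.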

On Wed, Sep 23, 2015 at 8:11 AM, Brian Devins  wrote:

> Are you using systemd? There is a known issue with slave recovery on
> systemd. I'm on mobile or I would link you to the last thread around this
> but there is a line you can add to the config that is supposed to fix it.
> Whether it will fix it is another matter. I am fighting this issue at work
> myself.
> On Sep 23, 2015 7:53 AM, "Paul Bell"  wrote:
>
>> Hi Pradeep,
>>
>> Perhaps I am speaking to a slightly different point, but when I change
>> /etc/default/mesos-slave to add a new attribute, I have to remove file
>> /tmp/mesos/meta/slaves/latest.
>>
>> IIRC, mesos-slave itself, in failing to start after such a change, tells
>> me to do this:
>>
>> rm -f /tmp/mesos/meta/slaves/latest
>>
>>
>> But I know of no way to make such configuration changes without downtime.
>> And I'd very much like it if Mesos supported such dynamic changes. I
>> suppose this would require that the agent consult its default file on
>> demand, rather than once at start-up.
>>
>> Cordially,
>>
>> Paul
>>
>> On Wed, Sep 23, 2015 at 4:41 AM, Pradeep Chhetri <
>> pradeep.chhetr...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I have often faced this problem: whenever I try to add a configuration
>>> parameter to mesos-slave or change any configuration (e.g., add a new
>>> attribute to mesos-slave), the mesos slave doesn't come up on restart.
>>> I have to delete the slave.info file and then restart the slave, but that
>>> ends up killing all the docker containers started using mesos.
>>>
>>> I am trying to figure out the best way to make such changes without
>>> causing any downtime.
>>>
>>> Thank you.
>>>
>>> --
>>> Pradeep Chhetri
>>>
>>
>>


Re: Is there a limit of the number of tasks that can be launched by Mesos on a slave?

2015-09-21 Thread Joris Van Remoortere
> I launched tasks with 0.1 cpus and 0.1 mems
The executor and task are likely taking more than this amount of memory.
Can you check htop to see if you've run out of memory to create more stacks
to launch new threads?

On Mon, Sep 21, 2015 at 1:34 AM, Dohyung Park  wrote:

> I am currently using Mesos + Marathon with 4 slaves for the test.
>
> When I launch a lot of tasks with command ping 8.8.8.8 or while true; do
> sleep 1; done;,
>
> at one point, slaves cannot launch a task any more. The states of tasks
> that are newly launched are 'Failed'.
>
> So I checked out stderr of sandbox, then it shows
>
> "Failed to initialize, pthread_create"
>
> I launched tasks with 0.1 cpus and 0.1 mems, so enough resources
> to launch a task remained in slaves.
>
> In a slave, ulimit -a shows
>
> *# ulimit -a*
>
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 1545932
> max locked memory   (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> *open files  (-n) 65535*
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 8192
> cpu time   (seconds, -t) unlimited
> *max user processes  (-u) 1545932*
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
>
> Is there any limit of the number of tasks that can be launched by Mesos on
> a slave?
>


Re: mesos 0.24 released?

2015-09-18 Thread Joris Van Remoortere
Mesos 0.24.0 was indeed released.
0.24.0 rc2 was voted in on September 4th. You can see the voting e-mail
chain on the dev list.


On Fri, Sep 18, 2015 at 7:51 AM, craig w  wrote:

> Has mesos 0.24 been released? It's on the downloads page, but the tarball
> is from September 4.
>
> http://mesos.apache.org/downloads/
>


Re: Apache Mesos Community Sync

2015-09-17 Thread Joris Van Remoortere
YouTube On Air: http://youtu.be/ZQT6-fw8Ito
Speakers channel:
https://plus.google.com/hangouts/_/hoaevent/AP36tYd59qP_P4ac-NwOI7LztI_hBsku54gXqk1DhFGsKkne_cmByA

On Mon, Sep 14, 2015 at 7:02 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> We'll have the next community sync this Thursday (Sept. 17th) at 8:30am
> Pacific.
>
> Please add items to the agenda
> <https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit#heading=h.u1x3j7f3uixf>
> .
>
> We will try Hangouts on Air this time. We will post the video stream link
> shortly before the meeting, and only active participants (especially people
> on the agenda) should join the actual hangout. Others can watch the video
> stream and ask brief questions on #mesos on IRC. If you have something
> lengthier to discuss, put it on the agenda and ping us on email/IRC to get
> into the hangout. We hope this works better for everyone.
>
>
> On Wed, Sep 2, 2015 at 12:34 PM, Vinod Kone <vinodk...@apache.org> wrote:
>
>> We'll have the next community sync tomorrow (Sept 3rd) at 3 PM PST.
>>
>> Please add items to agenda
>> <https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit#heading=h.u1x3j7f3uixf>
>> .
>>
>>
>> On Wed, Aug 5, 2015 at 4:12 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>>
>>> We'll have the next community sync tomorrow at 3 PM PST.
>>>
>>> Please add items to agenda
>>> <https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit#heading=h.u1x3j7f3uixf>
>>> .
>>>
>>> Thanks,
>>>
>>> On Thu, Jul 2, 2015 at 11:24 AM, Joris Van Remoortere <
>>> jo...@mesosphere.io> wrote:
>>>
>>>> Reminder: The Mesos Community Developer Sync will be happening today at
>>>> 3pm Pacific.
>>>>
>>>> To participate remotely, join the Google hangout:
>>>> https://plus.google.com/hangouts/_/twitter.com/mesos-sync
>>>>
>>>> On Thu, Jun 18, 2015 at 7:22 AM, Adam Bordelon <a...@mesosphere.io>
>>>> wrote:
>>>>
>>>>> Reminder: We're hosting a developer community sync at Mesosphere HQ
>>>>> this morning from 9-11am Pacific.
>>>>>
>>>>> The agenda is pretty bare, so please add more topics you would like to
>>>>> discuss:
>>>>>
>>>>> https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit
>>>>>
>>>>> If you want to join in person, just show up to 88 Stevenson St, ring
>>>>> the buzzer, take the elevator up to 2nd floor, and then you can take the
>>>>> stairs up to the 3rd floor dining room, or ask somebody to let you up the
>>>>> elevator to the 3rd floor.
>>>>>
>>>>> To participate remotely, join the Google hangout:
>>>>> https://plus.google.com/hangouts/_/mesosphere.io/mesos-developer
>>>>>
>>>>> On Mon, Jun 15, 2015 at 10:46 AM, Adam Bordelon <a...@mesosphere.io>
>>>>> wrote:
>>>>>
>>>>>> As previously mentioned, we would like to host additional Mesos
>>>>>> developer syncs at our new Mesosphere HQ at 88 Stevenson St (tucked
>>>>>> behind Market & 2nd), starting this Thursday from 9-11am Pacific. We
>>>>>> opted for an earlier slot so that the European developer community can
>>>>>> participate.
>>>>>>
>>>>>> Now that we are having these more frequently, it would be great to
>>>>>> dive deeper into designs for upcoming features as well as discuss
>>>>>> longstanding issues. While high-level status updates are useful, they
>>>>>> should be a small part of these meetings so that we can address issues
>>>>>> currently facing our developers.
>>>>>>
>>>>>> Please add agenda items to the same doc we've been using for previous
>>>>>> meetings' Agenda/Notes:
>>>>>>
>>>>>> https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit
>>>>>>
>>>>>> Join in person if you can, or join remotely via hangout:
>>>>>> https://plus.google.com/hangouts/_/mesosphere.io/mesos-developer
>>>>>>
>>>>>> Thanks,
>>>>>> -Adam-
>>>>>>
>>>>>>
>>>>>> On Thu, May 28, 

Re: SSL in Mesos 0.23

2015-08-25 Thread Joris Van Remoortere
@Dharmit

If you want to be really sure that the communication is happening over SSL,
you can use a packet-sniffing tool like Wireshark or, depending on your
operating system, dump the packet streams directly to a file, for example
with tcpdump.
Another thing you can do is try to hit the HTTP endpoints with curl using
http as opposed to https.

Remember that if you have SSL_SUPPORT_DOWNGRADE=true you should be able to
connect even without SSL. If it is false (the default) you will not be able
to connect.
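
The http-versus-https check can also be scripted. Below is a minimal Python sketch, assuming you only want to know whether a port completes a TLS handshake; `speaks_tls` and its parameters are hypothetical names, not part of Mesos. Certificate verification is deliberately turned off because the goal is to detect TLS, not to validate the certificate. If the master requires client certificates (SSL_REQUIRE_CERT=true), the result may be misleading, so fall back to a packet capture in that setup.

```python
import socket
import ssl


def speaks_tls(host, port, timeout=3.0):
    """Return True if a TLS handshake with host:port completes."""
    ctx = ssl.create_default_context()
    # Detection only: skip hostname and certificate verification.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError, timeouts and refused connections are all OSError.
        return False


if __name__ == "__main__":
    # 5050 is the master port used in this thread; the host is a placeholder.
    print(speaks_tls("127.0.0.1", 5050))
```

A port that still answers in plain HTTP makes the handshake fail, so the function returns False for it.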

On Mon, Aug 10, 2015 at 4:43 AM, Dharmit Shah shahdhar...@gmail.com wrote:

 Hi Jeff,

 Thanks for the suggestion.

 I modified the systemd service file to use
 `/etc/sysconfig/mesos-master` and `/etc/sysconfig/mesos-slave` as
 environment files for master and slave services respectively. In these
 files, I specified the environment variables that I used to specify on
 the command line.

 Now if I check `strings /proc/pid/environ | grep SSL` for pids of
 master and slave services, I see the environment variables that I set
 in the /etc/sysconfig/environment-file.

 Now that it looks like I have started the master and slave services
 with SSL enabled, how do I really confirm that communication between
 master and slaves is really happening over SSL?

 Also, how do I enable SSL communication for a framework like Marathon?

 Regards,
 Dharmit.

 On Fri, Aug 7, 2015 at 10:56 PM, Jeff Schroeder
 jeffschroe...@computer.org wrote:
  The sudo command defaults to env_reset (look for that in the man page),
  which strips all env variables sans a select few. I'd almost bet that
  your SSL_* variables are not present and were not passed to the slave.
  Just sudo -i and start the slaves *as root* without sudo. There is no
  benefit to starting them with sudo. You can verify what I'm saying with
  something along the lines of:
 
  strings /proc/$(pidof mesos-slave)/environ | grep ^SSL_
 
 
  On Friday, August 7, 2015, Dharmit Shah shahdhar...@gmail.com wrote:
 
  Hello again,
 
  Thanks for your responses. I will share what I tried after your
  suggestions.
 
  1. `ldd /usr/sbin/mesos-master` and `ldd /usr/sbin/mesos-slave`
  returned similar output as one suggested by Craig. So, I guess, the
  Mesosphere repo binaries have SSL enabled. Right?
 
  2. I created SSL private key and cert on one system in my cluster by
  referring this guide on DO [1]. Admittedly, my knowledge of SSL is
  limited.
 
  3. Next, I copied the key and cert to all three mesos-master nodes and
  four mesos-slave nodes. Shouldn't slave nodes be provided only with
  the cert and not the private key? Whereas all master nodes may have
  the private key and cert both. Or am I understanding SSL incorrectly
  here?
 
  4. After copying the cert and key, I started the mesos-master service
  on master nodes with below command:
 
  $ sudo SSL_ENABLED=true SSL_KEY_FILE=~/ssl/mesos.key \
      SSL_CERT_FILE=~/ssl/mesos.crt /usr/sbin/mesos-master \
      --zk=zk://172.19.10.111:2181,172.19.10.112:2181,172.19.10.193:2181/mesos \
      --port=5050 --log_dir=/var/log/mesos --acls=file:///root/acls.json \
      --credentials=/home/isys/mesos --quorum=2 --work_dir=/var/lib/mesos
 
  I check web UI and things look good. I am not completely sure if
  https should have worked for mesos web UI but, it didn't.
 
  5. Next, I start slave nodes with below command:
 
  $ sudo SSL_ENABLED=true SSL_CERT_FILE=~/mesos.crt \
      SSL_KEY_FILE=~/mesos.key /usr/sbin/mesos-slave \
      --master=zk://172.19.10.111:2181,172.19.10.112:2181,172.19.10.193:2181/mesos \
      --log_dir=/var/log/mesos --containerizers=docker,mesos \
      --executor_registration_timeout=15mins
 
  Mesos web UI reported four mesos-slave nodes in Activated mode. So
  far so good. I am still wondering how I should verify if communication
  is happening over SSL.
 
  6. To check if SSL is indeed working, I stopped one slave node and
  started it without SSL using `systemctl start mesos-slave`. I was
  expecting it to not get into Activated state on Mesos web UI but it
  did. So, I think SSL is not configured properly by me.
 
  I am attaching logs from the master nodes. These logs were generated
  after starting masters with command specified in point 4.
 
  Let me know if I am doing something wrong or if you need more logs or
  need me to execute some specific commands.
 
  [1] https://www.digitalocean.com/community/tutorials/openssl-essentials-working-with-ssl-certificates-private-keys-and-csrs
 
  Regards,
  Dharmit.
 
  On Fri, Aug 7, 2015 at 2:52 AM, Michael Park mcyp...@gmail.com wrote:
   Hi Dharmit,
  
   I'm not certain whether the Mesosphere deb packages have SSL enabled or
   not, although based on Craig's observation it looks like it is.
  
   I think the correct way to enable SSL is to set the SSL_ENABLED
   environment variable, rather than /etc/mesos-master/ssl_enabled. Of
   course, along with the rest of the SSL_ environment variables.
  
   e.g. SSL_ENABLED=true SSL_KEY_FILE=path-to-your-private-key
  

Re: SSL in Mesos 0.23

2015-08-25 Thread Joris Van Remoortere
@carlos
Are you building 0.23.0 from source?
Just so we don't miss anything: Can you make sure to run ./bootstrap, and
build in a clean directory with your configuration similar to this:

../configure --enable-libevent --enable-ssl

Here http://mesos.apache.org/documentation/latest/mesos-ssl/ is the
document I am using as a reference

When you start up a master, if you just specify SSL_ENABLED=true it should
error out and notify you that other required flags such as SSL_KEY_FILE are
not provided. Can you verify this? If that is not happening, then the 2
options are:
1. Your environment variables are not making it to the binary: See Jeff
Schroeder's comments
2. The binary is not actually the one you expect. Double check the checksum
with the binary you built after configuring with SSL.
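
For option 1, the `strings /proc/$(pidof mesos-slave)/environ | grep ^SSL_` check from Jeff's mail can be scripted. This is a small Python sketch, assuming a Linux host where `/proc/<pid>/environ` holds the NUL-separated environment the process was started with; `ssl_vars` and `ssl_vars_of_pid` are illustrative names only.

```python
def ssl_vars(environ_bytes):
    """Extract SSL_* variables from the raw contents of /proc/<pid>/environ."""
    result = {}
    for entry in environ_bytes.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty or malformed entries
        key, _, value = entry.partition(b"=")
        if key.startswith(b"SSL_"):
            result[key.decode(errors="replace")] = value.decode(errors="replace")
    return result


def ssl_vars_of_pid(pid):
    """Read the start-up environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return ssl_vars(f.read())


if __name__ == "__main__":
    sample = b"PATH=/usr/bin\0SSL_ENABLED=true\0SSL_KEY_FILE=/etc/mesos/key\0"
    print(ssl_vars(sample))
```

An empty result for the master or slave pid suggests the variables never reached the binary, which points back at how the service was launched.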



On Fri, Aug 14, 2015 at 12:55 PM, Carlos Sanchez car...@apache.org wrote:

 looking forward to it, thanks!
 running out of ideas here on what am I doing wrong

 On Fri, Aug 14, 2015 at 6:53 PM, Marco Massenzio ma...@mesosphere.io
 wrote:
  FYI - Joris is out this week, he'll be probably able to get back to you
  early next (modulo MesosCon craziness :)
 
  Marco Massenzio
  Distributed Systems Engineer
 
  On Fri, Aug 14, 2015 at 9:14 AM, Carlos Sanchez car...@apache.org
 wrote:
 
  no suggestions?
 
  On Tue, Aug 11, 2015 at 6:47 PM, Vinod Kone vinodk...@apache.org
 wrote:
   @joris, can you help out here?
  
   On Tue, Aug 11, 2015 at 9:43 AM, Carlos Sanchez car...@apache.org
   wrote:
  
   I have tried to enable SSL with no success, even compiling from source
   with the ssl flags --enable-libevent --enable-ssl
  
   export SSL_ENABLED=true
   export SSL_SUPPORT_DOWNGRADE=false
   export SSL_REQUIRE_CERT=true
   export SSL_CERT_FILE=/etc/mesos/...
   export SSL_KEY_FILE=/etc/mesos/...
   export SSL_CA_FILE=/etc/mesos/...
  
  
   /home/ubuntu/mesos-deb-packaging/mesos-repo/build/src/mesos-master
   --work_dir=/var/lib/mesos
  
   Port 5050 is still served as plain http, no SSL
  
   Nothing about ssl shows up in the logs, any ideas?
  
   Thanks
  
  
   

Re: SSL in Mesos 0.23

2015-08-25 Thread Joris Van Remoortere
@Carlos
Mesosphere currently doesn't build packages with ssl enabled.

On Tue, Aug 25, 2015 at 3:12 PM, Carlos Sanchez car...@apache.org wrote:

 Hi Joris,

 I did build from sources, following instructions in
 http://mesos.apache.org/gettingstarted/

 Is the mesosphere binary compiled with libevent and ssl enabled as
 mentioned previously? would make debugging easier if I don't have to rebuild




Re: Apache Mesos Community Sync

2015-07-02 Thread Joris Van Remoortere
Reminder: The Mesos Community Developer Sync will be happening today at 3pm
Pacific.

To participate remotely, join the Google hangout:
https://plus.google.com/hangouts/_/twitter.com/mesos-sync

On Thu, Jun 18, 2015 at 7:22 AM, Adam Bordelon a...@mesosphere.io wrote:

 Reminder: We're hosting a developer community sync at Mesosphere HQ this
 morning from 9-11am Pacific.

 The agenda is pretty bare, so please add more topics you would like to
 discuss:

 https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit

 If you want to join in person, just show up to 88 Stevenson St, ring the
 buzzer, take the elevator up to 2nd floor, and then you can take the stairs
 up to the 3rd floor dining room, or ask somebody to let you up the elevator
 to the 3rd floor.

 To participate remotely, join the Google hangout:
 https://plus.google.com/hangouts/_/mesosphere.io/mesos-developer

 On Mon, Jun 15, 2015 at 10:46 AM, Adam Bordelon a...@mesosphere.io
 wrote:

 As previously mentioned, we would like to host additional Mesos developer
 syncs at our new Mesosphere HQ at 88 Stevenson St (tucked behind Market &
 2nd), starting this Thursday from 9-11am Pacific. We opted for an earlier
 slot so that the European developer community can participate.

 Now that we are having these more frequently, it would be great to dive
 deeper into designs for upcoming features as well as discuss longstanding
 issues. While high-level status updates are useful, they should be a small
 part of these meetings so that we can address issues currently facing our
 developers.

 Please add agenda items to the same doc we've been using for previous
 meetings' Agenda/Notes:

 https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit

 Join in person if you can, or join remotely via hangout:
 https://plus.google.com/hangouts/_/mesosphere.io/mesos-developer

 Thanks,
 -Adam-


 On Thu, May 28, 2015 at 10:08 AM, Vinod Kone vinodk...@gmail.com wrote:

 Cool.

 Here's the agenda doc
 
 https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit#
 
 for next week that folks can fill in.

 On Thu, May 28, 2015 at 9:52 AM, Adam Bordelon a...@mesosphere.io
 wrote:

  Looks like next week, Thursday June 4th on my calendar.
  I thought it was always the first Thursday of the month.
 
  On Thu, May 28, 2015 at 9:33 AM, Vinod Kone vinodk...@gmail.com
 wrote:
 
   Do we have community sync today or next week? I'm a bit confused.
  
   @vinodkone
  
On Apr 1, 2015, at 3:18 AM, Adam Bordelon a...@mesosphere.io
 wrote:
   
Reminder: We're having another Mesos Developer Community Sync this
Thursday, April 2nd from 3-5pm Pacific.
   
Agenda:
https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit?usp=sharing
To Join: follow the BlueJeans instructions from the recurring meeting
invite at the start of this thread.
   
On Fri, Mar 6, 2015 at 11:11 AM, Vinod Kone vinodk...@apache.org
 
   wrote:
   
Hi folks,
   
We are planning to do monthly Mesos community meetings. Tentatively these
are scheduled to occur on 1st Thursday of every month at 3 PM PST. See
below for details to join the meeting remotely.
   
This is a forum to ask questions/discuss about upcoming features, process
etc. Everyone is welcome to join. Feel free to add items to the agenda for
the next meeting here:
https://docs.google.com/document/d/153CUCj5LOJCFAVpdDZC7COJDwKh9RDjxaTA0S7lzwDA/edit?usp=sharing
   
Cheers,
   
On Thu, Mar 5, 2015 at 11:23 AM, Vinod Kone via Blue Jeans
 Network 
inv...@bluejeans.com wrote:
   
Vinod Kone vi...@twitter.com has invited you to a video meeting.
Meeting Title: Apache Mesos Community Sync
Meeting Time: Every 4th week on Thursday • from March 5, 2015 • 3 p.m. PST / 2 hrs
Join Meeting: https://bluejeans.com/272369669?ll=eng=mrsxmqdnmvzw64zomfygcy3imuxg64th
  
--
 Connecting directly from a room system?
   
1) Dial: 199.48.152.152 or bjn.vc
2) Enter Meeting ID: 272369669 -or- use the pairing code
   
   
Just want to dial in? (all numbers: http://bluejeans.com/premium-numbers)
1) Direct-dial with my iPhone +14087407256,,#272369669%23,%23 or
+1 408 740 7256
+1 888 240 2560 (US Toll Free)
+1 408 317 9253 (Alternate Number)
   
2) Enter Meeting ID: 272369669
   
 --
 Description:
We will try BlueJeans VC this time for our monthly community sync.
   
If BlueJeans *doesn't* work out we will use the Google Hangout link
(https://plus.google.com/hangouts/_/twitter.com/mesos-sync) instead.
*Note:* No moderator is required to start this 

Re: SEGV in 'make check'

2015-05-04 Thread Joris Van Remoortere
If you do have perf installed, are you running on a VM that is not exposing
the `cycles` and `task-clock` events?
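
One way to check this on the build machine is to ask perf directly. The following Python sketch assumes the `perf` CLI is on the PATH and that `perf stat -e <event> true` either exits non-zero or prints `<not supported>` when the event is unavailable; `perf_event_usable` is a hypothetical helper, and this is not necessarily how the Mesos isolator validates events.

```python
import subprocess


def perf_event_usable(event, timeout=10):
    """Best-effort check that `perf stat` can count `event` on this host."""
    try:
        proc = subprocess.run(
            ["perf", "stat", "-e", event, "true"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # perf is missing, not executable, or hung.
        return False
    # perf stat writes its counters to stderr.
    return proc.returncode == 0 and "<not supported>" not in proc.stderr


if __name__ == "__main__":
    for name in ("cycles", "task-clock"):
        print(name, perf_event_usable(name))
```

If `cycles` comes back False while `task-clock` is True, the hypervisor is probably not exposing hardware counters to the guest.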

On Thu, Apr 30, 2015 at 2:11 PM, Benjamin Mahler benjamin.mah...@gmail.com
wrote:

 This message can be a bit misleading, do you have perf installed?

 On Thu, Apr 30, 2015 at 11:18 AM, Brian Topping brian.topp...@gmail.com
 wrote:

 Getting closer. After finding
 http://garyzhu.net/notes/CentOS7-Systemd-Mesos-Marathon.html, I set up
 another new CentOS 7 machine, got a lot further on the compile this time,
 symbols too! This is with 0.22.1-RC6, CentOS Linux release 7.1.1503,
 kernel 3.10.0-229.1.2.el7.x86_64.

 Output from the last test during a make check.

 [--] 1 test from PerfEventIsolatorTest
 [ RUN  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
 F0430 14:03:34.169455 13504 isolator_tests.cpp:710] CHECK_SOME(isolator):
 Failed to create PerfEvent isolator, invalid events: { cycles, task-clock }
 *** Check failure stack trace: ***
 @ 0x7f4c7ecea4ca  google::LogMessage::Fail()
 @ 0x7f4c7ecea429  google::LogMessage::SendToLog()
 @ 0x7f4c7ece9e3a  google::LogMessage::Flush()
 @ 0x7f4c7ececb6e  google::LogMessageFatal::~LogMessageFatal()
 @   0xa265b2  _CheckFatal::~_CheckFatal()
 @   0xc93e88  mesos::internal::tests::PerfEventIsolatorTest_ROOT_CGROUPS_Sample_Test::TestBody()
 @  0x1135c4f  testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x1130e0a  testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x1119233  testing::Test::Run()
 @  0x1119956  testing::TestInfo::Run()
 @  0x1119ede  testing::TestCase::Run()
 @  0x111ec5a  testing::internal::UnitTestImpl::RunAllTests()
 @  0x1136ac1  testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x1131afb  testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x111db0a  testing::UnitTest::Run()
 @   0xd28155  main
 @ 0x7f4c7a741af5  __libc_start_main
 @   0x8fae89  (unknown)


 Full results and machine configuration at
 https://gist.github.com/briantopping/ac4f320bcc24e14328cd

 Not sure where to go with this, any insight appreciated!


 On Apr 30, 2015, at 11:33 PM, Brian Topping brian.topp...@gmail.com
 wrote:

 Also, I just checked this in the 0.22.1-RC6 and had the same problem.

 On Apr 30, 2015, at 9:27 PM, Brian Topping brian.topp...@gmail.com
 wrote:

 Greetings all, I'm having a problem with my first attempts building
 Mesos. I started the other day with CentOS 7 and quickly realized it was
 far better to be using 6.6. I've created a machine, but it's crashing in
 'make check'.

 Till Toenshoff was kind enough to give me some leads in JIRA on what to
 do next, but it didn't change stack trace to include symbols.

 https://gist.github.com/briantopping/51197bad452dd3b3277c has the dump
 of what I've done, the first file shows everything done in an empty build
 directory and the crash at the end. The second file there is a dump of the
 machine configuration -- the uname output, /proc/cpuinfo and all the
 installed RPM packages.

 I guess the first question is why didn't the ../configure
 --enable-debug work to generate the proper symbolics on the stack trace
 generated? Anyone have suggestions on what I can try?

 Cheers, Brian







Re: Changing Mesos Minimum Compiler Version

2015-04-21 Thread Joris Van Remoortere
Re: GCC 5.x, specifically section [2]
https://gcc.gnu.org/gcc-5/changes.html#offload

Although these changes are great, I'm not sure we currently need them for
Mesos itself.
I agree with you that they could make lots of frameworks rock, and I don't
think the gcc version for mesos prevents that:

   - We use protobufs to communicate between services, which means:
      - Executors can be compiled using a totally different compiler from
      Apache Mesos
      - Frameworks can be compiled using a totally different compiler from
      Apache Mesos
   - This means you can have a super optimized custom executor that takes
   advantage of all the benefits of GCC 5.x running on a Mesos built on GCC
   4.8 or Clang 3.5!

Hopefully this clarifies why this is not actually crucial, and why you
won't be missing out on any benefits!

Joris

On Tue, Apr 21, 2015 at 10:27 AM, Cody Maloney c...@mesosphere.io wrote:

 The main holdup at the moment is simply cycles I have to convert our
 internal infrastructure for packaging mesos on all the distributions to use
 newer compilers on all those distributions. I want the infrastructure for
 supporting the change in place before we make it. I have about 1/3 of the
 work done (Can build on all distros except Debian Wheezy). I've gotta add
 the packaging steps (Shouldn't be too bad), and some glue code still
 though.

 On Tue, Apr 21, 2015 at 3:07 AM, Alex Rukletsov a...@mesosphere.com
 wrote:

  Folks, let's summarize and move on here.
 
  Proposal out on April 9, 2015. Current status (as of April 21, 2015):
 
 
  +1 (Binding)
  --
  Vinod Kone
  Timothy Chen
  Yan Xu
  Brenden Matthews
 
  +1 (Non-binding)
  --
  Cody Maloney
  Joris Van Remoortere
  Jeff Schroeder
  Jörg Schad
  Elizabeth Lingg
  Alexander Rojas
  Alex Rukletsov
  Michael Park
  Haosdent Huang
  Bernd Mathiske
 
  0 (Non-binding)
  --
  Nikolaos Ballas
 
  There were no -1 votes.
 
  Cody, let's convert MESOS-2604 to an epic and bump the version in 0.23.
 
  Thanks,
  Alex
 
 
  On Mon, Apr 13, 2015 at 12:46 PM, Bernd Mathiske be...@mesosphere.io
  wrote:
 
  +1
 
   On Apr 10, 2015, at 6:02 PM, Michael Park mcyp...@gmail.com wrote:
  
   +1
  
   On 9 April 2015 at 17:33, Alexander Gallego agall...@concord.io
  wrote:
  
   This is amazing for native devs/frameworks.
  
   Sent from my iPhone
  
   On Apr 9, 2015, at 5:16 PM, Joris Van Remoortere 
 jo...@mesosphere.io
  
   wrote:
  
   +1
  
   On Thu, Apr 9, 2015 at 2:14 PM, Cody Maloney c...@mesosphere.io
   wrote:
   As discussed in the last community meeting, we'd like to bump the
   minimum required compiler version from GCC 4.4 to GCC 4.8.
  
   The overall goals are to make Mesos development safer, faster, and
   reduce the maintenance burden. Currently a lot of stout has different
   codepaths for Pre-C++11 and Post-C++11 compilers.
  
   Progress will be tracked in the JIRA: MESOS-2604
  
   The resulting supported compiler versions will be:
   GCC 4.8, GCC 4.9
   Clang 3.5, Clang 3.6
  
   For reference
   Compilers by Distribution Version: http://goo.gl/p1t1ls
  
   C++11 features supported by each compiler:
   https://gcc.gnu.org/projects/cxx0x.html
   http://clang.llvm.org/cxx_status.html