Re: [mesos-mail] Re: Update the *Minimum Linux Kernel version* supported on Mesos

2018-04-10 Thread Gilbert Song
Hi all, FYI we landed this patch to
avoid the nested freezer cgroup support check for old kernel versions. Please
reply to this thread if you have concerns about this update.

- Gilbert

On Sun, Apr 8, 2018 at 12:18 AM, Alex Rukletsov  wrote:

> This does not seem like a disruptive change to me, so I'm +1.
>
> On Thu, Apr 5, 2018 at 6:36 PM, Jie Yu  wrote:
>
>>> User namespaces require >= 3.12 (November 2013). Can we make that the
>>> minimum?
>>
>>
>> No, we need to support CentOS 7, which uses a variant of 3.10.
>>
>> - Jie
>>
>> On Thu, Apr 5, 2018 at 8:56 AM, James Peach  wrote:
>>
>>>
>>>
>>> > On Apr 5, 2018, at 5:00 AM, Andrei Budnik wrote:
>>> >
>>> > Hi All,
>>> >
>>> > We would like to raise the minimum supported Linux kernel version from
>>> > 2.6.23 to 2.6.28. The Linux kernel has supported cgroups v1 since
>>> > 2.6.24, but the `freezer` cgroup functionality, which supports nested
>>> > containers, was merged in 2.6.28.
>>>
>>> User namespaces require >= 3.12 (November 2013). Can we make that the
>>> minimum?
>>>
>>> J
>>
>>
>>
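
The version comparison discussed in this thread can be sketched as below. This is only an illustration, not code from the Mesos tree: `MIN_KERNEL`, `parse_kernel_release`, and `meets_minimum` are hypothetical names, and the parsing assumes release strings of the form `major.minor[.patch][-suffix]`.

```python
import re

# Minimum kernel proposed in the thread (2.6.28). The constant and the
# function names are hypothetical and do not come from the Mesos source.
MIN_KERNEL = (2, 6, 28)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from e.g. '3.10.0-693.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(release, minimum=MIN_KERNEL):
    """Return True if the kernel release is at least the given minimum."""
    return parse_kernel_release(release) >= minimum
```

On a live host the release string could come from `platform.release()`. Tuple comparison handles the CentOS 7 case from the thread: a 3.10 kernel passes the 2.6.28 minimum but would fail a 3.12 minimum.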


Re: Volume ownership and permission

2018-04-10 Thread Qian Zhang
Hi Marc,

I have shared the design doc so that anyone with the link can comment (no
sign-in required). Can you try again?



Regards,
Qian Zhang

On Tue, Apr 10, 2018 at 1:04 PM, Marc Roos  wrote:

>
> Cannot access it
>
>
>
> -Original Message-
> From: Qian Zhang [mailto:zhq527...@gmail.com]
> Sent: Tuesday, April 10, 2018 12:16
> To: mesos; user
> Subject: Volume ownership and permission
>
> Hi Folks,
>
> I am working on MESOS-8767 to improve Mesos volume support regarding
> volume ownership and permission. Here is the
> design doc  v4KWwjmnCR0l8V4Tq2U/edit?usp=sharing>
> . Please feel free to let me know if you have any comments or feedback;
> you can reply to this mail or comment on the design doc directly. Thanks!
>
>
> Regards,
> Qian Zhang
>
>
>


Proposal: Constrained upgrades from Mesos 1.6

2018-04-10 Thread Greg Mann
Hi all,
We are currently working on patches to implement the new GROW_VOLUME and
SHRINK_VOLUME operations [1]. In order to make it into Mesos 1.6, we're
pursuing a workaround which affects the way these operations are accounted
for in the Mesos master. These operations will be marked as *experimental* in
Mesos 1.6.

As a result of this workaround, upgrades from Mesos 1.6 to later versions
would be affected. Specifically, 1.6 masters would not be able to properly
account for the resources of failed GROW/SHRINK operations on 1.7+ agents.
This means that when upgrading from Mesos 1.6, if GROW_VOLUME or
SHRINK_VOLUME operations are being used during the upgrade, the masters
*must* be upgraded first. If we follow this proposal, this constraint would
be clearly spelled out in our upgrade documentation.

Since, in general, we guarantee compatibility between Mesos masters and
agents of the same major version, we wanted to check with the community to
see if this constraint on 1.6 upgrades would be acceptable. Please let us
know what you think!

Cheers,
Greg


[1] https://issues.apache.org/jira/browse/MESOS-4965
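
The proposed constraint can be expressed as a small pre-upgrade check. This is a sketch under assumptions: `parse_version` and `upgrade_order_ok` are hypothetical helpers, not part of Mesos, and the check only encodes the rule stated above (a 1.6 master should not be managing 1.7+ agents while GROW_VOLUME or SHRINK_VOLUME operations are in use).

```python
def parse_version(version):
    """Turn a version string such as '1.6.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def upgrade_order_ok(master_version, agent_version, uses_resize_operations):
    """Return False for the one combination the proposal rules out.

    Hypothetical helper: a 1.6 master paired with a 1.7+ agent while
    GROW_VOLUME/SHRINK_VOLUME operations are being used.
    """
    if not uses_resize_operations:
        # The constraint only applies when the new operations are in use.
        return True
    master = parse_version(master_version)[:2]
    agent = parse_version(agent_version)[:2]
    return not (master == (1, 6) and agent >= (1, 7))
```

For example, `upgrade_order_ok("1.6.0", "1.7.0", True)` is false, which corresponds to upgrading an agent before its masters.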


Re: Reason of cascaded kill in a group

2018-04-10 Thread Zhitao Li
Hi Benjamin,

Yes, that's what I meant: adding a new reason for such cascaded kills.

On Tue, Apr 10, 2018 at 1:17 PM, Benjamin Mahler  wrote:

> Are you saying that there was no reason previously, and there would be a
> reason after the change? If so, adding a reason where one did not exist is
> safe from a backwards compatibility perspective.
>
> On Mon, Apr 9, 2018 at 10:32 AM, Zhitao Li  wrote:
>
>> Hi,
>>
>> We are considering adding a new reason to StatusUpdate::Reason to reflect
>> the case when a task in a task group is killed as part of a cascade:
>>
>> Currently, if a task fails in a task group, other active tasks in the
>> same group will see *TASK_KILLED* without any custom reason. We would
>> like to provide a custom reason like *REASON_TASK_GROUP_KILLED* to
>> distinguish whether the task was killed at the scheduler's request or
>> because of a cascaded failure.
>>
>>
>> Question to framework maintainers: does any framework depend on the value
>> of this reason? If not, we can probably just change the reason without an
>> opt-in mechanism from frameworks (i.e., a new framework capability).
>>
>> Please let me know if your framework has such a dependency.
>>
>> Thanks!
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


-- 
Cheers,

Zhitao Li


Re: Reason of cascaded kill in a group

2018-04-10 Thread Benjamin Mahler
Are you saying that there was no reason previously, and there would be a
reason after the change? If so, adding a reason where one did not exist is
safe from a backwards compatibility perspective.

On Mon, Apr 9, 2018 at 10:32 AM, Zhitao Li  wrote:

> Hi,
>
> We are considering adding a new reason to StatusUpdate::Reason to reflect
> the case when a task in a task group is killed as part of a cascade:
>
> Currently, if a task fails in a task group, other active tasks in the same
> group will see *TASK_KILLED* without any custom reason. We would like to
> provide a custom reason like *REASON_TASK_GROUP_KILLED* to distinguish
> whether the task was killed at the scheduler's request or because of a
> cascaded failure.
>
>
> Question to framework maintainers: does any framework depend on the value of
> this reason? If not, we can probably just change the reason without an
> opt-in mechanism from frameworks (i.e., a new framework capability).
>
> Please let me know if your framework has such a dependency.
>
> Thanks!
>
>
> --
> Cheers,
>
> Zhitao Li
>
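
To illustrate why adding a reason where none existed is backwards compatible, here is a sketch of how a framework might branch on the proposed reason. The `Reason` enum and `classify_kill` function are hypothetical stand-ins rather than the Mesos API; only the name REASON_TASK_GROUP_KILLED comes from the proposal.

```python
from enum import Enum


class Reason(Enum):
    # Hypothetical stand-in for the reason field of a status update.
    # NONE models today's behavior, where no custom reason is set.
    NONE = 0
    REASON_TASK_GROUP_KILLED = 1


def classify_kill(state, reason):
    """Classify a status update the way a scheduler might."""
    if state != "TASK_KILLED":
        return "not-killed"
    if reason is Reason.REASON_TASK_GROUP_KILLED:
        return "cascaded"
    # Frameworks that ignore the reason keep landing here, which is
    # the same path they take today.
    return "unspecified"
```

A framework that never inspects the reason behaves exactly as before, while one that does can tell a cascaded kill apart from other kills.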


Re: Troubleshooting Mesos SSL setup

2018-04-10 Thread Benjamin Mahler
Are there bugs here? Is there anything that Mesos could have logged or
handled better?

On Fri, Mar 16, 2018 at 11:46 AM, Renan DelValle wrote:

> Follow-up: we weren't able to get our wildcard certificate working, but we
> did get it to work when we used a certificate for a single hostname.
>
> Also our hostname was too long (over 64 bytes).
>
> Hope that helps someone else who runs into this issue.
>
> -Renan
>
> On Fri, Mar 16, 2018 at 10:36 AM, Renan DelValle wrote:
>
>> Hi all,
>>
>> We're trying to set up Mesos with SSL. We've compiled Mesos with SSL
>> support and deployed it to the right boxes.
>>
>> Unfortunately, after setting up all the correct environment variables,
>> we get the following error:
>>
>>> I0315 17:48:30.541865 20 libevent_ssl_socket.cpp:1105] Could not
>>> determine hostname of peer: Unknown error
>>> I0315 17:48:30.541937 20 libevent_ssl_socket.cpp:1120] Failed accept,
>>> verification error: Cannot verify peer certificate: peer hostname unknown
>>> * GnuTLS recv error (-110): The TLS connection was non-properly
>>> terminated.
>>> * Closing connection 0
>>> curl: (56) GnuTLS recv error (-110): The TLS connection was non-properly
>>> terminated.
>>
>>
>> Any chance someone knows what these errors mean and how we can fix the
>> underlying issue?
>>
>> Thanks!
>>
>> -Renan
>>
>
>
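
The hostname length problem reported above can be caught before issuing a certificate. This sketch assumes the limit in question is the 64-character upper bound on an X.509 commonName (ub-common-name in RFC 5280); the thread only says "over 64 bytes". The function names are hypothetical.

```python
# Upper bound on an X.509 commonName (ub-common-name in RFC 5280).
# That this is the limit hit in the thread is an assumption.
MAX_COMMON_NAME_LENGTH = 64


def hostname_fits_common_name(hostname):
    """Return True if the hostname is short enough to be used as a CN."""
    return len(hostname.encode("utf-8")) <= MAX_COMMON_NAME_LENGTH


def too_long_hostnames(hostnames):
    """Return the hostnames that exceed the commonName bound."""
    return [h for h in hostnames if not hostname_fits_common_name(h)]
```

Running `too_long_hostnames` over the list of master and agent hostnames before generating certificates would flag the offending names early.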


Re: Release policy and 1.6 release schedule

2018-04-10 Thread Greg Mann
Thanks for the reviews, y'all! I've got a few "Ship-Its" - I'll commit this
later today unless I hear any objections.

Cheers,
Greg

On Wed, Apr 4, 2018 at 11:49 AM, Greg Mann  wrote:

> Hey folks,
> I've posted a proposed update to our documented release schedule:
> https://reviews.apache.org/r/66454/
>
> Please take a look and comment!
>
> Cheers,
> Greg
>
>
> On Mon, Mar 26, 2018 at 11:34 AM, Greg Mann  wrote:
>
>> +1 for quarterly. I would also say that we should support 3 releases at
>> any given time, regardless of the duration that implies. If there are no
>> objections, I'll submit a patch to update our docs to this effect. I think
>> that slowing down our documented cadence a bit will give us a chance to
>> faithfully adhere to our stated policy.
>>
>> Alex, I agree that releasing monthly would be great if we had better
>> automation. This is something we can work toward in the future I hope :)
>>
>> Cheers,
>> Greg
>>
>> On Mon, Mar 26, 2018 at 6:49 AM, Alex Rukletsov wrote:
>>
>>> I would like us to do monthly releases and support 10 branches at a time.
>>> Ideally, releasing that often reduces the burden for the release manager,
>>> because there are fewer changes and fewer new features. However, we lack
>>> automation to support this pace: our release guide [1] is several pages
>>> long and includes quite a few non-trivial steps. It would be great to
>>> find some time (maybe during the next Mesos hackathon?) to revisit our
>>> release procedures, but until then I'm +1 for quarterly.
>>>
>>> [1] https://mesos.apache.org/documentation/latest/release-guide/
>>>
>>> On Sat, Mar 24, 2018 at 5:48 AM, Vinod Kone  wrote:
>>>
>>> > I’m +1 for quarterly.
>>> >
>>> > Most importantly I want us to adhere to a predictable cadence.
>>> >
>>> > Sent from my phone
>>> >
>>> > On Mar 23, 2018, at 9:21 PM, Jie Yu  wrote:
>>> >
>>> > Supporting multiple releases is a burden.
>>> >
>>> > 1.2 was released in March 2017 (1 year ago), and I know that some users
>>> > are still on that version.
>>> > 1.3 was released in June 2017 (9 months ago), and we're still maintaining it
>>> > (still backport patches
>>> > >> 2660eef6f6940128c106> several
>>> > days ago, which some users asked)
>>> > 1.4 was released in September 2017 (6 months ago).
>>> > 1.5 was released in February 2018 (1 month ago).
>>> >
>>> > As you can see, users expect a release to be supported for 6-9 months
>>> > (e.g., backports are still needed for the 1.3 release, which is 9 months
>>> > old). If we were to do monthly minor releases, we'd probably need to
>>> > maintain 6-9 release branches. That's too much of an ask for committers
>>> > and maintainers.
>>> >
>>> > I also agree with folks that there are benefits to doing releases more
>>> > frequently. Given the historical data, I'd suggest we do quarterly
>>> > releases, and maintain three release branches.
>>> >
>>> > - Jie
>>> >
>>> > On Fri, Mar 23, 2018 at 10:03 AM, Greg Mann wrote:
>>> >
>>> >> The best motivation I can think of for a shorter release cycle is this:
>>> >> if the release cadence is fast enough, then developers will be less
>>> >> likely to rush a feature into a release. I think this would be a real
>>> >> benefit, since rushing features in hurts stability. *However*, I'm not
>>> >> sure if every two months is fast enough to bring this benefit. I would
>>> >> imagine that a two-month wait is still long enough that people wouldn't
>>> >> want to wait an entire release cycle to land their feature. Just off the
>>> >> top of my head, I might guess that a release cadence of 1 month or
>>> >> shorter would be often enough that it would always seem reasonable for a
>>> >> developer to wait until the next release to land a feature. What do
>>> >> y'all think?
>>> >>
>>> >> Other motivating factors that have been raised are:
>>> >> 1) Many users upgrade on a longer timescale than every ~2 months. I
>>> >> think that this doesn't need to affect our decision regarding release
>>> >> timing - since we guarantee compatibility of all releases with the same
>>> >> major version number, there is no reason that a user needs to upgrade
>>> >> minor releases one at a time. It's fine to go from 1.N to 1.(N+3), for
>>> >> example.
>>> >> 2) Backporting will be a burden if releases are too short. I think that
>>> >> in practice, backporting will not take too much longer. If there was a
>>> >> conflict back in the tree somewhere, then it's likely that after
>>> >> resolving that conflict once, the same diff can be used to backport the
>>> >> change to previous releases as well.
>>> >> 3) Adhering strictly to a time-based release schedule will help users
>>> >> plan their deployments, since they'll be able to rely on features being
>>> >> released on-schedule. However, if we do strict time-based releases, then
>>> >> it will be less certain that a particular feature will land 
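
The branch-count arithmetic behind the quarterly proposal in this thread can be written out as follows. This is a back-of-the-envelope sketch: `branches_to_maintain` is a hypothetical helper, and it assumes every release is supported for a fixed number of months.

```python
import math


def branches_to_maintain(support_months, cadence_months):
    """Number of release branches alive at once under a fixed support window.

    Hypothetical helper: with a release every `cadence_months` months and
    each release supported for `support_months` months, roughly
    ceil(support / cadence) branches need maintenance at any time.
    """
    return math.ceil(support_months / cadence_months)
```

With the 6-9 month support window cited in the thread, a monthly cadence gives 6 to 9 branches, while a quarterly cadence with 9 months of support gives 3, which matches the suggestion of maintaining three release branches.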

[GSoC] Google Summer of Code

2018-04-10 Thread Tomek Janiszewski
Hi

It looks like the Apache Software Foundation was selected for Google Summer of Code:
https://summerofcode.withgoogle.com/organizations/5718432427802624/
Do we plan to submit any projects related to Mesos? I was thinking about a
project to refresh the Mesos UI (catch up with features, upgrade to the latest
Angular, or rethink the framework).
What do you think?

Best
Tomek


Volume ownership and permission

2018-04-10 Thread Qian Zhang
Hi Folks,

I am working on MESOS-8767 to improve Mesos volume support regarding volume
ownership and permission. Here is the design doc.
Please feel free to let me know if you have any comments or feedback; you can
reply to this mail or comment on the design doc directly. Thanks!


Regards,
Qian Zhang