Feb 21 Performance WG Meeting Canceled

2018-02-20 Thread Benjamin Mahler
Hi folks, since there's nothing on the agenda for this month's meeting, I
will cancel it and plan to meet next month. If there are any topics folks
would like to discuss, let me know and we can schedule one sooner!


Re: Surfacing additional issues on agent host to schedulers

2018-02-20 Thread James Peach

> On Feb 20, 2018, at 11:11 AM, Zhitao Li wrote:
> 
> Hi,
> 
> At a recent Mesos meetup, quite a few cluster operators complained that
> it is hard to model host issues with Mesos at the moment.
> 
> For example, in our environment, the only signal the scheduler gets is
> whether the Mesos agent has disconnected from the cluster. However, we have
> a family of other issues in production that make hosts (sometimes
> "partially") unusable. Examples include:
> - traffic routing software malfunction (e.g., haproxy): the Mesos agent does
> not require this, so the scheduler/deployment system is not aware of it, but
> the actual workload on the cluster will fail;
> - broken disk;
> - other long-running system agent issues.
> 
> This email asks how Mesos can recommend best practices for surfacing these
> issues to schedulers, and whether we need additional primitives in Mesos to
> achieve that goal.

In the K8s world, the node can publish "conditions" that describe its status:

https://kubernetes.io/docs/concepts/architecture/nodes/#condition

A condition can automatically taint the node, which can cause pods to be
evicted automatically (i.e., if they can't tolerate that specific taint).
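
As a rough illustration of the condition/taint idea described above, here is
a minimal sketch in Python. It is a simplified model only, not the Kubernetes
API and not a proposed Mesos primitive: the condition names, the taint keys,
and the Node, Pod, and evict_intolerant names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a host condition to the taint it triggers.
# These names are illustrative and are not real Kubernetes taint keys.
CONDITION_TAINTS = {
    "DiskBroken": "example/disk-broken",
    "ProxyDown": "example/proxy-down",
}


@dataclass
class Pod:
    name: str
    # Taint keys this workload declares it can tolerate.
    tolerations: set = field(default_factory=set)


@dataclass
class Node:
    name: str
    # Conditions currently reported by the host.
    conditions: set = field(default_factory=set)
    pods: list = field(default_factory=list)

    def taints(self):
        """Derive the node's taints from its reported conditions."""
        return {CONDITION_TAINTS[c] for c in self.conditions
                if c in CONDITION_TAINTS}


def evict_intolerant(node):
    """Remove and return pods that do not tolerate every taint on the node."""
    taints = node.taints()
    evicted = [p for p in node.pods if not taints <= p.tolerations]
    node.pods = [p for p in node.pods if taints <= p.tolerations]
    return evicted
```

In this model, a host that reports "DiskBroken" keeps only the workloads that
declared a toleration for "example/disk-broken"; everything else is returned
as evicted so that a scheduler could place it elsewhere.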

J

Surfacing additional issues on agent host to schedulers

2018-02-20 Thread Zhitao Li
Hi,

At a recent Mesos meetup, quite a few cluster operators complained that
it is hard to model host issues with Mesos at the moment.

For example, in our environment, the only signal the scheduler gets is
whether the Mesos agent has disconnected from the cluster. However, we have
a family of other issues in production that make hosts (sometimes
"partially") unusable. Examples include:
- traffic routing software malfunction (e.g., haproxy): the Mesos agent does
not require this, so the scheduler/deployment system is not aware of it, but
the actual workload on the cluster will fail;
- broken disk;
- other long-running system agent issues.

This email asks how Mesos can recommend best practices for surfacing these
issues to schedulers, and whether we need additional primitives in Mesos to
achieve that goal.

Any comments, suggestions, or questions are very welcome.

Thanks!

-- 
Cheers,

Zhitao Li


Re: Follow up on providing `--reconfiguration_policy=any` in future versions

2018-02-20 Thread Zhitao Li
Hi Benno,

Thanks for the diff. I took a quick look and don't anticipate any issues
with it in our environment. I'll talk to our release manager about trying
it out.

W.r.t. the health check issue, are there meeting notes or a JIRA issue
capturing the existing discussions?

On Mon, Feb 19, 2018 at 9:59 AM, Benno Evers wrote:

> Hi Zhitao,
>
> great to see that there's interest in this.
>
> The most specific concern that we had at the time was that we were not
> sure about the best way to handle health checks on agents where
> the hostname changed (together with a general feeling
> that we needed a bit more time to think through possible failure
> scenarios).
>
> If you're willing to blaze a trail, you could apply
>
> https://reviews.apache.org/r/64384/
>
> and see if this causes any observable issues.
>
> Of course, I'm also up for a follow-up meeting.
>
> Best regards,
> Benno
>
>
> On Thu, Feb 15, 2018 at 9:03 PM, Zhitao Li wrote:
>
> > Hi Vinod/Benno,
> >
> > This is a follow-up from MESOS-1739. We have recently discovered some
> > previously unknown use cases where a fully allowed
> > `--reconfiguration_policy=any` from the design doc
> >  > KxwU4lLtr53SrE5U3Q/edit#>
> > will
> > really help our operation.
> >
> > Do we want to have a follow-up meeting to see what the blockers are to
> > fully implementing that?
> >
> > Thanks.
> >
> >
> > --
> > Cheers,
> >
> > Zhitao Li
> >
>
>
>
> --
> Benno Evers
> Software Engineer, Mesosphere
>



-- 
Cheers,

Zhitao Li


[GitHub] mesos issue #263: Allow nested containers in pods to have separate namespace...

2018-02-20 Thread Gilbert88
Github user Gilbert88 commented on the issue:

https://github.com/apache/mesos/pull/263
  
A quick note that we could have a followup patch to add documentation here: 
http://mesos.apache.org/documentation/latest/containerizer-internals/#linux-namespaces


---


Re: API working group

2018-02-20 Thread Judith Malnick
Hi everyone,

Please vote for your favorite WG times in this doodle poll. Greg and Vinod,
please let me know ASAP if you need me to add poll options.

Best!
Judith

On Wed, Feb 14, 2018 at 6:00 PM, Greg Mann wrote:

> Sounds good to me. Thanks, Judith!!
>
> Greg
>
> On Wed, Feb 14, 2018 at 11:14 AM, Judith Malnick wrote:
>
> > I'm glad there's so much interest.
> >
> > Greg, can I put you down as the leader, and Vinod, can I put you down as
> > the co-leader? If that's ok, I'll follow up with a doodle poll to find a
> > date.
> >
> > Best!
> > Judith
> >
> > On Tue, Feb 13, 2018 at 8:21 PM, Chun-Hung Hsiao wrote:
> >
> > > I'm in. Especially, I'd like to continue the work of adapting gRPC into
> > > libprocess,
> > > so we could have a gRPC-based API!
> > >
> >
> >
> >
> > --
> > Judith Malnick
> > Community Manager
> > 310-709-1517
> >
>



-- 
Judith Malnick
Community Manager
310-709-1517


Save the date: ApacheCon North America, September 24-27 in Montréal

2018-02-20 Thread Rich Bowen

Dear Apache Enthusiast,

(You’re receiving this message because you’re subscribed to a user@ or 
dev@ list of one or more Apache Software Foundation projects.)


We’re pleased to announce the upcoming ApacheCon [1] in Montréal, 
September 24-27. This event is all about you — the Apache project community.


We’ll have four tracks of technical content this time, as well as lots 
of opportunities to connect with your project community, hack on the 
code, and learn about other related (and unrelated!) projects across the 
foundation.


The Call For Papers (CFP) [2] and registration are now open. Register 
early to take advantage of the early bird prices and secure your place 
at the event hotel.


Important dates:
March 30: CFP closes
April 20: CFP notifications sent
August 24: Hotel room block closes (please do not wait until the last minute)


Follow @ApacheCon on Twitter to be the first to hear announcements about 
keynotes, the schedule, evening events, and everything you can expect to 
see at the event.


See you in Montréal!

Sincerely, Rich Bowen, V.P. Events,
on behalf of the entire ApacheCon team

[1] http://www.apachecon.com/acna18
[2] https://cfp.apachecon.com/conference.html?apachecon-north-america-2018


[GitHub] mesos issue #263: Allow nested containers in pods to have separate namespace...

2018-02-20 Thread jdef
Github user jdef commented on the issue:

https://github.com/apache/mesos/pull/263
  
What's the high level use case that's driving this change request? One of 
the major goals of task-groups (pods) is to allow containers to share 
networking and storage. What point is there in launching a nested container 
that DOES NOT share these things with the other containers in the pod?


---