[GitHub] mesos issue #303: Expose new metrics for memory usage in the container.

2018-07-20 Thread jieyu
Github user jieyu commented on the issue:

https://github.com/apache/mesos/pull/303
  
@fcuny We prefer reviewboard for non-trivial changes.


---


[GitHub] mesos issue #303: Expose new metrics for memory usage in the container.

2018-07-20 Thread fcuny
Github user fcuny commented on the issue:

https://github.com/apache/mesos/pull/303
  
That sounds reasonable. I'll update the review with this.

Should I move the review to reviewboard or keep iterating here?


---


[GitHub] mesos issue #303: Expose new metrics for memory usage in the container.

2018-07-20 Thread jieyu
Github user jieyu commented on the issue:

https://github.com/apache/mesos/pull/303
  
It looks like on CentOS 7 (3.10.0-693.5.2.el7.x86_64), if kmem accounting is 
not enabled, `memory.kmem.limit_in_bytes` will always show 0, and reading 
`memory.kmem.slabinfo` will give `Input/output error`.

I think we should probably add an agent flag to control whether kmem 
accounting is enabled, and only report stats if it is.
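
A minimal sketch of the gating suggested above, not the actual Mesos implementation. The `kmem_enabled` parameter stands in for the proposed agent flag (its name is hypothetical), and the function reads the cgroup v1 file `memory.kmem.usage_in_bytes`. An `Input/output error` or a missing file is treated as "no stats to report" rather than as a failure.

```python
import errno
import os
from typing import Optional


def read_kmem_usage(cgroup_dir: str, kmem_enabled: bool) -> Optional[int]:
    """Return kernel memory usage in bytes, or None if it should not be reported.

    `kmem_enabled` is a hypothetical stand-in for the proposed agent flag.
    """
    if not kmem_enabled:
        # Accounting is off: report nothing instead of a misleading 0.
        return None

    path = os.path.join(cgroup_dir, "memory.kmem.usage_in_bytes")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError as e:
        # EIO is the "Input/output error" seen on the CentOS 7 kernel;
        # ENOENT covers kernels built without kmem support.
        if e.errno in (errno.EIO, errno.ENOENT):
            return None
        raise
```

Returning `None` lets the caller omit the metric entirely, so a dashboard can tell "not measured" apart from "zero bytes used".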


---


Re: Operations Working Group - First Meeting

2018-07-20 Thread Zhitao Li
Please count me in. Looking forward to it.

> On Jul 20, 2018, at 4:05 PM, Gastón Kleiman  wrote:
> 
> Hi Abel,
> 
> I would love to learn more from people operating Mesos clusters of any
> size. We can discuss what is working great, what is on the roadmap, and
> what could be improved.
> 
> Some of us have been working on adding new per-framework metrics and extra
> logging to the Mesos master - I think that an operator would find them
> valuable to monitor/debug/troubleshoot a Mesos cluster, so I could also
> talk a bit about that.
> 
> The agenda (
> https://docs.google.com/document/d/1XjJfoksz70vbTvvT6FQ1t_J0SD1XIoipmYSvEHJfXt8/edit
> ) is still open and editable. I encourage everyone to add the topics that
> interest them!
> 
> Looking forward to meeting you all over Zoom next week,
> 
> -Gastón
> 
>> On Tue, Jul 17, 2018 at 2:55 AM Abel Souza  wrote:
>> 
>> Thank you for setting this up Gaston,
>> 
>> Would you mind giving us a brief overview of what you have in mind for
>> discussion?
>> 
>> Thank you,
>> 
>> Abel
>> 
>> On 07/17/2018 10:52 AM, Matt Jarvis wrote:
>> 
>> That's great news, Gaston! Let me know if you need any help from the
>> Community team.
>> 
>> Matt
>> 
>>> On Tue, 17 Jul 2018, 05:04 Gastón Kleiman,  wrote:
>>> 
>>> Hi all,
>>> 
>>> Thank you for responding to my previous emails - I think that we have
>>> quorum for a new working group!
>>> 
>>> Many of you who have expressed interest seem to be in Europe, so I tried
>>> to schedule the first meeting at a time that I hope will be friendly for
>>> people in both GMT+1 and GMT-8:
>>> 
>>> *Date:* Wednesday July 25th from 9:00-10:00 AM PDT
>>> *Agenda:*
>>> https://docs.google.com/document/d/1XjJfoksz70vbTvvT6FQ1t_J0SD1XIoipmYSvEHJfXt8/
>>> *Zoom URL:* https://zoom.us/j/310132146
>>> 
>>> 
>>> You can also find the event in the Mesos Community Calendar.
>>> 
>>> Feel free to add more topics to the agenda.
>>> 
>>> Looking forward to meeting you all next week,
>>> 
>>> -Gastón
>>> 
>> 
>> 


[GitHub] mesos issue #303: Expose new metrics for memory usage in the container.

2018-07-20 Thread jieyu
Github user jieyu commented on the issue:

https://github.com/apache/mesos/pull/303
  
I am not sure if the kernel behavior has changed or not in newer kernels. 
For example:

https://github.com/opencontainers/runc/blob/7139b61f7fdb904d0acb8db825709aa8d2d2ef36/libcontainer/cgroups/fs/memory.go#L70

You'll have to write to `memory.kmem.limit_in_bytes` to enable kmem accounting.

So if kmem accounting is not enabled, I don't know what will happen if you 
read the data from `memory.kmem.usage_in_bytes`. Let me do some testing on my 
CentOS 7 default kernel (3.10).
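
A rough sketch of the "write the limit file to enable accounting" step described above. The values written here (a small limit, then `-1` to lift it again) are an assumption about how such a toggle could look, not something confirmed in this thread, and `enable_kmem_accounting` is a hypothetical helper. Any ordering constraints on older kernels are not covered.

```python
import os


def enable_kmem_accounting(cgroup_dir: str) -> None:
    """Touch memory.kmem.limit_in_bytes so the kernel starts kmem accounting.

    Assumption: setting any limit once is enough to switch accounting on,
    and writing -1 afterwards removes the limit again.
    """
    path = os.path.join(cgroup_dir, "memory.kmem.limit_in_bytes")
    for value in ("1", "-1"):
        # Each write is a separate open/write/close, mirroring how
        # cgroup control files are normally updated.
        with open(path, "w") as f:
            f.write(value)
```

On a real system `cgroup_dir` would be the container's directory under the memory cgroup hierarchy, and the call needs sufficient privileges to write there.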


---


[GitHub] mesos issue #303: Expose new metrics for memory usage in the container.

2018-07-20 Thread fcuny
Github user fcuny commented on the issue:

https://github.com/apache/mesos/pull/303
  
We've been running with this patch at Twitter since February and we're 
getting kmem metrics. 

The bug you mention is interesting; we've only been running with 4.9 and 
later.



---


Re: Operations Working Group - First Meeting

2018-07-20 Thread Gastón Kleiman
Hi Abel,

I would love to learn more from people operating Mesos clusters of any
size. We can discuss what is working great, what is on the roadmap, and
what could be improved.

Some of us have been working on adding new per-framework metrics and extra
logging to the Mesos master - I think that an operator would find them
valuable to monitor/debug/troubleshoot a Mesos cluster, so I could also
talk a bit about that.

The agenda (
https://docs.google.com/document/d/1XjJfoksz70vbTvvT6FQ1t_J0SD1XIoipmYSvEHJfXt8/edit
) is still open and editable. I encourage everyone to add the topics that
interest them!

Looking forward to meeting you all over Zoom next week,

-Gastón

On Tue, Jul 17, 2018 at 2:55 AM Abel Souza  wrote:

> Thank you for setting this up Gaston,
>
> Would you mind giving us a brief overview of what you have in mind for
> discussion?
>
> Thank you,
>
> Abel
>
> On 07/17/2018 10:52 AM, Matt Jarvis wrote:
>
> That's great news, Gaston! Let me know if you need any help from the
> Community team.
>
> Matt
>
> On Tue, 17 Jul 2018, 05:04 Gastón Kleiman,  wrote:
>
>> Hi all,
>>
>> Thank you for responding to my previous emails - I think that we have
>> quorum for a new working group!
>>
>> Many of you who have expressed interest seem to be in Europe, so I tried
>> to schedule the first meeting at a time that I hope will be friendly for
>> people in both GMT+1 and GMT-8:
>>
>> *Date:* Wednesday July 25th from 9:00-10:00 AM PDT
>> *Agenda:*
>> https://docs.google.com/document/d/1XjJfoksz70vbTvvT6FQ1t_J0SD1XIoipmYSvEHJfXt8/
>> *Zoom URL:* https://zoom.us/j/310132146
>> 
>>
>> You can also find the event in the Mesos Community Calendar.
>>
>> Feel free to add more topics to the agenda.
>>
>> Looking forward to meeting you all next week,
>>
>> -Gastón
>>
>
>


[GitHub] mesos issue #303: Expose new metrics for memory usage in the container.

2018-07-20 Thread jieyu
Github user jieyu commented on the issue:

https://github.com/apache/mesos/pull/303
  
I don't think we turn on kmem accounting yet in the Mesos containerizer.

I remember that in old kernels, you need to set `kmem.limit_in_bytes` once to 
enable kmem accounting. I am not sure about the new kernel behavior.

Also, we need to be careful about bugs like this one on old kernels if we 
turn on kmem accounting:
https://github.com/opencontainers/runc/issues/1725




---


[GitHub] mesos pull request #303: Expose new metrics for memory usage in the containe...

2018-07-20 Thread fcuny
GitHub user fcuny opened a pull request:

https://github.com/apache/mesos/pull/303

Expose new metrics for memory usage in the container.

The metric "mem_kmem_usage_bytes" is the total kernel memory usage by
processes in the cgroup in bytes.

The metric "mem_kmem_tcp_usage_bytes" is the total memory usage for TCP
buffers in bytes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fcuny/mesos fcuny/MESOS-9102

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #303


commit 5747f558fb487da1340787618285722d5cbcca50
Author: Franck Cuny 
Date:   2018-07-20T22:46:46Z

Expose new metrics for memory usage in the cgroup.

The metric "mem_kmem_usage_bytes" is the total kernel memory usage by
processes in the cgroup in bytes.

The metric "mem_kmem_tcp_usage_bytes" is the total memory usage for TCP
buffers in bytes.




---


[GitHub] mesos pull request #301: Document SUPPRESS HTTP call [MESOS-7211]

2018-07-20 Thread vinodkone
Github user vinodkone commented on a diff in the pull request:

https://github.com/apache/mesos/pull/301#discussion_r204166164
  
--- Diff: docs/scheduler-http-api.md ---
@@ -479,6 +479,32 @@ HTTP/1.1 202 Accepted
 
 ```
 
+### SUPPRESS
+Sent by the scheduler when it doesn't need offers for a given set of its 
roles. When Mesos master receives this request, it will stop sending offers for 
the given set of roles to the framework. As a special case, if roles are not 
specified, all subscribed roles of this framework are suppressed.
+
+Note that master continues to send offers to other subscribed roles of 
this framework that are not suppressed. Also, status updates about tasks, 
executors and agents are not affected by this call and tasks will continue 
running for `FrameworkInfo.failover_timeout`.
--- End diff --

I would remove the "...and tasks will continue running for 
`FrameworkInfo.failover_timeout`" part. That timeout doesn't come into play at 
all during suppression; it only comes into play when a framework 
disconnects.


---


[GitHub] mesos pull request #301: Document SUPPRESS HTTP call [MESOS-7211]

2018-07-20 Thread vinodkone
Github user vinodkone commented on a diff in the pull request:

https://github.com/apache/mesos/pull/301#discussion_r204167169
  
--- Diff: docs/scheduler-http-api.md ---
@@ -479,6 +479,32 @@ HTTP/1.1 202 Accepted
 
 ```
 
+### SUPPRESS
+Sent by the scheduler when it doesn't need offers for a given set of its 
roles. When Mesos master receives this request, it will stop sending offers for 
the given set of roles to the framework. As a special case, if roles are not 
specified, all subscribed roles of this framework are suppressed.
+
+Note that master continues to send offers to other subscribed roles of 
this framework that are not suppressed. Also, status updates about tasks, 
executors and agents are not affected by this call and tasks will continue 
running for `FrameworkInfo.failover_timeout`.
+
+If the scheduler wishes to receive offers for the suppressed roles again 
(e.g., it needs to schedule new workloads), it can send `REVIVE` call.
+
+```
+SUPPRESS Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+Mesos-Stream-Id: 130ae4e3-6b13-4ef4-baa9-9f2e85c3e9af
+
+{
+  "framework_id" : {"value" : "12220-3440-12532-2345"},
+  "type" : "SUPPRESS",
+  "suppress" : {"role": }
--- End diff --

This should be `roles`, not `role`, since at least Mesos 1.3.0. Also, 
`roles` takes an array of strings (i.e., roles), not a single role.
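
To illustrate the point about `roles`, here is a small sketch that builds the JSON body of a SUPPRESS call with an array of role names. The helper name `build_suppress_call` and the role names are made up for the example. The framework ID is the placeholder from the diff above.

```python
import json
from typing import Dict, Iterable


def build_suppress_call(framework_id: str, roles: Iterable[str]) -> Dict:
    """Build the body of a SUPPRESS call (hypothetical helper).

    `roles` is always sent as a list of strings, never a single string.
    """
    return {
        "framework_id": {"value": framework_id},
        "type": "SUPPRESS",
        "suppress": {"roles": list(roles)},
    }


body = build_suppress_call("12220-3440-12532-2345", ["role-a", "role-b"])
print(json.dumps(body, indent=2))
```

The resulting body would be POSTed to `/api/v1/scheduler` with the `Content-Type` and `Mesos-Stream-Id` headers shown in the diff.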


---


Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-07-20 Thread Alex Rukletsov
MPark—

what's the decision regarding the 1.3.3 release?

On Mon, Jul 9, 2018 at 8:52 PM, Michael Park  wrote:

> I'm considering simply abandoning the 1.3.3 release and bringing the 1.3.x
> branch to end of life.
> If anyone really wants a 1.3.3, I'm certainly willing to finish the
> release portion of this
> but I don't have time to dig into the CI issue that Vinod pointed out. If
> someone feels compelled
> to investigate the issue and wants 1.3.3 released, please speak up.
>
> I'll wait for some time (say, a week) to gauge the interest and take
> corresponding action.
>
> Thanks,
>
> MPark
>
> On Thu, May 31, 2018 at 11:55 AM Vinod Kone  wrote:
>
>> -1 (binding).
>>
>>
>> Ran it in ASF CI and found an issue worth investigating. The other 3 issues
>> look to be related to known flaky tests and/or a known core dump issue (that
>> has been fixed in later versions).
>>
>> *Revision*: c78e56e4ea217878dd604de638623be166a18db0
>>
>>- refs/tags/1.3.3-rc1
>>
>> Configuration Matrix:
>>
>> OS / flags / build system                        gcc      clang
>> centos:7
>>   --verbose --enable-libevent --enable-ssl
>>     autotools                                    Failed   Not run
>>     cmake                                        Success  Not run
>>   --verbose
>>     autotools                                    Success  Not run
>>     cmake                                        Success  Not run
>> ubuntu:14.04
>>   --verbose --enable-libevent --enable-ssl
>>     autotools                                    Success  Failed
>>     cmake                                        Success  Failed
>>   --verbose
>>     autotools                                    Failed   Success
>>     cmake                                        Success  Success
>>
>>
>> 1) Segfault in HTTP Test.
>> 

[RESULT] [VOTE] Move the project repos to gitbox

2018-07-20 Thread Vinod Kone
Hi,

This vote has passed with 7 +1s and no 0s or -1s!

+1 (binding)
-
Vinod Kone
James Peach
Zhitao Li
Andrew Schwartzmeyer
Jie Yu
Greg Mann
Gaston Kleiman

I'll file an INFRA ticket to get the process in motion.

Thanks,
Vinod


On Tue, Jul 17, 2018 at 8:27 PM Gastón Kleiman  wrote:

> On Tue, Jul 17, 2018 at 7:59 AM Vinod Kone  wrote:
>
>> Hi,
>>
>> As discussed in another thread and in the committers sync, there seems to
>> be heavy interest in moving our project repos ("mesos", "mesos-site") from
>> the "git-wip" git server to the new "gitbox" server to make better use of
>> GitHub integrations.
>>
>> Please vote +1, 0, -1 regarding the move to gitbox. The vote will close in
>> 3 business days.
>>
>
> +1
>