Re: 1.0.2 release

2016-10-05 Thread Vinod Kone
Release dashboard:
https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12329719

I'm waiting for 2 issues to be resolved. Once that's done, I'll start
prepping the release.

On Wed, Oct 5, 2016 at 4:11 PM, Vinod Kone  wrote:

> Hi,
>
> As the Release Manager for 1.0, I'm responsible for all subsequent patch
> releases.
>
> I'm planning to cut the next patch release (1.0.2) within a week. So, if
> you have any patches that need to get into 1.0.2 make sure that either it
> is already in the 1.0.x branch or the corresponding ticket has a target
> version set to 1.0.2.
>
> I'll send a link to the release dashboard shortly.
>
> Thanks,
> -- Vinod
>


1.0.2 release

2016-10-05 Thread Vinod Kone
Hi,

As the Release Manager for 1.0, I'm responsible for all subsequent patch
releases.

I'm planning to cut the next patch release (1.0.2) within a week. So, if
you have any patches that need to get into 1.0.2 make sure that either it
is already in the 1.0.x branch or the corresponding ticket has a target
version set to 1.0.2.
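
A quick way to check the first condition (my addition; the commit hash is a
placeholder) from a clone of the Mesos repo:

```
# Is your fix already on the 1.0.x branch?
git fetch origin
git branch -r --contains <commit-sha> | grep origin/1.0.x
```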

I'll send a link to the release dashboard shortly.

Thanks,
-- Vinod


Re: Mesos CLI patches

2016-10-05 Thread Kevin Klues
Sorry, meant to send this to Haris directly. Please disregard.

On Wed, Oct 5, 2016 at 1:19 PM Kevin Klues  wrote:

> Hey Haris,
>
> Now that the pods rush is over, we're finally ready to start pushing the
> CLI patches through. Will you have time in the next few weeks to work on
> any comments we have on them? If not, don't worry. I can take them over and
> make sure they get submitted with you as the author. Just need to know so
> we can plan accordingly. We want this stuff in for the 1.1.0 release in 10
> days.
>
> Thanks!
>
> Kevin
>
>
> --
> ~Kevin
>


Re: Resource Isolation in Mesos

2016-10-05 Thread haosdent
> These flags are used in agent - cgroups_limits_swap=true
--isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
In agent logs I can see updated memory limit to 33MB for container.

I'm not sure whether there are typos, but some of the flag names may be
incorrect. Also, according to

> "mem_limit_bytes": 1107296256,

I think Mesos allocated 1107296256 bytes of memory (about 1 GB) to your task,
not 33 MB.
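
As a quick sanity check (my own note, not from the original thread): that
number decodes to 1056 MiB, which is likely a 1024 MiB task allocation plus
the 32 MiB the default command executor adds (the 1.1 `cpus_limit` hints at
the same 0.1-cpu executor overhead).

```
$ echo $((1107296256 / 1024 / 1024))   # reported limit in MiB
1056                                   # = 1024 MiB (task) + 32 MiB (executor)
```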

As for `mem_rss_bytes` being zero, let me describe how I tested it on my
machine; it may help you troubleshoot the problem.

```
## Start the master
sudo ./bin/mesos-master.sh --ip=111.223.45.25 --hostname=111.223.45.25 \
  --work_dir=/tmp/mesos
## Start the agent
sudo ./bin/mesos-agent.sh --ip=111.223.45.25 --hostname=111.223.45.25 \
  --work_dir=/tmp/mesos --master=111.223.45.25:5050 \
  --cgroups_hierarchy=/sys/fs/cgroup --isolation=cgroups/cpu,cgroups/mem \
  --cgroups_limit_swap=true
## Start the task
./src/mesos-execute --master=111.223.45.25:5050 --name="test-single-1" \
  --command="sleep 2000"
```

Then query the `/containers` endpoint to get the container id of the task:

```
$ curl 'http://111.223.45.25:5051/containers' 2>/dev/null |jq .
[
  {
    "container_id": "74fea157-100f-4bf8-b0d0-b65c6e17def1",
    "executor_id": "test-single-1",
    "executor_name": "Command Executor (Task: test-single-1) (Command: sh -c 'sleep 2000')",
    "framework_id": "db9f43ce-0361-4c65-b42f-4dbbefa75ff8-",
    "source": "test-single-1",
    "statistics": {
      "cpus_limit": 1.1,
      "cpus_system_time_secs": 3.69,
      "cpus_user_time_secs": 3.1,
      "mem_anon_bytes": 9940992,
      "mem_cache_bytes": 8192,
      "mem_critical_pressure_counter": 0,
      "mem_file_bytes": 8192,
      "mem_limit_bytes": 167772160,
      "mem_low_pressure_counter": 0,
      "mem_mapped_file_bytes": 0,
      "mem_medium_pressure_counter": 0,
      "mem_rss_bytes": 9940992,
      "mem_swap_bytes": 0,
      "mem_total_bytes": 10076160,
      "mem_total_memsw_bytes": 10076160,
      "mem_unevictable_bytes": 0,
      "timestamp": 1475686847.54635
    },
    "status": {
      "executor_pid": 2775
    }
  }
]
```

As you can see above, the container id is
`74fea157-100f-4bf8-b0d0-b65c6e17def1`, so I read its memory statistics from
the cgroup:

```
$ cat /sys/fs/cgroup/memory/mesos/74fea157-100f-4bf8-b0d0-b65c6e17def1/memory.stat
```

Mesos reads the task's memory statistics from this file; `total_rss` is
parsed into the `mem_rss_bytes` field.

```
...
hierarchical_memory_limit 167772160
hierarchical_memsw_limit 167772160
total_rss 9940992
...
```
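
A minimal cross-check sketch (my addition; it assumes the same agent address
and container id as above, and that `jq` is installed): the value reported by
the `/containers` endpoint should match the cgroup counter.

```
# mem_rss_bytes as reported by the agent
curl -s 'http://111.223.45.25:5051/containers' \
  | jq '.[] | select(.container_id == "74fea157-100f-4bf8-b0d0-b65c6e17def1")
        | .statistics.mem_rss_bytes'

# total_rss as Mesos parses it from the cgroup (here: 9940992)
grep '^total_rss ' \
  /sys/fs/cgroup/memory/mesos/74fea157-100f-4bf8-b0d0-b65c6e17def1/memory.stat
```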

You could check which of the steps above differs on your side and reply to
this email for further discussion; the problem seems to be an incorrect
configuration or incorrect launch flags.

On Wed, Oct 5, 2016 at 8:46 PM, Srikant Kalani 
wrote:

> What I can see in the http output is that mem_rss_bytes is not coming on rhel7.
>
> Here is the http output :
>
> Output for Agent running on rhel7
>
> [{"container\_id":"8062e683\-204c\-40c2\-87ae\-
> fcc2c3f71b85","executor\_id":"\*\*\*\*\*","executor\_name":"Command
> Executor (Task: \*\*\*\*\*) (Command: sh \-c '\\*\*\*\*\*\*...')","
> framework\_id":"edbffd6d\-b274\-4cb1\-b386\-2362ed2af517\-","source":"
> \*\*\*\*\*","statistics":{"cpus\_limit":1.1,"cpus\_
> system\_time\_secs":0.01,"cpus\_user\_time\_secs":0.03,"
> mem\_anon\_bytes":0,"mem\_cache\_bytes":0,"mem\_
> critical\_pressure\_counter":0,"mem\_file\_bytes":0,"mem\_
> limit\_bytes":1107296256,"mem\_low\_pressure\_counter":0,"
> mem\_mapped\_file\_bytes":0,"mem\_medium\_pressure\_
> counter":0,"mem\_rss\_bytes":0,"mem\_swap\_bytes":0,"mem\_
> total\_bytes":0,"mem\_unevictable\_bytes":0,"
> timestamp":1475668277.62915},"status":{"executor\_pid":14454}}]
>
> Output for Agent running on Rhel 6
>
>   [{"container\_id":"359c0944\-c089\-4d43\-983e\-
> 1f97134fe799","executor\_id":"\*\*\*\*\*","executor\_name":"Command
> Executor (Task: \*\*\*\*\*) (Command: sh \-c '\*\*\*\*\*\*...')","
> framework\_id":"edbffd6d\-b274\-4cb1\-b386\-2362ed2af517\-0001","source":"
> \*\*\*\*\*","statistics":{"cpus\_limit":8.1,"cpus\_
> system\_time\_secs":1.92,"cpus\_user\_time\_secs":6.93,"
> mem\_limit\_bytes":1107296256,"mem\_rss\_bytes":2329763840,"
> timestamp":1475670762.73402},"status":{"executor\_pid":31577}}]
>
> Attached are the UI screenshots:
> Wa002.jpg is for rhel7 and the other one is for rhel6.
> On 5 Oct 2016 4:55 p.m., "haosdent"  wrote:
>
>> Hi, @Srikant How about the result of http://${YOUR_AGENT_IP}:5051/containers?
>> It is weird that you could see
>>
>> ```
>> Updated 'memory.limit_in_bytes' to xxx
>> ```
>>
>> in the log as you mentioned, but `limit_in_bytes` is still the initial
>> value as you showed above.
>>
>> On Wed, Oct 5, 2016 at 2:04 PM, Srikant Kalani <
>> srikant.blackr...@gmail.com> wrote:
>>
>>> Here are the values -
>>> Memory.limit_in_bytes = 1107296256
>>> Memory.soft_limit_in_bytes=1107296256
>>> Memory.memsw.limit_in_bytes=9223372036854775807
>>>
>>> I have run the same task on mesos 1.0.1 running 

Re: Troubleshooting tasks that are stuck in the 'Staging' state

2016-10-05 Thread haosdent
> How do you typically monitor the messages between Master and Agents?
On my side, I don't monitor this; I only check the logs when troubleshooting
problems. I'm not sure whether other users or developers have tools that meet
your requirement here.
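
For reference, here is a minimal sketch of the log-based approach described in
the quoted messages below (the master address, log directory, and task id are
placeholders I chose, not values from the thread):

```
# Enable verbose logging when (re)starting the agent
sudo GLOG_v=1 ./bin/mesos-agent.sh --master=<master-ip>:5050 \
  --work_dir=/tmp/mesos --log_dir=/var/log/mesos

# Then follow a task by grepping its id in the agent logs
grep -R '<task-id>' /var/log/mesos/
```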

On Wed, Oct 5, 2016 at 8:16 PM, Frank Scholten 
wrote:

> Ok. How do you typically monitor the messages between Master and
> Agents? Do you have some tools for this on the cluster?
>
> On Tue, Oct 4, 2016 at 6:21 PM, haosdent  wrote:
> > Hi, @Frank Thanks for your information
> >
> >> I see messages 'Telling agent (...) to kill task (...)'. Why does this
> >> happen?
> > This should be because your framework sent a `KillTaskMessage` or
> > `scheduler::Call::KILL` request to the Mesos master, so Mesos is going
> > to kill your task.
> >
> >>Is this the exact text to search for or is this the name of the protobuf
> >> message? Are these logged on a higher log level?
> > It exists in the agent logs. It looks like
> > ```
> > I1004 23:19:36.175673 45405 slave.cpp:1539] Got assigned task '1' for
> > framework e7287433-36f9-48dd-8633-8a6ac7083a43-
> > I1004 23:19:36.176206 45405 slave.cpp:1696] Launching task '1' for
> framework
> > e7287433-36f9-48dd-8633-8a6ac7083a43-
> > ```
> > Usually, you could grep your task id in the agent log to see how the task
> > failed.
> >
> >
> >
> > On Tue, Oct 4, 2016 at 8:50 PM, Frank Scholten 
> > wrote:
> >>
> >> Thanks Haosdent for your quick response.
> >>
> >> I added GLOG_v=1 to the master and agents.
> >>
> >> 1. The framework is registered. Marathon in this case.
> >> 2. I see messages 'Telling agent (...) to kill task (...)'. Why does
> >> this happen? I also see 'Sending explicit reconciliation state
> >> TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
> >> 3. I searched for RunTaskMessage in the agent log but could not find
> >> it. Is this the exact text to search for or is this the name of the
> >> protobuf message? Are these logged on a higher log level?
> >>
> >> On Tue, Oct 4, 2016 at 11:22 AM, haosdent  wrote:
> >> > staging is the initial status of the task. I think you may check your
> >> > logs via these steps:
> >> >
> >> > 1. Did your framework register successfully with the master?
> >> > 2. Did the master send resource offers to your framework, and did your
> >> > framework accept them?
> >> > 3. Did your agents receive the RunTaskMessage from the master to launch
> >> > your task?
> >> >
> >> > Additionally, setting `export GLOG_v=1` before starting the masters and
> >> > agents may be helpful for your troubleshooting.
> >> >
> >> > On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten <
> fr...@frankscholten.nl>
> >> > wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> I am looking for some ways to troubleshoot or debug tasks that are
> >> >> stuck in the 'staging' state. Typically they have no logs in the
> >> >> sandbox.
> >> >>
> >> >> Are there are any endpoints or things to look for in logs to identify
> >> >> a root cause?
> >> >>
> >> >> Is there a troubleshooting guide for Mesos to solve problems like
> this?
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Frank
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Best Regards,
> >> > Haosdent Huang
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>



-- 
Best Regards,
Haosdent Huang


Re: Resource Isolation in Mesos

2016-10-05 Thread Srikant Kalani
What I can see in the http output is that mem_rss_bytes is not coming on rhel7.

Here is the http output :

Output for Agent running on rhel7

[{"container\_id":"8062e683\-204c\-40c2\-87ae\-fcc2c3f71b85","executor\_id":"\*\*\*\*\*","executor\_name":"Command
Executor (Task: \*\*\*\*\*) (Command: sh \-c
'\\*\*\*\*\*\*...')","framework\_id":"edbffd6d\-b274\-4cb1\-b386\-2362ed2af517\-","source":"\*\*\*\*\*","statistics":{"cpus\_limit":1.1,"cpus\_system\_time\_secs":0.01,"cpus\_user\_time\_secs":0.03,"mem\_anon\_bytes":0,"mem\_cache\_bytes":0,"mem\_critical\_pressure\_counter":0,"mem\_file\_bytes":0,"mem\_limit\_bytes":1107296256,"mem\_low\_pressure\_counter":0,"mem\_mapped\_file\_bytes":0,"mem\_medium\_pressure\_counter":0,"mem\_rss\_bytes":0,"mem\_swap\_bytes":0,"mem\_total\_bytes":0,"mem\_unevictable\_bytes":0,"timestamp":1475668277.62915},"status":{"executor\_pid":14454}}]

Output for Agent running on Rhel 6


[{"container\_id":"359c0944\-c089\-4d43\-983e\-1f97134fe799","executor\_id":"\*\*\*\*\*","executor\_name":"Command
Executor (Task: \*\*\*\*\*) (Command: sh \-c
'\*\*\*\*\*\*...')","framework\_id":"edbffd6d\-b274\-4cb1\-b386\-2362ed2af517\-0001","source":"\*\*\*\*\*","statistics":{"cpus\_limit":8.1,"cpus\_system\_time\_secs":1.92,"cpus\_user\_time\_secs":6.93,"mem\_limit\_bytes":1107296256,"mem\_rss\_bytes":2329763840,"timestamp":1475670762.73402},"status":{"executor\_pid":31577}}]
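
To make the two outputs easier to compare, here is a small sketch (my
addition; `<agent-ip>` is a placeholder and `jq` is assumed to be installed)
that extracts just the fields in question:

```
curl -s 'http://<agent-ip>:5051/containers' \
  | jq '.[] | {container_id,
               mem_limit_bytes: .statistics.mem_limit_bytes,
               mem_rss_bytes: .statistics.mem_rss_bytes}'
```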

Attached are the UI screenshots:
Wa002.jpg is for rhel7 and the other one is for rhel6.
On 5 Oct 2016 4:55 p.m., "haosdent"  wrote:

> Hi, @Srikant How about the result of http://${YOUR_AGENT_IP}:5051/containers?
> It is weird that you could see
>
> ```
> Updated 'memory.limit_in_bytes' to xxx
> ```
>
> in the log as you mentioned, but `limit_in_bytes` is still the initial
> value as you showed above.
>
> On Wed, Oct 5, 2016 at 2:04 PM, Srikant Kalani <
> srikant.blackr...@gmail.com> wrote:
>
>> Here are the values -
>> Memory.limit_in_bytes = 1107296256
>> Memory.soft_limit_in_bytes=1107296256
>> Memory.memsw.limit_in_bytes=9223372036854775807
>>
>> I have run the same task on mesos 1.0.1 running on rhel6, and the UI then
>> shows task memory usage as 2.2G/1.0G, where 2.2G is used and 1.0G is
>> allocated, but since we don't have cgroups there, the tasks are not getting
>> killed.
>>
>> On rhel7, the UI is showing 0B/1.0G for the task memory details.
>>
>> Any idea whether this is a rhel7 fault or whether I need to adjust some configuration?
>> On 4 Oct 2016 21:33, "haosdent"  wrote:
>>
>>> Hi, @Srikant
>>>
>>> Usually, your task should be killed when over cgroup limit. Would you
>>> enter the `/sys/fs/cgroup/memory/mesos` folder in the agent?
>>> Then check the values in `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
>>>  `${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
>>> `${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes` and reply in this
>>> email.
>>>
>>> ${YOUR_CONTAINER_ID} is the container id of your task here, you could
>>> find it from the agent log. Or as you said, you only have this one task, so
>>> it should only have one directory under `/sys/fs/cgroup/memory/mesos`.
>>>
>>> Furthermore, would you show the result of 
>>> http://${YOUR_AGENT_IP}:5051/containers?
>>> It contains some tasks statistics information as well.
>>>
>>> On Tue, Oct 4, 2016 at 9:00 PM, Srikant Kalani <
>>> srikant.blackr...@gmail.com> wrote:
>>>
 We have upgraded linux from rhel6 to rhel7 and mesos from 0.27 to 1.0.1.
 After upgrade we are not able to see memory used by task which was fine
 in previous version. Due to this cgroups are not effective.

 Answers to your questions below :

 There is only 1 task running as an appserver, which is consuming approx
 20G mem, but this info is not coming in the Mesos UI.
 Swaps are enabled in agent start command.
 These flags are used in agent - cgroups_limits_swap=true
 --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
 In agent logs I can see updated memory limit to 33MB for container.

 The Web UI shows the total memory allocated to the framework but it is not
 showing the memory used by the task. It always shows 0B/33MB.

 Not sure if this is a rhel7 issue or a mesos 1.0.1 issue.

 Any suggestions ?
 On 26 Sep 2016 21:55, "haosdent"  wrote:

> Hi, @Srikant, could you elaborate:
>
> >We have verified using top command that framework was using 2gB
> memory while allocated was just 50 mb.
>
> * How many tasks are running in your framework?
> * Do you enable or disable swap on the agents?
> * What are the flags you launch the agents with?
> * Have you seen something like `Updated 'memory.limit_in_bytes' to ` in
> the agent log?
>
> On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <
> srikant.blackr...@gmail.com> wrote:
>
>> Hi Greg ,
>>
>> Previously we were running Mesos 0.27 on Rhel6 and since we already
>> have one c group 

Re: Troubleshooting tasks that are stuck in the 'Staging' state

2016-10-05 Thread Frank Scholten
Ok. How do you typically monitor the messages between Master and
Agents? Do you have some tools for this on the cluster?

On Tue, Oct 4, 2016 at 6:21 PM, haosdent  wrote:
> Hi, @Frank Thanks for your information
>
>> I see messages 'Telling agent (...) to kill task (...)'. Why does this
>> happen?
> This should be because your framework sent a `KillTaskMessage` or
> `scheduler::Call::KILL` request to the Mesos master, so Mesos is going
> to kill your task.
>
>>Is this the exact text to search for or is this the name of the protobuf
>> message? Are these logged on a higher log level?
> It exists in the agent logs. It looks like
> ```
> I1004 23:19:36.175673 45405 slave.cpp:1539] Got assigned task '1' for
> framework e7287433-36f9-48dd-8633-8a6ac7083a43-
> I1004 23:19:36.176206 45405 slave.cpp:1696] Launching task '1' for framework
> e7287433-36f9-48dd-8633-8a6ac7083a43-
> ```
> Usually, you could grep your task id in the agent log to see how the task
> failed.
>
>
>
> On Tue, Oct 4, 2016 at 8:50 PM, Frank Scholten 
> wrote:
>>
>> Thanks Haosdent for your quick response.
>>
>> I added GLOG_v=1 to the master and agents.
>>
>> 1. The framework is registered. Marathon in this case.
>> 2. I see messages 'Telling agent (...) to kill task (...)'. Why does
>> this happen? I also see 'Sending explicit reconciliation state
>> TASK_LOST for task fake-marathon-pacemaker-task-(...)'.
>> 3. I searched for RunTaskMessage in the agent log but could not find
>> it. Is this the exact text to search for or is this the name of the
>> protobuf message? Are these logged on a higher log level?
>>
>> On Tue, Oct 4, 2016 at 11:22 AM, haosdent  wrote:
>> > staging is the initial status of the task. I think you may check your
>> > logs via these steps:
>> >
>> > 1. Did your framework register successfully with the master?
>> > 2. Did the master send resource offers to your framework, and did your
>> > framework accept them?
>> > 3. Did your agents receive the RunTaskMessage from the master to launch
>> > your task?
>> >
>> > Additionally, setting `export GLOG_v=1` before starting the masters and
>> > agents may be helpful for your troubleshooting.
>> >
>> > On Tue, Oct 4, 2016 at 4:58 PM, Frank Scholten 
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I am looking for some ways to troubleshoot or debug tasks that are
>> >> stuck in the 'staging' state. Typically they have no logs in the
>> >> sandbox.
>> >>
>> >> Are there are any endpoints or things to look for in logs to identify
>> >> a root cause?
>> >>
>> >> Is there a troubleshooting guide for Mesos to solve problems like this?
>> >>
>> >> Cheers,
>> >>
>> >> Frank
>> >
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Haosdent Huang
>
>
>
>
> --
> Best Regards,
> Haosdent Huang


Re: Resource Isolation in Mesos

2016-10-05 Thread haosdent
Hi, @Srikant How about the result of http://${YOUR_AGENT_IP}:5051/containers?
It is weird that you could see

```
Updated 'memory.limit_in_bytes' to xxx
```

in the log as you mentioned, but `limit_in_bytes` is still the initial value
as you showed above.
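
A small sketch (my addition; the log path and container id are placeholders,
and the cgroup path follows the hierarchy already discussed in this thread)
for cross-checking the two values:

```
# What the agent claims it set (log path is an example)
grep "Updated 'memory.limit_in_bytes'" /var/log/mesos/*.INFO

# What the cgroup actually contains for that container
cat /sys/fs/cgroup/memory/mesos/<container-id>/memory.limit_in_bytes
```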

On Wed, Oct 5, 2016 at 2:04 PM, Srikant Kalani 
wrote:

> Here are the values -
> Memory.limit_in_bytes = 1107296256
> Memory.soft_limit_in_bytes=1107296256
> Memory.memsw.limit_in_bytes=9223372036854775807
>
> I have run the same task on mesos 1.0.1 running on rhel6, and the UI then shows
> task memory usage as 2.2G/1.0G, where 2.2G is used and 1.0G is allocated, but
> since we don't have cgroups there, the tasks are not getting killed.
>
> On rhel7, the UI is showing 0B/1.0G for the task memory details.
>
> Any idea whether this is a rhel7 fault or whether I need to adjust some configuration?
> On 4 Oct 2016 21:33, "haosdent"  wrote:
>
>> Hi, @Srikant
>>
>> Usually, your task should be killed when over cgroup limit. Would you
>> enter the `/sys/fs/cgroup/memory/mesos` folder in the agent?
>> Then check the values in `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
>>  `${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
>> `${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes` and reply in this
>> email.
>>
>> ${YOUR_CONTAINER_ID} is the container id of your task here, you could
>> find it from the agent log. Or as you said, you only have this one task, so
>> it should only have one directory under `/sys/fs/cgroup/memory/mesos`.
>>
>> Furthermore, would you show the result of 
>> http://${YOUR_AGENT_IP}:5051/containers?
>> It contains some tasks statistics information as well.
>>
>> On Tue, Oct 4, 2016 at 9:00 PM, Srikant Kalani <
>> srikant.blackr...@gmail.com> wrote:
>>
>>> We have upgraded linux from rhel6 to rhel7 and mesos from 0.27 to 1.0.1.
>>> After upgrade we are not able to see memory used by task which was fine
>>> in previous version. Due to this cgroups are not effective.
>>>
>>> Answers to your questions below :
>>>
>>> There is only 1 task running as an appserver, which is consuming approx
>>> 20G mem, but this info is not coming in the Mesos UI.
>>> Swaps are enabled in agent start command.
>>> These flags are used in agent - cgroups_limits_swap=true
>>> --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
>>> In agent logs I can see updated memory limit to 33MB for container.
>>>
>>> The Web UI shows the total memory allocated to the framework but it is not
>>> showing the memory used by the task. It always shows 0B/33MB.
>>>
>>> Not sure if this is a rhel7 issue or a mesos 1.0.1 issue.
>>>
>>> Any suggestions ?
>>> On 26 Sep 2016 21:55, "haosdent"  wrote:
>>>
 Hi, @Srikant, could you elaborate:

 >We have verified using top command that framework was using 2gB
 memory while allocated was just 50 mb.

 * How many tasks are running in your framework?
 * Do you enable or disable swap on the agents?
 * What are the flags you launch the agents with?
 * Have you seen something like `Updated 'memory.limit_in_bytes' to ` in
 the agent log?

 On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <
 srikant.blackr...@gmail.com> wrote:

> Hi Greg ,
>
> Previously we were running Mesos 0.27 on Rhel6, and since we already
> have one cgroup hierarchy for cpu and memory for our production processes,
> we were not able to merge two cgroup hierarchies on rhel6. The slave
> process was not coming up.
> Now we have moved to Rhel7, and both the mesos master and slave are
> running on rhel7 with cgroups implemented. But we are seeing that the mesos UI
> is not showing the actual memory used by the framework.
>
> Any idea why the framework's cpu and memory usage is not coming in the UI?
> Due to this, the OS is still not killing the tasks which are consuming more
> memory than allocated.
> We have verified using top command that framework was using 2gB memory
> while allocated was just 50 mb.
>
> Please suggest.
> On 8 Sep 2016 01:53, "Greg Mann"  wrote:
>
>> Hi Srikant,
>> Without using cgroups, it won't be possible to enforce isolation of
>> cpu/memory on a Linux agent. Could you elaborate a bit on why you aren't
>> able to use cgroups currently? Have you tested the existing Mesos cgroup
>> isolators in your system?
>>
>> Cheers,
>> Greg
>>
>> On Tue, Sep 6, 2016 at 9:24 PM, Srikant Kalani <
>> srikant.blackr...@gmail.com> wrote:
>>
>>> Hi Guys,
>>>
>>> We are running a Mesos cluster in our development environment. We are
>>> seeing cases where a framework uses more resources, like cpu and
>>> memory, than the initially requested resources. When any new framework is
>>> registered, Mesos calculates the resources on the basis of what has already
>>> been offered to the first framework, and it doesn't consider the actual 

Re: Resource Isolation in Mesos

2016-10-05 Thread Srikant Kalani
Here are the values -
Memory.limit_in_bytes = 1107296256
Memory.soft_limit_in_bytes=1107296256
Memory.memsw.limit_in_bytes=9223372036854775807
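
A side note (my own decoding, not from the thread): the hard and soft limits
are 1056 MiB, while the memsw value is 2^63 - 1, the kernel's "unlimited"
sentinel, which suggests no combined memory+swap limit was applied.

```
$ echo $((1107296256 / 1024 / 1024))   # hard/soft limit in MiB
1056
$ python3 -c 'print(2**63 - 1)'        # memsw value == 2^63 - 1 ("unlimited")
9223372036854775807
```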

I have run the same task on mesos 1.0.1 running on rhel6, and the UI then shows
task memory usage as 2.2G/1.0G, where 2.2G is used and 1.0G is allocated, but
since we don't have cgroups there, the tasks are not getting killed.

On rhel7, the UI is showing 0B/1.0G for the task memory details.

Any idea whether this is a rhel7 fault or whether I need to adjust some configuration?
On 4 Oct 2016 21:33, "haosdent"  wrote:

> Hi, @Srikant
>
> Usually, your task should be killed when over cgroup limit. Would you
> enter the `/sys/fs/cgroup/memory/mesos` folder in the agent?
> Then check the values in `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
>  `${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
> `${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes` and reply in this
> email.
>
> ${YOUR_CONTAINER_ID} is the container id of your task here, you could find
> it from the agent log. Or as you said, you only have this one task, so it
> should only have one directory under `/sys/fs/cgroup/memory/mesos`.
>
> Furthermore, would you show the result of 
> http://${YOUR_AGENT_IP}:5051/containers?
> It contains some tasks statistics information as well.
>
> On Tue, Oct 4, 2016 at 9:00 PM, Srikant Kalani <
> srikant.blackr...@gmail.com> wrote:
>
>> We have upgraded linux from rhel6 to rhel7 and mesos from 0.27 to 1.0.1.
>> After upgrade we are not able to see memory used by task which was fine
>> in previous version. Due to this cgroups are not effective.
>>
>> Answers to your questions below :
>>
>> There is only 1 task running as an appserver, which is consuming approx 20G
>> mem, but this info is not coming in the Mesos UI.
>> Swaps are enabled in agent start command.
>> These flags are used in agent - cgroups_limits_swap=true
>> --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
>> In agent logs I can see updated memory limit to 33MB for container.
>>
>> The Web UI shows the total memory allocated to the framework but it is not
>> showing the memory used by the task. It always shows 0B/33MB.
>>
>> Not sure if this is a rhel7 issue or a mesos 1.0.1 issue.
>>
>> Any suggestions ?
>> On 26 Sep 2016 21:55, "haosdent"  wrote:
>>
>>> Hi, @Srikant, could you elaborate:
>>>
>>> >We have verified using top command that framework was using 2gB memory
>>> while allocated was just 50 mb.
>>>
>>> * How many tasks are running in your framework?
>>> * Do you enable or disable swap on the agents?
>>> * What are the flags you launch the agents with?
>>> * Have you seen something like `Updated 'memory.limit_in_bytes' to ` in
>>> the agent log?
>>>
>>> On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <
>>> srikant.blackr...@gmail.com> wrote:
>>>
 Hi Greg ,

 Previously we were running Mesos 0.27 on Rhel6, and since we already
 have one cgroup hierarchy for cpu and memory for our production processes,
 we were not able to merge two cgroup hierarchies on rhel6. The slave
 process was not coming up.
 Now we have moved to Rhel7, and both the mesos master and slave are running
 on rhel7 with cgroups implemented. But we are seeing that the mesos UI is not
 showing the actual memory used by the framework.

 Any idea why the framework's cpu and memory usage is not coming in the UI? Due
 to this, the OS is still not killing the tasks which are consuming more memory
 than allocated.
 We have verified using top command that framework was using 2gB memory
 while allocated was just 50 mb.

 Please suggest.
 On 8 Sep 2016 01:53, "Greg Mann"  wrote:

> Hi Srikant,
> Without using cgroups, it won't be possible to enforce isolation of
> cpu/memory on a Linux agent. Could you elaborate a bit on why you aren't
> able to use cgroups currently? Have you tested the existing Mesos cgroup
> isolators in your system?
>
> Cheers,
> Greg
>
> On Tue, Sep 6, 2016 at 9:24 PM, Srikant Kalani <
> srikant.blackr...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We are running a Mesos cluster in our development environment. We are
>> seeing cases where a framework uses more resources, like cpu and memory,
>> than the initially requested resources. When any new framework is
>> registered, Mesos calculates the resources on the basis of what has already
>> been offered to the first framework, and it doesn't consider the actual
>> resources utilised by the previous framework.
>> This is resulting in an incorrect calculation of resources.
>> The Mesos website says that we should implement cgroups, but it is not
>> possible in our case, as we have already implemented cgroups in other
>> projects and due to Linux restrictions we can't merge two cgroup
>> hierarchies.
>>
>> Any idea how we can implement resource isolation in Mesos?
>>
>> We are using Mesos 0.27.1