[Meetup] Bangalore's first Mesos Meetup

2016-08-01 Thread DhilipKumar Sankaranarayanan
Hi All,

Happy to announce Bangalore's first Mesos Meetup at the Huawei R&D campus.  All
are welcome. Please RSVP at the link below:

https://www.meetup.com/Bangalore-Mesos-cncf-User-Group/events/228745899/

If anyone wants to present or talk about Apache Mesos, please let us know.

Regards,
Dhilip


Re: Initial Design Document Apache Mesos Federation (JIRA 3548)

2016-08-01 Thread DhilipKumar Sankaranarayanan
Hi All,

Sorry for the long gap.  We had an interesting discussion last week at
Mesosphere HQ again on this topic before the Mesos SF Meetup.

The discussion revolved around several areas and suggestions on the
proposed design.

One of the main items that came up was the approach through which we
should achieve Mesos Federation.  The intent is to take the approach that
makes the most sense for the community and is easiest for most to adopt.

*Approach 1:* (Peer to Peer with a separate policy Engine) Already Proposed
Design
*Approach 2:*  (Hierarchical Design) Design similar to Kubernetes
Federation where we introduce a Federation Layer in-between Framework and
the Masters.

Both designs have their own advantages and disadvantages.  Here is the
survey link; please provide your feedback, which should set the ball
rolling for us.

https://goo.gl/forms/DpVRV9Zh3kunhJkP2

If you have a third approach to be included, please write to me; I'll be
happy to add it to the survey.

Regardless of the design chosen, the following enhancement to the master
will help reduce "offers" traffic across continents.

Enhancement: A framework will be able to send RequestResource(constraints)
to the master; the master then sends only those offers that match the
constraints.
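
To make the enhancement concrete, here is a minimal sketch of what
master-side filtering could look like. It is not Mesos code: the `Offer`
class, the shape of the constraints dict (`min_cpus`, `min_mem_mb`,
`attributes`) and the agent names are all hypothetical, chosen only to
illustrate "send only the offers that match".

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical, simplified stand-in for a Mesos resource offer."""
    agent: str
    cpus: float
    mem_mb: int
    attributes: dict = field(default_factory=dict)


def matches(offer, constraints):
    """Return True if the offer satisfies every constraint.

    `constraints` is a hypothetical dict with optional keys:
    min_cpus, min_mem_mb, and attributes (exact-match key/value pairs).
    """
    if offer.cpus < constraints.get("min_cpus", 0):
        return False
    if offer.mem_mb < constraints.get("min_mem_mb", 0):
        return False
    wanted = constraints.get("attributes", {})
    return all(offer.attributes.get(k) == v for k, v in wanted.items())


def filter_offers(offers, constraints):
    """Master-side filtering: keep only the offers worth sending."""
    return [offer for offer in offers if matches(offer, constraints)]


offers = [
    Offer("agent-blr-1", cpus=4, mem_mb=8192, attributes={"dc": "bangalore"}),
    Offer("agent-sfo-1", cpus=1, mem_mb=1024, attributes={"dc": "sf"}),
    Offer("agent-sfo-2", cpus=8, mem_mb=16384, attributes={"dc": "sf"}),
]
constraints = {"min_cpus": 2, "attributes": {"dc": "sf"}}
selected = filter_offers(offers, constraints)
print([offer.agent for offer in selected])
```

With filtering done on the master, the two offers that do not match never
have to cross the link to a remote framework.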

Regards,
Dhilip




On Fri, Jul 15, 2016 at 3:46 PM, DhilipKumar Sankaranarayanan <
s.dhilipku...@gmail.com> wrote:

> Hi All,
>
> I got a chance to bring this up during yesterday's Community Sync.  It was
> great discussing it with you all.
>
> As general feedback, the role of the policy engine in the design needs to
> be clearer; I will update the document with more information on the PE
> very soon.
>
> We are yet to get more insight on the license issues, such as bringing a
> Mozilla 2.0 library into an Apache 2.0 project.
>
> It will be fantastic to get more thoughts on this from the community so
> please share if you or your organisation had thought about it.
>
> Hi Alex,
>
> Thanks again.
>
> a) Yes, you are correct, that's exactly what we thought: a Framework could
> simply query and learn about its next step (bursting or load balancing).
> b)  We are currently thinking that the Framework will run in only one
> place and should be able to connect to other datacenters.  Each
> datacenter could have some Frameworks running locally and some that are
> part of a federation.
>
> Regards,
> Dhilip
>
>
> On Thu, Jul 14, 2016 at 9:17 AM, Alexander Gallego 
> wrote:
>
>>
>>
>> On Thu, Jul 14, 2016 at 2:40 AM, DhilipKumar Sankaranarayanan <
>> s.dhilipku...@gmail.com> wrote:
>>
>>> Hi Alex,
>>>
>>> Thanks for taking a look.  We have simplified the design since the
>>> conference.  The Allocation and Anonymous modules were only helping us to
>>> control the offers sent to the frameworks.  Now we think that Roles and
>>> Quota in Mesos elegantly solve this problem and we could take advantage of
>>> them.
>>>
>>
>> Sounds good, given that the design is entirely different now, can you
>> share some of these thoughts.
>>
>>
>>>
>>> The current design does not propose Mesos Modules, the POC we
>>> demonstrated @ the mesoscon is slightly out of date in that respect.
>>>
>>> The current design only enforces that any Policy Engine implementation
>>> should honour certain REST apis.   This also removes Consul out of the
>>> picture, but at Huawei our implementation would pretty much consider Consul
>>> or something similar.
>>>
>>> 1) Failure semantics
>>> I do agree it is not straightforward to declare that a DC is lost just
>>> because a framework lost the connection intermittently.  By probing the
>>> 'Gossiper' we would know that the DC is still active, just not reachable
>>> from us; in that case it's worth the wait.  Only if the DC in question is
>>> not reachable from every other DC could we come to such a conclusion.
>>>
>>>
>>
>> how do you envision frameworks integrating w/ this. Are you saying that
>> frameworks should poll the HTTP endpoint of the Gossiper?
>>
>>
>>
>>> 2)  Can you share more details about the allocator modules.
>>> As mentioned earlier, these modules are no longer relevant; we have a much
>>> simpler way to achieve this.
>>>
>>> 3) High Availability
>>> I think you are talking about the below section?
>>> "Sequence Diagram for High Availability
>>>
>>> (In case of local datacenter failure)
>>> Very similar to the cloud bursting use-case scenario.  "
>>> The sequence diagram only represents the flow of events in case the
>>> current datacenter fails and the framework needs to connect to a new one.
>>> It is not talking about the approach you mentioned.  I will update the doc
>>> with a couple more diagrams soon to make it more understandable.  We would
>>> certainly like to have a federated K/V storage layer across the DCs, which
>>> is why Consul was considered in the first place.
>>>
>>>
>> Does this mean that you have to run the actual framework code in all of
>> the DC's ?  or you have yet to iron this out?
>>
>>
>>
>>
>>> 4) Metrics / Monitoring - 

1.0.1 release

2016-08-01 Thread Vinod Kone
Hi,

As discussed on the 1.0 voting thread, we plan to cut a 1.0.1 as early as
this week. So if you have anything that absolutely needs to go into the
patch release, please work with your shepherd to get it landed on trunk
and backported to the 1.0.x branch.

Thanks,


Re: Enabling basic access authentication

2016-08-01 Thread Vinod Kone
We separated out the default authentication mode for read-only (default: no
authn) and read-write (default: authn) endpoints. Since the webui depends
only on the read-only endpoints, you need to explicitly enable authn for
read-only endpoints if you need authn. See
https://github.com/apache/mesos/blob/master/docs/upgrades.md for more
details.
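
Once authn is enabled for the read-only endpoints, a quick way to check it
from a script is to send the credentials yourself instead of relying on the
browser prompt. The sketch below only builds the request; it assumes the
master uses HTTP Basic authentication, and the address, principal and
secret are hypothetical placeholders.

```python
import base64
import urllib.request


def build_state_request(master_url, principal, secret):
    """Build a GET request for /master/state with an HTTP Basic auth header."""
    raw = f"{principal}:{secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(f"{master_url}/master/state")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical address and credentials, for illustration only.
request = build_state_request("http://localhost:5050", "admin", "secret")
print(request.full_url)
print(request.get_header("Authorization"))

# To actually send it against a running master:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

If authn is really enforced, the same request without the header should be
rejected rather than answered.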

On Mon, Aug 1, 2016 at 12:20 PM, Douglas Nelson  wrote:

> It was working for me with Mesos 1.0.0-rc2. Now that I have switched to
> 1.0.0, the user/pass prompt at the WebUI is missing. Was another
> flag added, or was it decided that this feature wasn't necessary?
>
> On Tue, Jul 12, 2016 at 6:26 PM, Douglas Nelson 
> wrote:
>
>> Ah, I missed that in the vote message. That makes sense. I'm running
>> version 0.28.2 so that would be why.
>>
>> On Tue, Jul 12, 2016 at 6:22 PM, Zhitao Li  wrote:
>>
>>> Just went through this: I think the necessary endpoint `/master/state`
>>> is only authenticated after 1.0.0, which is still going through release
>>> vote.
>>>
>>> Can you share which version of Mesos you are running?
>>>
>>> On Tue, Jul 12, 2016 at 5:18 PM, Douglas Nelson 
>>> wrote:
>>>
>>>> With marathon you can enable basic access authentication to the WebUI
>>>> with the flag --http_credentials.
>>>>
>>>> I expected something similar with the flag --authenticate_http in mesos
>>>> but when I hit the WebUI I'm not prompted to give a username/pass. Is that
>>>> feature not included in mesos or is there a different configuration I need
>>>> to set?
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>>
>>> --
>>> Cheers,
>>>
>>> Zhitao Li
>>>
>>
>>
>


Re: Enabling basic access authentication

2016-08-01 Thread Douglas Nelson
It was working for me with Mesos 1.0.0-rc2. Now that I have switched to
1.0.0, the user/pass prompt at the WebUI is missing. Was another
flag added, or was it decided that this feature wasn't necessary?

On Tue, Jul 12, 2016 at 6:26 PM, Douglas Nelson  wrote:

> Ah, I missed that in the vote message. That makes sense. I'm running
> version 0.28.2 so that would be why.
>
> On Tue, Jul 12, 2016 at 6:22 PM, Zhitao Li  wrote:
>
>> Just went through this: I think the necessary endpoint `/master/state` is
>> only authenticated after 1.0.0, which is still going through release vote.
>>
>> Can you share which version of Mesos you are running?
>>
>> On Tue, Jul 12, 2016 at 5:18 PM, Douglas Nelson 
>> wrote:
>>
>>> With marathon you can enable basic access authentication to the WebUI
>>> with the flag --http_credentials.
>>>
>>> I expected something similar with the flag --authenticate_http in mesos
>>> but when I hit the WebUI I'm not prompted to give a username/pass. Is that
>>> feature not included in mesos or is there a different configuration I need
>>> to set?
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>


Re: Attributes cause agent to fail

2016-08-01 Thread Douglas Nelson
That showed me what wasn't working. If you start the mesos agent at all
before setting attributes (or if you change attributes) you need to make
sure it doesn't recover old live executors.

The error:
Failed to perform recovery: Incompatible agent info detected.

The solution:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
Step 2: Restart the agent.
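
For anyone scripting this, Step 1 can be sketched in Python as below. The
function mirrors `rm -f` on `<work_dir>/meta/slaves/latest`; the demo runs
against a throwaway temporary directory (with a placeholder agent ID
directory) rather than the real /var/lib/mesos, and restarting the agent is
left to whatever init system you use.

```python
import os
import tempfile


def clear_latest_agent_symlink(work_dir):
    """Remove <work_dir>/meta/slaves/latest if present (like `rm -f`).

    Returns True if something was removed, False if it was already absent.
    """
    latest = os.path.join(work_dir, "meta", "slaves", "latest")
    if os.path.lexists(latest):
        os.remove(latest)
        return True
    return False


# Demonstration against a throwaway directory, not the real work directory.
demo_work_dir = tempfile.mkdtemp()
slaves_dir = os.path.join(demo_work_dir, "meta", "slaves")
target_dir = os.path.join(slaves_dir, "agent-id-placeholder")
os.makedirs(target_dir)
os.symlink(target_dir, os.path.join(slaves_dir, "latest"))

first_call = clear_latest_agent_symlink(demo_work_dir)
second_call = clear_latest_agent_symlink(demo_work_dir)
print(first_call, second_call)
```

Only the `latest` link is removed; the directory it pointed to is left
untouched.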

Thanks!

On Fri, Jul 29, 2016 at 8:47 PM, Benjamin Mahler  wrote:

> Unfortunately we log termination messages to stderr rather than the
> logging files. Can you show stderr? I suspect we're printing the exit
> message there.
>
> See: https://issues.apache.org/jira/browse/MESOS-5854
>
> On Fri, Jul 29, 2016 at 5:57 PM, Douglas Nelson 
> wrote:
>
>> It might be an issue with the mesos-init-wrapper? I'm using that to set
>> the flag via config files. I'll have to look through it and see exactly
>> what it's doing when it sets the attributes flag.
>>
>> On Fri, Jul 29, 2016 at 6:48 PM, Douglas Nelson 
>> wrote:
>>
>>> I'm pretty sure I set the flag right. Here is the agent's info:
>>>
>>> Log file created at: 2016/07/29 18:25:16
>>> Running on machine: lubuntu
>>> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
>>> I0729 18:25:16.494326  4559 logging.cpp:194] INFO level logging started!
>>> I0729 18:25:16.496150  4559 containerizer.cpp:196] Using isolation:
>>> posix/cpu,posix/mem,filesystem/posix,network/cni
>>> I0729 18:25:16.498539  4559 linux_launcher.cpp:101] Using
>>> /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
>>> I0729 18:25:16.499295  4559 main.cpp:434] Starting Mesos agent
>>> I0729 18:25:16.500417  4577 slave.cpp:198] Agent started on 1)@
>>> 127.0.1.1:5051
>>> I0729 18:25:16.500427  4577 slave.cpp:199] Flags at startup:
>>> --appc_simple_discovery_uri_prefix="http://;
>>> --appc_store_dir="/tmp/mesos/store/appc" --attributes="test:test"
>>> --authenticate_http_readonl...
>>> I0729 18:25:16.500751  4577 slave.cpp:519] Agent resources: cpus(*):1;
>>> mem(*):1000; disk(*):13901; ports(*):[31000-32000]
>>> I0729 18:25:16.500776  4577 slave.cpp:527] Agent attributes: [ test=test
>>> ]
>>> I0729 18:25:16.500779  4577 slave.cpp:532] Agent hostname: lubuntu
>>> I0729 18:25:16.502638  4578 state.cpp:57] Recovering state from
>>> '/var/lib/mesos/meta'
>>> I0729 18:25:16.502667  4578 state.cpp:697] No checkpointed resources
>>> found at '/var/lib/mesos/meta/resources/resources.info'
>>>
>>>
>>> On Fri, Jul 29, 2016 at 6:41 PM, Joseph Wu  wrote:
>>>
>>>> Works fine for me.  Make sure the agent isn't just complaining about
>>>> invalid flags.
>>>>
>>>> i.e. This is invalid:
>>>> --attributes="something"
>>>>
>>>> This is valid:
>>>> --attributes="something:foo"
>>>> --attributes="something:foo; nothing:bar"
>>>>
>>>> And make sure your agent's work directory doesn't contain info from an
>>>> agent started with different attributes (or no attributes).
>>>>
>>>> On Fri, Jul 29, 2016 at 5:31 PM, Douglas Nelson
>>>> wrote:
>>>>
>>>>> When I set any attributes for the agent node it fails to run. No
>>>>> mesos-slave.ERROR log is created. I am using mesos 1.0.0 from the
>>>>> mesosphere package, but I also tried building it and had the same issue.
>>>>>
>>>>> As soon as I remove the --attributes flag the agent runs normally and
>>>>> registers itself with the master node. Is attributes deprecated? Is anyone
>>>>> else running into this?
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Running a mesos executor within a container....

2016-08-01 Thread haosdent
I think you need to run `du -sh /mnt/mesos/sandbox` in the container,
because that is the path where the sandbox is mounted in the docker container.

On Mon, Aug 1, 2016 at 8:21 PM, Mark Hammons 
wrote:

> I'll double check, but last time I ran du -sh on the var folder in the
> container it was too small to contain the binaries I downloaded.
>
> Mark Edgar Hammons II - Research Engineer at BioEmergences
> 0603695656
>
> On 01 Aug 2016, at 03:57, haosdent  wrote:
>
> >First, the binaries I get mesos to download into my sandbox don't appear
> to be within the docker image
> The sandbox folder should mount into the docker container with the same
> path, does this not work for you?
>
> >running the executor causes it to complain about libmesos being missing.
> Yes, you need to package the native library (libmesos) required by the
> executor into your jar, or package it into your docker container.
> Or you could consider implementing your executor based on the new HTTP API (
> https://github.com/mesosphere/mesos-rxjava)
>
> On Sun, Jul 31, 2016 at 9:47 PM, Mark Hammons <
> mark.hamm...@inaf.cnrs-gif.fr> wrote:
>
>> Hi all,
>>
>>
>>
>> So far I've been developing a mesos framework using a custom executor and
>> scheduler. Custom executors seem to provide a lot of flexibility in
>> responding to and communicating with the parent mesos-slave, so I'd like to
>> keep that around. However, my users would like to have customizable
>> environments for their algorithms to run in (like maybe one user would
>> design their application in an ubuntu environment and doesn't want to take
>> the time to adapt to the CentOS 7 environment the mesos slave is running).
>> So, I'd like to set the sandbox of the executor to be within the context of
>> a docker image. I thought doing something like this would work:
>> http://pastie.org/10924936
>>
>>
>>
>> but it doesn't for three big reasons. First, the binaries I get mesos to
>> download into my sandbox don't appear to be within the docker image, and
>> second the command defined by execCommand is executed at the root of the
>> docker image instead of within the mesos sandbox. Even when I get around
>> these issues by adding directives in execCommand to download the missing
>> binaries into the docker image, running the executor causes it to complain
>> about libmesos being missing. Is there any way to do what I'm trying to do?
>> Could someone give me an example project that works this way?
>>
>>
>>
>> Thanks,
>>
>> Mark
>> --
>>
>> Mark Hammons - +33 06 03 69 56 56
>>
>> Research Engineer @ BioEmergences 
>>
>> Lab Phone: 01 69 82 34 19
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>


-- 
Best Regards,
Haosdent Huang


Re: Log the print out instead of print screen

2016-08-01 Thread haosdent
You could specify `--log_dir` when you start the Mesos master and agents.
For example, if you start the master and agents with
`--log_dir=/var/log/mesos`, all the logs will be written under
`/var/log/mesos`.
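
Once the messages are in files, they are easy to post-process because every
line carries the same header, which the agent log quoted elsewhere in this
digest describes as `[IWEF]mmdd hh:mm:ss.uu threadid file:line] msg`. A
minimal parsing sketch, assuming that layout with six-digit microseconds
(the regular expression is my own and not part of Mesos):

```python
import re

# Header layout: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_glog_line(text):
    """Split one log line into its header fields, or return None."""
    match = GLOG_LINE.match(text)
    if match is None:
        return None
    record = match.groupdict()
    record["thread"] = int(record["thread"])
    record["line"] = int(record["line"])
    return record


sample = "I0729 18:25:16.494326  4559 logging.cpp:194] INFO level logging started!"
record = parse_glog_line(sample)
print(record["level"], record["file"], record["line"], record["message"])
```

Lines that were wrapped by a mail client, or that are not log lines at all,
simply come back as None.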

On Mon, Aug 1, 2016 at 7:47 PM, Bryan Fok  wrote:

> Hi
>
> How do I redirect the messages that are printed to the screen by default
> to a log file (preferably the log file created by the logging module)?
>
>
> I0801 10:53:40.082551 39 group.cpp:427] Trying to create path '/mesos'
> in ZooKeeper
> I0801 10:53:40.083768 30 detector.cpp:152] Detected a new leader:
> (id='9')
> I0801 10:53:40.083859 32 group.cpp:700] Trying to get
> '/mesos/json.info_09' in ZooKeeper
> I0801 10:53:40.084398 49 detector.cpp:479] A new leading master (UPID=
> master@10.85.1.117:5050) is detected
> I0801 10:53:40.084455 28 sched.cpp:326] New master detected at
> master@10.85.1.117:5050
> I0801 10:53:40.084547 28 sched.cpp:336] No credentials provided.
> Attempting to register without authentication
> I0801 10:53:40.085433 36 sched.cpp:703] Framework registered with
> a8180ad8-5fb6-4696-bacf-462552ffe4ef-0466
> I0801 10:53:56.314388 20 sched.cpp:1911] Asked to stop the driver
> I0801 10:53:56.314690 24 sched.cpp:1143] Stopping framework
> 'a8180ad8-5fb6-4696-bacf-462552ffe4ef-0466'
> I0801 10:53:56.315110 57 sched.cpp:1911] Asked to stop the driver
>
>
> Thank you,
> Bryan
>



-- 
Best Regards,
Haosdent Huang


Re: never finish unresponsive tasks

2016-08-01 Thread haosdent
Hi, it is not possible to allocate unlimited memory to tasks. Do your
tasks always finish within a fixed range of memory usage? If so, you could
request the maximum value of that range.

On Mon, Aug 1, 2016 at 7:54 PM, Bryan Fok  wrote:

> Sometimes a few tasks (how many depends on how many tasks were submitted)
> never complete, and no error is returned either. When I see this, I
> increase the "mem" in task.resources and that solves the problem. I don't
> think modifying it every time we encounter this problem is a good
> solution. Is there any way to allocate unlimited memory to tasks by
> default? Or is that not even the root problem?
>
> Thank you,
> Bryan
>
>


-- 
Best Regards,
Haosdent Huang


RE: Cadvisor and Mesos: cgroup monitoring

2016-08-01 Thread Aurélien DEHAY
My bad.

I forgot to restart the tasks. It works fine.

Thanks again.

From: haosdent [mailto:haosd...@gmail.com]
Sent: Monday, 1 August 2016 16:10
To: user
Subject: Re: Cadvisor and Mesos: cgroup monitoring

Hi, which Mesos version are you using? `cgroups/devices` has been available since 1.0.0.

On Mon, Aug 1, 2016 at 8:10 PM, Aurélien DEHAY wrote:

Hi.



Adding cgroups/devices does not work, the /proc/x/task/x/cgroup does not list 
the mesos cgroup under devices.



I will try snap with the plugin from Roger.



Thanks.


From: haosdent
Sent: Friday, 29 July 2016 17:30:20
To: user
Subject: Re: Cadvisor and Mesos: cgroup monitoring

Hi, you could add `cgroups/devices` to your Mesos agent's isolation flags and 
restart it.

On Fri, Jul 29, 2016 at 10:30 PM, aurelien.de...@gmail.com wrote:

Hello.



For example:
proc/113197/task/113197
[root@opvaames06 113197]# cat cgroup
10:devices:/
9:perf_event:/
8:freezer:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
7:memory:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
6:cpuset:/
5:net_cls:/
4:cpuacct,cpu:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
3:hugetlb:/
2:blkio:/
1:name=systemd:/mesos_executors.slice

First line: 10: devices:/.

From a mail on the cadvisor mailing list:
cAdvisor uses `ps` to group processes into cgroups. `ps` uses `devices` cgroups 
to identify the cgroups of a process. In your case, devices cgroup is still set 
to root `/`.
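
The same check can be scripted. The sketch below parses the content of a
/proc/<pid>/cgroup file (using the output pasted above as sample data) and
reports the path for each subsystem, which makes it easy to see that
`devices` sits at `/` while `memory`, `freezer` and `cpu` are under
`/mesos/...`. The function names are my own.

```python
def parse_cgroup_file(text):
    """Map each cgroup subsystem to its path, from /proc/<pid>/cgroup content."""
    paths = {}
    for line in text.strip().splitlines():
        # Each line is "hierarchy-id:subsystem[,subsystem...]:path".
        _hierarchy_id, subsystems, path = line.split(":", 2)
        for subsystem in subsystems.split(","):
            paths[subsystem] = path
    return paths


def subsystems_under(paths, prefix):
    """Return the sorted subsystems whose cgroup path starts with `prefix`."""
    return sorted(name for name, path in paths.items() if path.startswith(prefix))


sample = """\
10:devices:/
9:perf_event:/
8:freezer:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
7:memory:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
6:cpuset:/
5:net_cls:/
4:cpuacct,cpu:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
3:hugetlb:/
2:blkio:/
1:name=systemd:/mesos_executors.slice
"""

paths = parse_cgroup_file(sample)
print(paths["devices"])
print(subsystems_under(paths, "/mesos/"))
```

To run it against a live process, read the text from
/proc/<pid>/cgroup instead of using the sample string.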


Thanks for the answer.


From: haosdent
Sent: Friday, 29 July 2016 16:21:07
To: user
Subject: Re: Cadvisor and Mesos: cgroup monitoring

>- is there a way to "link" the cgroup id to the mesos task id?
The cgroup id you saw is actually the ContainerId. You could get the mapping 
by querying the state endpoint of the Mesos agent.
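
As a rough illustration of that lookup, the sketch below walks an
already-decoded state document and builds a ContainerId to task ID mapping.
It assumes the agent state JSON nests tasks as
frameworks -> executors -> tasks and exposes the container ID under an
executor's `container` key; please verify the field names against your own
agent's output. The framework, executor and task IDs in the sample are made
up.

```python
def container_to_tasks(agent_state):
    """Map each executor's container ID to the IDs of the tasks it runs."""
    mapping = {}
    for framework in agent_state.get("frameworks", []):
        for executor in framework.get("executors", []):
            container_id = executor.get("container")
            if container_id is None:
                continue
            task_ids = [task["id"] for task in executor.get("tasks", [])]
            mapping[container_id] = task_ids
    return mapping


# Hypothetical, heavily trimmed state document (assumed field names).
sample_state = {
    "frameworks": [
        {
            "id": "framework-0001",
            "executors": [
                {
                    "id": "executor-a",
                    "container": "3af01ccb-ace4-4467-84e9-1e6ae28d6dd7",
                    "tasks": [{"id": "task-1"}, {"id": "task-2"}],
                }
            ],
        }
    ]
}

mapping = container_to_tasks(sample_state)
print(mapping)
```

The keys of the result can then be matched against the last path component
of the `/mesos/<id>` cgroups.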

>In mesos, this information is set to /, so all my process are shown in the / 
>cgroup.
Would you mind providing more details about this? As far as I know, all the Mesos 
containers should be under `Flags::cgroups_root`, whose default value is "mesos".

In addition, have you tried the `/metrics/snapshot` endpoints of the Mesos 
master and Mesos agent? They provide some useful metrics that could be used for 
monitoring as well.

On Fri, Jul 29, 2016 at 10:07 PM, aurelien.de...@gmail.com wrote:



Hello.



I'm trying to find a solution to monitor the real usage of my Mesos tasks. I 
don't use Docker at all, but I had a look at cAdvisor.

Unfortunately, cAdvisor uses ps and the device information to determine the 
cgroup ownership of a process. In Mesos, this information is set to /, so all 
my processes are shown in the / cgroup.

I can still monitor the usage of the cgroups, but I have no clue which 
processes are inside, and the cgroup name is not related to the task ID in Mesos.

So, 3 questions:

- has anybody managed to use cAdvisor (with or without Mesos) for cgroups?

- is there a "better" solution?

- is there a way to "link" the cgroup id to the mesos task id?



Thanks for any clue.



And a good sysadmin appreciation day!



Aurélien.




--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang


Re: Cadvisor and Mesos: cgroup monitoring

2016-08-01 Thread haosdent
Hi, which Mesos version are you using? `cgroups/devices` has been available since 1.0.0.

On Mon, Aug 1, 2016 at 8:10 PM, Aurélien DEHAY 
wrote:

> Hi.
>
>
> Adding cgroups/devices does not work, the /proc/x/task/x/cgroup does not
> list the mesos cgroup under devices.
>
>
> I will try snap with the plugin from Roger.
>
>
> Thanks.
> --
> *From:* haosdent
> *Sent:* Friday, 29 July 2016 17:30:20
> *To:* user
> *Subject:* Re: Cadvisor and Mesos: cgroup monitoring
>
> Hi, you could add `cgroups/devices` to your Mesos agent's isolation flags
> and restart it.
>
> On Fri, Jul 29, 2016 at 10:30 PM, aurelien.de...@gmail.com <
> aurelien.de...@gmail.com> wrote:
>
>> Hello.
>>
>>
>> For example:
>>
>> proc/113197/task/113197
>> [root@opvaames06 113197]# cat cgroup
>> 10:devices:/
>> 9:perf_event:/
>> 8:freezer:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
>> 7:memory:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
>> 6:cpuset:/
>> 5:net_cls:/
>> 4:cpuacct,cpu:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
>> 3:hugetlb:/
>> 2:blkio:/
>> 1:name=systemd:/mesos_executors.slice
>>
>> First line: 10: devices:/.
>>
>> From a mail on the cadvisor mailing list:
>> cAdvisor uses `ps` to group processes into cgroups. `ps` uses `devices`
>> cgroups to identify the cgroups of a process. In your case, devices cgroup
>> is still set to root `/`.
>>
>> Thanks for the answer.
>> --
>> *From:* haosdent
>> *Sent:* Friday, 29 July 2016 16:21:07
>> *To:* user
>> *Subject:* Re: Cadvisor and Mesos: cgroup monitoring
>>
>> >- is there a way to "link" the cgroup id to the mesos task id?
>> The cgroup id you saw is actually the ContainerId. You could get the
>> mapping by querying the state endpoint of the Mesos agent.
>>
>> >In mesos, this information is set to /, so all my process are shown in
>> the / cgroup.
>> Would you mind providing more details about this? As far as I know, all
>> the Mesos containers should be under `Flags::cgroups_root`, whose default
>> value is "mesos".
>>
>> In addition, have you tried the `/metrics/snapshot` endpoints of the
>> Mesos master and Mesos agent? They provide some useful metrics that could
>> be used for monitoring as well.
>>
>> On Fri, Jul 29, 2016 at 10:07 PM, aurelien.de...@gmail.com <
>> aurelien.de...@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> Hello.
>>>
>>>
>>> I'm trying to find a solution to monitor the real usage of my Mesos
>>> tasks. I don't use Docker at all, but I had a look at cAdvisor.
>>>
>>>
>>> Unfortunately, cAdvisor uses ps and the device information to determine
>>> the cgroup ownership of a process. In Mesos, this information is set to /,
>>> so all my processes are shown in the / cgroup.
>>>
>>>
>>> I can still monitor the usage of the cgroups, but I have no clue which
>>> processes are inside, and the cgroup name is not related to the task ID
>>> in Mesos.
>>>
>>>
>>> So, 3 questions:
>>>
>>> - has anybody managed to use cAdvisor (with or without Mesos) for cgroups?
>>>
>>> - is there a "better" solution?
>>>
>>> - is there a way to "link" the cgroup id to the mesos task id?
>>>
>>>
>>> Thanks for any clue.
>>>
>>>
>>> And a good sysadmin appreciation day!
>>>
>>>
>>> Aurélien.
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>



-- 
Best Regards,
Haosdent Huang


Re: Running a mesos executor within a container....

2016-08-01 Thread Mark Hammons
I'll double check, but last time I ran du -sh on the var folder in the container it 
was too small to contain the binaries I downloaded.

Mark Edgar Hammons II - Research Engineer at BioEmergences
0603695656

> On 01 Aug 2016, at 03:57, haosdent  wrote:
> 
> >First, the binaries I get mesos to download into my sandbox don't appear to 
> >be within the docker image
> The sandbox folder should mount into the docker container with the same path, 
> does this not work for you? 
> 
> >running the executor causes it to complain about libmesos being missing.
> Yes, you need to package the native library (libmesos) required by the executor 
> into your jar, or package it into your docker container.
> Or you could consider implementing your executor based on the new HTTP API 
> (https://github.com/mesosphere/mesos-rxjava)
> 
>> On Sun, Jul 31, 2016 at 9:47 PM, Mark Hammons 
>>  wrote:
>> Hi all,
>>  
>> So far I've been developing a mesos framework using a custom executor and 
>> scheduler. Custom executors seem to provide a lot of flexibility in 
>> responding to and communicating with the parent mesos-slave, so I'd like to 
>> keep that around. However, my users would like to have customizable 
>> environments for their algorithms to run in (like maybe one user would 
>> design their application in an ubuntu environment and doesn't want to take 
>> the time to adapt to the CentOS 7 environment the mesos slave is running). 
>> So, I'd like to set the sandbox of the executor to be within the context of 
>> a docker image. I thought doing something like this would work: 
>> http://pastie.org/10924936
>>  
>> but it doesn't for three big reasons. First, the binaries I get mesos to 
>> download into my sandbox don't appear to be within the docker image, and 
>> second the command defined by execCommand is executed at the root of the 
>> docker image instead of within the mesos sandbox. Even when I get around 
>> these issues by adding directives in execCommand to download the missing 
>> binaries into the docker image, running the executor causes it to complain 
>> about libmesos being missing. Is there any way to do what I'm trying to do? 
>> Could someone give me an example project that works this way?
>>  
>> Thanks,
>> Mark
>> --
>> Mark Hammons - +33 06 03 69 56 56
>> Research Engineer @ BioEmergences
>> Lab Phone: 01 69 82 34 19
> 
> 
> 
> -- 
> Best Regards,
> Haosdent Huang


RE: Cadvisor and Mesos: cgroup monitoring

2016-08-01 Thread Aurélien DEHAY
Hi.


Adding cgroups/devices does not work, the /proc/x/task/x/cgroup does not list 
the mesos cgroup under devices.


I will try snap with the plugin from Roger.


Thanks.


From: haosdent
Sent: Friday, 29 July 2016 17:30:20
To: user
Subject: Re: Cadvisor and Mesos: cgroup monitoring

Hi, you could add `cgroups/devices` to your Mesos agent's isolation flags and 
restart it.

On Fri, Jul 29, 2016 at 10:30 PM, aurelien.de...@gmail.com wrote:

Hello.


For example:

proc/113197/task/113197
[root@opvaames06 113197]# cat cgroup
10:devices:/
9:perf_event:/
8:freezer:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
7:memory:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
6:cpuset:/
5:net_cls:/
4:cpuacct,cpu:/mesos/3af01ccb-ace4-4467-84e9-1e6ae28d6dd7
3:hugetlb:/
2:blkio:/
1:name=systemd:/mesos_executors.slice

First line: 10: devices:/.

From a mail on the cAdvisor mailing list:
cAdvisor uses `ps` to group processes into cgroups. `ps` uses `devices` cgroups 
to identify the cgroups of a process. In your case, devices cgroup is still set 
to root `/`.


Thanks for the answer.


From: haosdent
Sent: Friday, 29 July 2016 16:21:07
To: user
Subject: Re: Cadvisor and Mesos: cgroup monitoring

>- is there a way to "link" the cgroup id to the mesos task id?
The cgroup id you saw is actually the ContainerId. You could get the mapping 
by querying the state endpoint of the Mesos agent.

>In mesos, this information is set to /, so all my process are shown in the / 
>cgroup.
Would you mind providing more details about this? As far as I know, all the Mesos 
containers should be under `Flags::cgroups_root`, whose default value is "mesos".

In addition, have you tried the `/metrics/snapshot` endpoints of the Mesos 
master and Mesos agent? They provide some useful metrics that could be used for 
monitoring as well.

On Fri, Jul 29, 2016 at 10:07 PM, aurelien.de...@gmail.com wrote:



Hello.


I'm trying to find a solution to monitor the real usage of my Mesos tasks. I 
don't use Docker at all, but I had a look at cAdvisor.


Unfortunately, cAdvisor uses ps and the device information to determine the 
cgroup ownership of a process. In Mesos, this information is set to /, so all 
my processes are shown in the / cgroup.


I can still monitor the usage of the cgroups, but I have no clue which 
processes are inside, and the cgroup name is not related to the task ID in Mesos.


So, 3 questions:

- has anybody managed to use cAdvisor (with or without Mesos) for cgroups?

- is there a "better" solution?

- is there a way to "link" the cgroup id to the mesos task id?


Thanks for any clue.


And a good sysadmin appreciation day!


Aurélien.




--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang


never finish unresponsive tasks

2016-08-01 Thread Bryan Fok
Sometimes a few tasks (how many depends on how many tasks were submitted)
never complete, and no error is returned either. When I see this, I
increase the "mem" in task.resources and that solves the problem. I don't
think modifying it every time we encounter this problem is a good
solution. Is there any way to allocate unlimited memory to tasks by
default? Or is that not even the root problem?

Thank you,
Bryan


Log the print out instead of print screen

2016-08-01 Thread Bryan Fok
Hi

How do I redirect the messages that are printed to the screen by default to
a log file (preferably the log file created by the logging module)?


I0801 10:53:40.082551 39 group.cpp:427] Trying to create path '/mesos'
in ZooKeeper
I0801 10:53:40.083768 30 detector.cpp:152] Detected a new leader:
(id='9')
I0801 10:53:40.083859 32 group.cpp:700] Trying to get
'/mesos/json.info_09' in ZooKeeper
I0801 10:53:40.084398 49 detector.cpp:479] A new leading master (UPID=
master@10.85.1.117:5050) is detected
I0801 10:53:40.084455 28 sched.cpp:326] New master detected at
master@10.85.1.117:5050
I0801 10:53:40.084547 28 sched.cpp:336] No credentials provided.
Attempting to register without authentication
I0801 10:53:40.085433 36 sched.cpp:703] Framework registered with
a8180ad8-5fb6-4696-bacf-462552ffe4ef-0466
I0801 10:53:56.314388 20 sched.cpp:1911] Asked to stop the driver
I0801 10:53:56.314690 24 sched.cpp:1143] Stopping framework
'a8180ad8-5fb6-4696-bacf-462552ffe4ef-0466'
I0801 10:53:56.315110 57 sched.cpp:1911] Asked to stop the driver


Thank you,
Bryan