Re: what's the difference between mesos and yarn?

2016-08-17 Thread John Omernik
Check out Apache Myriad: you can run YARN on Mesos :)



On Mon, Aug 15, 2016 at 10:35 PM, tommy xiao  wrote:

> YARN comes from the big data community and provides a resource manager for
> that ecosystem, while Mesos is a general, datacenter-focused resource
> manager. Their features overlap somewhat, but in my opinion their semantics
> are totally different.
>
> 2016-08-15 9:36 GMT+08:00 Yu Wei :
>
>> Hi guys,
>>
>>
>> What's the difference between yarn and mesos in practice?
>>
>>
>> If using yarn, is a container still needed?
>>
>>
>> Thanks,
>>
>>
>> Jared, (韦煜)
>> Software developer
>> Interested in open source software, big data, Linux
>>
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>


Dynamic Reservations and Roles

2016-01-21 Thread John Omernik
Hey all, I am trying to come up with a process where, running as the "prod"
principal, I connect to the /reserve endpoint and make a request for X CPU
and Y memory for the "dev" role, usable by the "dev" principal.
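
For concreteness, the request I am making looks roughly like this (values are
placeholders, and the payload format is just my reading of the reservation
docs; I am basic-authing as the prod principal):

  # reserve 2 CPUs for role "dev", usable by principal "devprin" (placeholder values)
  curl -i -u prodprin:prodpass \
       -d slaveId=<slave-id> \
       -d resources='[
         {
           "name": "cpus",
           "type": "SCALAR",
           "scalar": { "value": 2 },
           "role": "dev",
           "reservation": { "principal": "devprin" }
         }
       ]' \
       -X POST http://<master>:5050/master/reserve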

I feel like I should be able to reserve that out, i.e. as the prod principal
in Mesos I should be able to say: I am setting aside X resources for role
dev, principal dev.

However, I get an error that says "Invalid RESERVE operation: The reserved
resource's principal 'devprin' does not match the principal 'prodprin'" (I am
making the request and basic-authing as prod).

What it comes down to is that I understand the message, that the principals
don't match, but I actually want it set up so that dev can't reserve
resources. Only prod can, and prod can reserve them FOR dev to use; once the
resources are allocated to dev, they can then use and control them, but they
shouldn't be able to reserve them. Does that make sense?

Thoughts or questions?


Re: Access to Design Doc

2016-01-13 Thread John Omernik
I have access now, I will review. Thanks

On Tuesday, January 12, 2016, Greg Mann <g...@mesosphere.io> wrote:

> Hi John,
> I just shared the doc with you; let me know if you still have trouble
> accessing it.
>
> Cheers,
> Greg
>
> On Tue, Jan 12, 2016 at 11:42 AM, John Omernik <j...@omernik.com> wrote:
>
>> Is there a place to request google doc permissions on the design doc here:
>>
>> https://issues.apache.org/jira/browse/MESOS-2840
>>
>>
>>
>

-- 
Sent from my iThing


Access to Design Doc

2016-01-12 Thread John Omernik
Is there a place to request google doc permissions on the design doc here:

https://issues.apache.org/jira/browse/MESOS-2840


Re: mesos-elasticsearch vs Elasticsearch with Marathon

2015-12-21 Thread John Omernik
I'll just toss out another way: there is an Elasticsearch-on-YARN framework
that actually works really nicely with Apache Myriad (for running YARN on
Mesos). I know it sounds a bit convoluted, but I have set it up so I can
create ES clusters on demand: just give me a cluster name and node size, and
it will spin up a cluster. Want to add more nodes? Great, each "scale"
operation adds another YARN application... so you could start with a 3-node
cluster, then add 2 nodes, add 2 nodes, add 3 nodes. You'd have a 10-node
cluster, but you could scale down by 3 or 2 nodes, because each batch of
nodes you've added is a separate YARN application that can be killed.

I was using MapR FS as my base filesystem, so that also helps with data
storage etc.



On Mon, Dec 21, 2015 at 3:11 AM, Eric LEMOINE  wrote:

> Hi
>
> I am new to Mesos and I have a naive question related to Elasticsearch,
> Mesos and Marathon.
>
> So there's the mesos-elasticsearch [*] project which provides a Mesos
> framework/scheduler for Elasticsearch. I guess it's also possible to run
> Elasticsearch with Marathon. What are the fundamental differences between
> the two approaches? When should one favor one approach over the other one?
> What are the reasons for using mesos-elasticsearch instead of just running
> Elasticsearch on top of Marathon?
>
> Thanks for any insight.
>
> [*] 
>


Re: Dynamic Reservations and --roles

2015-12-13 Thread John Omernik
I filed https://issues.apache.org/jira/browse/MESOS-4143 to address the
/reserve and /unreserve endpoints accepting reservations for non-explicit
roles.

I also opened https://issues.apache.org/jira/browse/MESOS-4144 to add the
ability to add roles dynamically via an API request from properly authorized
principals.

Thanks!

John

On Sat, Dec 12, 2015 at 1:40 AM, Michael Park  wrote:

> This seems like a bug to me. Please file a JIRA ticket and I'll sync with
> Neil regarding what the behavior should be with dynamic/implicit roles in
> mind.
>
> Thanks!
>
> MPark.
>
>
> On Fri, Dec 11, 2015 at 1:32 PM Vinod Kone  wrote:
>
>>
>> 1. If the role specified in the reserve/unreserve operation doesn't exist
>>> in --roles on the master, it should reject the reservation on the /reserve
>>> endpoint.  Why allow the reservation of roles if you can't use them if they
>>> are specified
>>>
>>> 2. If the role doesn't exist in --roles, and a reservation comes
>>> through, it should be added dynamically. This seems to make more sense to
>>> me. I.e. adding roles doesn't require a master restart.
>>>
>>
>> There is currently work in progress to add support for implicit roles
>> (roles not specified via --roles). But, I agree that until we add that
>> feature the /reserve endpoint should reject reservations for unknown roles.
>> Not sure what the rationale was to allow this. @mpark?
>>
>>


Dynamic Reservations - API to See them

2015-12-11 Thread John Omernik
Is there an API endpoint that allows an operator to see the current dynamic
reservations? I'd like to keep track of what's there, etc.
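
The closest thing I have found so far is to scrape the master's /slaves
endpoint and look at the reserved resources reported for each agent, roughly
like this (hostname is a placeholder):

  # dump the master's view of each slave, including its reserved resources
  curl -s http://master.example.com:5050/master/slaves | python -m json.tool

That works, but it would be nice to have something purpose-built.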

John


Dynamic Reservations and --roles

2015-12-11 Thread John Omernik
I am following http://mesos.apache.org/documentation/latest/reservation/ to
learn how to do dynamic reservations.  I don't want to statically assign
roles, therefore, I started my slaves only with the (*) resources.

I used the HTTP endpoints to reserve some resources, and the request returned
successfully. In addition, I've checked the /slaves endpoint and indeed the
resources are allocated.

Now when I try to use those roles, I get an error:

sched.cpp:1024] Got error 'Role 'dev' is not present in the master's
--roles'

Ok, so it looks like I indeed need to specify --roles at startup of the
master. This got me thinking: is this an oversight or by design? If it's an
oversight, then I will file a JIRA; if it's by design, I'd like to understand
more.
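
For reference, the only thing that makes the error go away for me is listing
the role when the master starts, something like this (all flags other than
--roles are placeholders for my normal startup options):

  # restart the master with the role whitelisted up front
  mesos-master --roles=dev,prod \
               --zk=zk://zk1:2181/mesos \
               --quorum=2 \
               --work_dir=/var/lib/mesos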

What I mean is this: the whole point of /reserve and /unreserve is to not
have to specify the resources at slave startup. That's great. Then when I
reserve or unreserve, from how I understand it, one of two things should
happen (this is where I need feedback):

1. If the role specified in the reserve/unreserve operation doesn't exist in
--roles on the master, the /reserve endpoint should reject the reservation.
Why allow reserving for roles that can't be used?

2. If the role doesn't exist in --roles, and a reservation comes through,
it should be added dynamically. This seems to make more sense to me. I.e.
adding roles doesn't require a master restart.

If there is a 3rd option here that I am not seeing, please let me know;
otherwise I will file a JIRA for this...

John


Mesos ACL User

2015-12-08 Thread John Omernik
In crafting my ACLs, I found that I would like to have a situation where
groups were used instead of just users... i.e. if I have a certain framework,
perhaps a dev instance of Marathon, I want folks in the dev group to all be
able to run frameworks as themselves. Right now, I have a principal that can
run in any role and with any user, prn_prodcontrol. That works for me. Then I
have a principal that is my devcontrol. So I register the dev Marathon with
that, and now anyone who has the credentials for the dev Marathon can submit
Marathon jobs, which is cool; however, they can only do it as unixdevuser,
which is my unix user on every box... that's cool too. Also, the marathondev
framework can only operate in the dev role.


{
  "register_frameworks": [
    { "principals": { "values": ["prn_prodcontrol"] }, "roles": { "type": "ANY" } },
    { "principals": { "values": ["prn_devcontrol"] }, "roles": { "values": ["dev"] } }
  ],
  "run_tasks": [
    { "principals": { "values": ["prn_prodcontrol"] }, "users": { "type": "ANY" } },
    { "principals": { "values": ["prn_devcontrol"] }, "users": { "values": ["unixdevuser"] } }
  ]
}


What would be ideal is if I had a group marathondevgrp (a unix group on all
nodes), registered the marathondev framework with principal prn_devcontrol,
and had an ACL that states...


{
  "register_frameworks": [
    { "principals": { "values": ["prn_prodcontrol"] }, "roles": { "type": "ANY" } },
    { "principals": { "values": ["prn_devcontrol"] }, "roles": { "values": ["dev"] } }
  ],
  "run_tasks": [
    { "principals": { "values": ["prn_prodcontrol"] }, "users": { "type": "ANY" } },
    { "principals": { "values": ["prn_devcontrol"] }, "users": { "values": ["marathondevgrp"] } }
  ]
}

The ACL would then allow a task submitted to the dev Marathon to run as any
unix user in that group. This would allow me to have dev users run frameworks
as themselves (for data access control on my shared filesystem) and still
have the freedom to submit to Marathon (dev).


So do ACLs support groups? Is this something that would be difficult to add?
Any thoughts on other approaches to achieve similar results?

Thanks!

John


Weird Error on One Node

2015-12-04 Thread John Omernik
I am trying to start a task on a node, and it keeps failing with no logs in
the sandbox (it's blank).


In the slave logs, I get the error below. I looked into MESOS-3352, and since
I am running RHEL 7 with systemd version 208 (which "should" be the patched
version based on what I've read), I am thinking this should be working (I am
running Mesos 0.25.0). Now, that warning and my issue may not be related, as
the warning may pop up any time it detects systemd < 218.

That said, I tried stopping the slave, clearing all files for the slave, and
restarting, and can't seem to get it to work (clearing all files means
erasing the logs and tmp folders).
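
For reference, the directory the isolator complains about lives under the
memory cgroup hierarchy, so one quick sanity check (path taken from the error
below) is:

  # is the memory cgroup hierarchy mounted, and is the mesos parent directory there?
  mount | grep cgroup
  ls -ld /sys/fs/cgroup/memory/mesos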


There are not that many available logs, so any help would be appreciated.

John




W1204 19:40:47.767102 21241 systemd.cpp:136] Required functionality
`Delegate` was introduced in Version `218`. Your system may not function
properly; however since some distributions have patched systemd packages,
your system may still be functional. This is why we keep running. See
MESOS-3352 for more information

W1204 19:43:45.861021 21254 slave.cpp:2141] Ignoring updating pid for
framework 9fb66f25-2a80-44be-857b-a51de754618c- because it does not
exist

E1204 19:47:07.500804 21265 slave.cpp:3342] Container
'1500d513-fb13-484e-a9a9-3c9c70a66be6' for executor
'zetadrill.ccedde4d-9abf-11e5-b92d-024209d8b836' of framework
'9fb66f25-2a80-44be-857b-a51de754618c-' failed to start: Failed to
prepare isolator: Failed to create directory
'/sys/fs/cgroup/memory/mesos/1500d513-fb13-484e-a9a9-3c9c70a66be6': No such
file or directory


Re: Weird Error on One Node

2015-12-04 Thread John Omernik
I tried restarting the slave again and this time it worked... *shrug*

That said, is there anything I can do next time this happens? Do I need to do
something on RHEL 7 and cgroups to make the info in MESOS-3352 work?

Thanks!

John

On Fri, Dec 4, 2015 at 1:54 PM, John Omernik <j...@omernik.com> wrote:

> I am trying to start a task on a node, and it keeps failing with no logs
> in the sandbox (it's blank)
>
>
> In the slave logs, I get the error below.  I looked into MESOS-3352 and
> since I am running RHEL 7 with Version 208 (that "should" be the patched
> version based on what I've read). I am thinking this should be working. (I
> am running Mesos 0.25.0) Now, that warning and my issue may not be related
> as that warning may pop up anytime it detects systemd < 218.
>
> That said, I tried stopping the slave, clearning all files for the slave
> and restarting, and can't seem to get it to work (clearning all files is
> erasing logs and tmp folders).
>
>
> There are not that many available logs, so any help would be appreciated.
>
> John
>
>
>
>
> W1204 19:40:47.767102 21241 systemd.cpp:136] Required functionality
> `Delegate` was introduced in Version `218`. Your system may not function
> properly; however since some distributions have patched systemd packages,
> your system may still be functional. This is why we keep running. See
> MESOS-3352 for more information
>
> W1204 19:43:45.861021 21254 slave.cpp:2141] Ignoring updating pid for
> framework 9fb66f25-2a80-44be-857b-a51de754618c- because it does not
> exist
>
> E1204 19:47:07.500804 21265 slave.cpp:3342] Container
> '1500d513-fb13-484e-a9a9-3c9c70a66be6' for executor
> 'zetadrill.ccedde4d-9abf-11e5-b92d-024209d8b836' of framework
> '9fb66f25-2a80-44be-857b-a51de754618c-' failed to start: Failed to
> prepare isolator: Failed to create directory
> '/sys/fs/cgroup/memory/mesos/1500d513-fb13-484e-a9a9-3c9c70a66be6': No such
> file or directory
>


Re: Web UI Memory Usage in Firefox

2015-12-03 Thread John Omernik
Here is the Mesos Jira

https://issues.apache.org/jira/browse/MESOS-4060



On Thu, Dec 3, 2015 at 3:37 AM, Orlando Hohmeier <orla...@mesosphere.io>
wrote:

> Hi John,
>
> many thanks for filing the GH issue! Would be really great if you could
> also report the issue to the Mesos JIRA. Thank you in advance.
>
> Best
> Orlando
>
> On Wednesday, December 2, 2015 at 8:26:04 PM UTC+1, John Omernik wrote:
>>
>> Marathon Issue filed.
>>
>> https://github.com/mesosphere/marathon/issues/2755
>>
>> At this point, should I look at a Mesos JIRA too?
>>
>>
>>
>> On Wed, Dec 2, 2015 at 12:53 PM, Orlando Hohmeier <orl...@mesosphere.io>
>> wrote:
>>
>>> Hi John,
>>>
>>> thanks a lot for reporting and investigating this! We will have a look
>>> into it. I would be most pleased if you could create a GitHub Issue for
>>> this including all the details.
>>>
>>> https://github.com/mesosphere/marathon
>>>
>>> BTW: Have you ever experienced the same in a different browser (e.g.
>>> Chrome)?
>>>
>>> Thanks
>>> Orlando
>>>
>>> On Wednesday, December 2, 2015 at 6:07:40 PM UTC+1, John Omernik wrote:
>>>>
>>>> I am cross posting this in Marathon and Mesos lists because both UIs
>>>> are having this issue, and I figured I'd save time in posting two separate
>>>> messages.
>>>>
>>>> Basically, in using Firefox, I noticed that over time, my firefox would
>>>> get to become unusable when I had Marathon and Mesos WebUIs up and
>>>> running.  At first, I thought it was a function of my home (Mac) computer
>>>> and firefox. But when I started a PoC for work, and my Windows install of
>>>> firefox had the same issues, I started doing more investigation.
>>>>
>>>> First of all, both at home and at work, my Firefox is only dedicated to
>>>> "cluster" related tasks. Thus, I don't have other tabs that are not Cluster
>>>> UIs.
>>>>
>>>> My typical setup is to have MapR UI, Mesos UI, Marathon UI, Chronos UI,
>>>> Myriad UI, and Yarn UI all up and running.
>>>>
>>>> After about 3-4 hours, my browser would get really slow, and
>>>> non-responsive. I'd kill all and start again.  Rinse repeat.
>>>>
>>>> So I did some analysis, and basically found a plugin for Firefox that
>>>> shows on each tab the amount of memory being used. I found that both
>>>> Marathon and Mesos UI were the culprits, and things got really bad after
>>>> just 2-3 hours.  With those setup, on Windows, I have the following memory
>>>> usage:
>>>>
>>>> MapR UI: 7.7mb
>>>> Myriad UI: 9.8mb
>>>> Yarn: 2.7 mb
>>>> Chronos 10.2 mb
>>>> Marathon: 163 mb
>>>> Mesos UI: 463 mb
>>>>
>>>> Both Marathon and Mesos continually climb, slowly, up some, down a few,
>>>> up some more, but obviously generally trending up.  I guess I wanted to
>>>> toss it out here to see if it's something in my settings, or something that
>>>> others are seeing.  It's a problem for me from a usability standpoint, and
>>>> I am guessing that it's one of those things that while not a priority for a
>>>> project, should be looked at.
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>>
>>>>


Re: Native Lib vs. Rest API

2015-12-03 Thread John Omernik
Thank you!

On Thu, Dec 3, 2015 at 1:40 PM, Vinod Kone <vinodk...@apache.org> wrote:

> Yes, that's the plan.
>
> Here are the related epics tracking the work: MESOS-2288
> <https://issues.apache.org/jira/browse/MESOS-2288> and MESOS-3302
> <https://issues.apache.org/jira/browse/MESOS-3302>
>
> The user doc for the scheduler API is
> https://github.com/apache/mesos/blob/master/docs/scheduler-http-api.md
>
> On Thu, Dec 3, 2015 at 11:34 AM, John Omernik <j...@omernik.com> wrote:
>
>> Somewhere in the back of my brain I thought I read something about a
>> migration away from using the mesos native lib and going to a more generic
>> API approach to support better portability and less reliance on the lib.
>>
>> I read about this before I understood things well (or as well I do now I
>> should say). Am I misremembering reading about this? I can't find any
>> stories/documentation on this. If I am correct on this, can someone point
>> me to a JIRA or a discussion on how this is supposed to work? I.e. is the
>> goal to migrate all frameworks off the native library to deprecate it etc?
>>
>> Thanks, sorry for the weird questions.
>>
>> John
>>
>
>


Native Lib vs. Rest API

2015-12-03 Thread John Omernik
Somewhere in the back of my brain I thought I read something about a
migration away from using the mesos native lib and going to a more generic
API approach to support better portability and less reliance on the lib.

I read about this before I understood things well (or as well as I do now, I
should say). Am I misremembering reading about this? I can't find any
stories/documentation on this. If I am correct, can someone point me to a
JIRA or a discussion on how this is supposed to work? I.e., is the goal to
migrate all frameworks off the native library so it can be deprecated, etc.?

Thanks, sorry for the weird questions.

John


Roles and Oversubscription

2015-12-02 Thread John Omernik
How do roles and oversubscription work together?  When you specify
resources that can be oversubscribed, do you say what role they work in? Is
"revocable" a role in and of itself?

I am trying to work through these various items in my path to learning more
about oversubscription.

Thanks!

John


Web UI Memory Usage in Firefox

2015-12-02 Thread John Omernik
I am cross posting this in Marathon and Mesos lists because both UIs are
having this issue, and I figured I'd save time in posting two separate
messages.

Basically, in using Firefox, I noticed that over time my Firefox would become
unusable when I had the Marathon and Mesos web UIs up and running. At first,
I thought it was a function of my home (Mac) computer and Firefox. But when I
started a PoC for work, and my Windows install of Firefox had the same
issues, I started doing more investigation.

First of all, both at home and at work, my Firefox is only dedicated to
"cluster" related tasks. Thus, I don't have other tabs that are not Cluster
UIs.

My typical setup is to have MapR UI, Mesos UI, Marathon UI, Chronos UI,
Myriad UI, and Yarn UI all up and running.

After about 3-4 hours, my browser would get really slow, and
non-responsive. I'd kill all and start again.  Rinse repeat.

So I did some analysis, and found a plugin for Firefox that shows the amount
of memory being used on each tab. I found that both the Marathon and Mesos
UIs were the culprits, and things got really bad after just 2-3 hours. With
that setup, on Windows, I have the following memory usage:

MapR UI: 7.7 MB
Myriad UI: 9.8 MB
Yarn UI: 2.7 MB
Chronos UI: 10.2 MB
Marathon UI: 163 MB
Mesos UI: 463 MB

Both Marathon and Mesos continually climb, slowly, up some, down a few, up
some more, but obviously generally trending up.  I guess I wanted to toss
it out here to see if it's something in my settings, or something that
others are seeing.  It's a problem for me from a usability standpoint, and
I am guessing that it's one of those things that while not a priority for a
project, should be looked at.

John


Re: Web UI Memory Usage in Firefox

2015-12-02 Thread John Omernik
I tried that and both Marathon and Mesos did not relinquish any of their
memory.

On Wed, Dec 2, 2015 at 11:51 AM, Joseph Wu <jos...@mesosphere.io> wrote:

> Hi John,
>
> I wonder if this is just an issue with how Firefox does garbage collection.
>
> Can you try navigating to about:memory and clicking the "GC" button?  The
> web UI's definitely should not need that much memory.
>
> ~Joseph
>
> On Wed, Dec 2, 2015 at 9:07 AM, John Omernik <j...@omernik.com> wrote:
>
>> I am cross posting this in Marathon and Mesos lists because both UIs are
>> having this issue, and I figured I'd save time in posting two separate
>> messages.
>>
>> Basically, in using Firefox, I noticed that over time, my firefox would
>> get to become unusable when I had Marathon and Mesos WebUIs up and
>> running.  At first, I thought it was a function of my home (Mac) computer
>> and firefox. But when I started a PoC for work, and my Windows install of
>> firefox had the same issues, I started doing more investigation.
>>
>> First of all, both at home and at work, my Firefox is only dedicated to
>> "cluster" related tasks. Thus, I don't have other tabs that are not Cluster
>> UIs.
>>
>> My typical setup is to have MapR UI, Mesos UI, Marathon UI, Chronos UI,
>> Myriad UI, and Yarn UI all up and running.
>>
>> After about 3-4 hours, my browser would get really slow, and
>> non-responsive. I'd kill all and start again.  Rinse repeat.
>>
>> So I did some analysis, and basically found a plugin for Firefox that
>> shows on each tab the amount of memory being used. I found that both
>> Marathon and Mesos UI were the culprits, and things got really bad after
>> just 2-3 hours.  With those setup, on Windows, I have the following memory
>> usage:
>>
>> MapR UI: 7.7mb
>> Myriad UI: 9.8mb
>> Yarn: 2.7 mb
>> Chronos 10.2 mb
>> Marathon: 163 mb
>> Mesos UI: 463 mb
>>
>> Both Marathon and Mesos continually climb, slowly, up some, down a few,
>> up some more, but obviously generally trending up.  I guess I wanted to
>> toss it out here to see if it's something in my settings, or something that
>> others are seeing.  It's a problem for me from a usability standpoint, and
>> I am guessing that it's one of those things that while not a priority for a
>> project, should be looked at.
>>
>> John
>>
>>
>>
>>
>>
>


Re: Web UI Memory Usage in Firefox

2015-12-02 Thread John Omernik
Marathon Issue filed.

https://github.com/mesosphere/marathon/issues/2755

At this point, should I look at a Mesos JIRA too?



On Wed, Dec 2, 2015 at 12:53 PM, Orlando Hohmeier <orla...@mesosphere.io>
wrote:

> Hi John,
>
> thanks a lot for reporting and investigating this! We will have a look
> into it. I would be most pleased if you could create a GitHub Issue for
> this including all the details.
>
> https://github.com/mesosphere/marathon
>
> BTW: Have you ever experienced the same in a different browser (e.g.
> Chrome)?
>
> Thanks
> Orlando
>
> On Wednesday, December 2, 2015 at 6:07:40 PM UTC+1, John Omernik wrote:
>>
>> I am cross posting this in Marathon and Mesos lists because both UIs are
>> having this issue, and I figured I'd save time in posting two separate
>> messages.
>>
>> Basically, in using Firefox, I noticed that over time, my firefox would
>> get to become unusable when I had Marathon and Mesos WebUIs up and
>> running.  At first, I thought it was a function of my home (Mac) computer
>> and firefox. But when I started a PoC for work, and my Windows install of
>> firefox had the same issues, I started doing more investigation.
>>
>> First of all, both at home and at work, my Firefox is only dedicated to
>> "cluster" related tasks. Thus, I don't have other tabs that are not Cluster
>> UIs.
>>
>> My typical setup is to have MapR UI, Mesos UI, Marathon UI, Chronos UI,
>> Myriad UI, and Yarn UI all up and running.
>>
>> After about 3-4 hours, my browser would get really slow, and
>> non-responsive. I'd kill all and start again.  Rinse repeat.
>>
>> So I did some analysis, and basically found a plugin for Firefox that
>> shows on each tab the amount of memory being used. I found that both
>> Marathon and Mesos UI were the culprits, and things got really bad after
>> just 2-3 hours.  With those setup, on Windows, I have the following memory
>> usage:
>>
>> MapR UI: 7.7mb
>> Myriad UI: 9.8mb
>> Yarn: 2.7 mb
>> Chronos 10.2 mb
>> Marathon: 163 mb
>> Mesos UI: 463 mb
>>
>> Both Marathon and Mesos continually climb, slowly, up some, down a few,
>> up some more, but obviously generally trending up.  I guess I wanted to
>> toss it out here to see if it's something in my settings, or something that
>> others are seeing.  It's a problem for me from a usability standpoint, and
>> I am guessing that it's one of those things that while not a priority for a
>> project, should be looked at.
>>
>> John
>>
>>
>>
>>


Re: Docker Multi-Host Networking and Mesos Isolation Strategies

2015-11-04 Thread John Omernik
I created a basic stub at https://issues.apache.org/jira/browse/MESOS-3828

Thanks!

John


On Wed, Nov 4, 2015 at 8:32 AM, haosdent <haosd...@gmail.com> wrote:

> This new Docker feature looks exciting! To integrate with it, my quick idea
> is that we could implement it as a pluggable module and let the user choose
> which network isolator should be used. But this is just my opinion. Could
> you create a story for this in https://issues.apache.org/jira/browse/MESOS
> so we can track this better?
>
> On Wed, Nov 4, 2015 at 9:29 PM, John Omernik <j...@omernik.com> wrote:
>
>> Hey all,
>>
>> I see Docker 1.9 has a neat multihost networking feature.
>>
>> http://blog.docker.com/2015/11/docker-multi-host-networking-ga/
>>
>> I am curious how this may integrate (if at all) with the Network
>> Isolation/IP per container strategy Mesos is looking at.  Is there overlap
>> here? Are there integration points? Are we looking at divergent or
>> convergent network strategies here?
>>
>> I would imagine while Docker multi-host is docker specific, Mesos is
>> trying to solve the any container on multiple hosts problem, and thus the
>> scope may be larger, but could there be opportunity for integration?  The
>> reason I ask, is as I am rolling out Mesos PoCs the dev team is excited
>> about these new features and I want to understand how these may or may not
>> converge in the future.
>>
>> John
>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Marathon 0.11.1 - Mesos 0.25 - Mesos-DNS 0.4.0

2015-11-03 Thread John Omernik
I used

"IPSources": ["host", "netinfo", "mesos"]


With the thought that I would prefer the host IP at this point. When network
isolation works in Marathon, then I will likely switch to netinfo.
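
For anyone else hitting this, the relevant part of my config.json now looks
roughly like this (the zk string and domain are placeholders; everything else
is left at whatever I had before):

  {
    "zk": "zk://zk1:2181,zk2:2181,zk3:2181/mesos",
    "domain": "mesos",
    "port": 53,
    "IPSources": ["host", "netinfo", "mesos"]
  }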

On Mon, Nov 2, 2015 at 7:28 PM, James DeFelice <james.defel...@gmail.com>
wrote:

> What settings worked for you? We did aim for least surprise. Sounds like
> we missed a bit. We're happy to accept suggestions for improvement via gh
> issues filed against the mesos-dns repo.
> On Oct 29, 2015 7:39 AM, "John Omernik" <j...@omernik.com> wrote:
>
>> That is good to know, however, I would challenge the group on something
>> like this not being bug based on the documentation.  When a change in
>> mesos-dns, and what fields it looks at is not affected by the mesos-dns
>> component, but instead other components in a way that could have serious
>> negative impacts on folks who are running this, there should be some
>> fanfare there about changes.  Also, I would advocate that in mesos-dns the
>> default should have been the same as previous releases (which I would
>> assume was host ip) as default, then allow people who are aware of the
>> underpinnings to make the change.
>>
>> On Wed, Oct 28, 2015 at 3:02 PM, Grzegorz Graczyk <gregor...@gmail.com>
>> wrote:
>>
>>> It's not a bug, it's a feature -
>>> http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html 
>>> look
>>> at IPSources config
>>>
>>> śr., 28.10.2015 o 15:59 użytkownik John Omernik <j...@omernik.com>
>>> napisał:
>>>
>>>> If I rolled back mesos-dns to v0.2.0 (on the releases page) then it
>>>> pulls the right IP address..   (Mesos-dns version is the easiest of the
>>>> three to change)
>>>>
>>>> John
>>>>
>>>> On Wed, Oct 28, 2015 at 9:52 AM, John Omernik <j...@omernik.com> wrote:
>>>>
>>>>> So, the issues that are listed appear to be resolved with marathon
>>>>> 0.11.1, and the mesos-dns issue is not listed at all.
>>>>>
>>>>> Note, I tried mesos-dns 0.3.0 and that has the same problem as 0.4.0.
>>>>>
>>>>> On Wed, Oct 28, 2015 at 9:46 AM, John Omernik <j...@omernik.com>
>>>>> wrote:
>>>>>
>>>>>> I will check out those issues and report back.
>>>>>>
>>>>>> On Wed, Oct 28, 2015 at 9:42 AM, craig w <codecr...@gmail.com> wrote:
>>>>>>
>>>>>>> I've had no issue with the following combination:
>>>>>>>
>>>>>>> MesosDNS 0.4.0
>>>>>>> Marathon 0.11.0
>>>>>>> Mesos 0.24.1
>>>>>>>
>>>>>>> I've been waiting to upgrade to Mesos 0.25.0 because of issues
>>>>>>> mentioned in the mesos mailing list regarding Marathon 0.11.x and Mesos
>>>>>>> 0.25.0
>>>>>>>
>>>>>>> On Wed, Oct 28, 2015 at 10:38 AM, John Omernik <j...@omernik.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey all -
>>>>>>>>
>>>>>>>> I am cross posting this because it's a number of moving parts that
>>>>>>>> could be at issue here (Mesos, Mesos-dns, and/or Marathon).
>>>>>>>>
>>>>>>>> Basically: At the version combination in Subject, the IP that is
>>>>>>>> registered in mesos-dns for Docker containers running in Marathon is 
>>>>>>>> the
>>>>>>>> internal (container) IP address of the docker (in bridged mode) not the
>>>>>>>> nodes. This obviously causes issues.  Note this doesn't happen when the
>>>>>>>> Marathon application is non-Docker.
>>>>>>>>
>>>>>>>> I was running Mesos-dns 0.4.0 on a cluster running Mesos 0.24.0 and
>>>>>>>> Marathon 0.10.0 and I upgraded to Mesos 0.25.0 and Marathon 0.11.1 and
>>>>>>>> noticed this behavior happening.
>>>>>>>>
>>>>>>>> I thought that was odd because I have another cluster that was
>>>>>>>> running Mesos 0.25.0 and Marathon 0.11.1 and it wasn't happening, 
>>>>>>>> until I
>>>>>>>> realized that I hadn't upgraded Mesos-dns lately, I upgraded to 
>>>>>>>> Mesos-dns
>>>>>>>> 0.4.0 and the problem started occurring.
>>>>>>>>
>>>>>>>> Is there a setting that I need to use the external IP of the
>>>>>>>> container? Is this issue known? Is there a workaround? This is pretty 
>>>>>>>> major
>>>>>>>> for Docker running on Marathon and using Mesos-dns for service 
>>>>>>>> discovery.
>>>>>>>>
>>>>>>>> John Omernik
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> https://github.com/mindscratch
>>>>>>> https://www.google.com/+CraigWickesser
>>>>>>> https://twitter.com/mind_scratch
>>>>>>> https://twitter.com/craig_links
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>


Re: Marathon 0.11.1 - Mesos 0.25 - Mesos-DNS 0.4.0

2015-11-03 Thread John Omernik
No, it wasn't specified at all. I was using an old config.json, thus I had
to add the setting with the host listed first for it to work. Not sure why
docker ended up being first in line there.

On Tue, Nov 3, 2015 at 2:02 PM, James DeFelice <james.defel...@gmail.com>
wrote:

> The default value of IPSources doesn't have `docker` listed. As long as
> that's not in the list you shouldn't have had a problem, unless some bad
> actor was writing the wrong labels into the task. I don't see support for
> NetworkInfos (`netinfos`) in marathon yet. Which means that `host` should
> have been the fallback.
>
> Did you, by chance, have `docker` listed in IPSources at any point?
>
>
> On Tue, Nov 3, 2015 at 12:04 PM, John Omernik <j...@omernik.com> wrote:
>
>> I used
>>
>> "IPSources": ["host", "netinfo", "mesos"]
>>
>>
>> With the thought that I would preference for the host at this point. When
>> network isolation works in Marathon, then I will likely switch to netinfo.
>>
>> On Mon, Nov 2, 2015 at 7:28 PM, James DeFelice <james.defel...@gmail.com>
>> wrote:
>>
>>> What settings worked for you? We did aim for least surprise. Sounds like
>>> we missed a bit. We're happy to accept suggestions for improvement via gh
>>> issues filed against the mesos-dns repo.
>>> On Oct 29, 2015 7:39 AM, "John Omernik" <j...@omernik.com> wrote:
>>>
>>>> That is good to know, however, I would challenge the group on something
>>>> like this not being bug based on the documentation.  When a change in
>>>> mesos-dns, and what fields it looks at is not affected by the mesos-dns
>>>> component, but instead other components in a way that could have serious
>>>> negative impacts on folks who are running this, there should be some
>>>> fanfare there about changes.  Also, I would advocate that in mesos-dns the
>>>> default should have been the same as previous releases (which I would
>>>> assume was host ip) as default, then allow people who are aware of the
>>>> underpinnings to make the change.
>>>>
>>>> On Wed, Oct 28, 2015 at 3:02 PM, Grzegorz Graczyk <gregor...@gmail.com>
>>>> wrote:
>>>>
>>>>> It's not a bug, it's a feature -
>>>>> http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html 
>>>>> look
>>>>> at IPSources config
>>>>>
>>>>> śr., 28.10.2015 o 15:59 użytkownik John Omernik <j...@omernik.com>
>>>>> napisał:
>>>>>
>>>>>> If I rolled back mesos-dns to v0.2.0 (on the releases page) then it
>>>>>> pulls the right IP address..   (Mesos-dns version is the easiest of the
>>>>>> three to change)
>>>>>>
>>>>>> John
>>>>>>
>>>>>> On Wed, Oct 28, 2015 at 9:52 AM, John Omernik <j...@omernik.com>
>>>>>> wrote:
>>>>>>
>>>>>>> So, the issues that are listed appear to be resolved with marathon
>>>>>>> 0.11.1, and the mesos-dns issue is not listed at all.
>>>>>>>
>>>>>>> Note, I tried mesos-dns 0.3.0 and that has the same problem as
>>>>>>> 0.4.0.
>>>>>>>
>>>>>>> On Wed, Oct 28, 2015 at 9:46 AM, John Omernik <j...@omernik.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I will check out those issues and report back.
>>>>>>>>
>>>>>>>> On Wed, Oct 28, 2015 at 9:42 AM, craig w <codecr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've had no issue with the following combination:
>>>>>>>>>
>>>>>>>>> MesosDNS 0.4.0
>>>>>>>>> Marathon 0.11.0
>>>>>>>>> Mesos 0.24.1
>>>>>>>>>
>>>>>>>>> I've been waiting to upgrade to Mesos 0.25.0 because of issues
>>>>>>>>> mentioned in the mesos mailing list regarding Marathon 0.11.x and 
>>>>>>>>> Mesos
>>>>>>>>> 0.25.0
>>>>>>>>>
>>>>>>>>> On Wed, Oct 28, 2015 at 10:38 AM, John Omernik <j...@omernik.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>

Re: Marathon 0.11.1 - Mesos 0.25 - Mesos-DNS 0.4.0

2015-10-29 Thread John Omernik
That is good to know; however, I would challenge the group on something like
this not being a bug, based on the documentation. When a change in what
fields mesos-dns looks at is driven not by the mesos-dns component itself but
by other components, in a way that could have serious negative impacts on
folks running this, there should be some fanfare about the change. Also, I
would advocate that the mesos-dns default should have stayed the same as in
previous releases (which I would assume was the host IP), and then let people
who are aware of the underpinnings opt into the change.

On Wed, Oct 28, 2015 at 3:02 PM, Grzegorz Graczyk <gregor...@gmail.com>
wrote:

> It's not a bug, it's a feature -
> http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html look
> at IPSources config
>
> śr., 28.10.2015 o 15:59 użytkownik John Omernik <j...@omernik.com>
> napisał:
>
>> If I rolled back mesos-dns to v0.2.0 (on the releases page) then it pulls
>> the right IP address..   (Mesos-dns version is the easiest of the three to
>> change)
>>
>> John
>>
>> On Wed, Oct 28, 2015 at 9:52 AM, John Omernik <j...@omernik.com> wrote:
>>
>>> So, the issues that are listed appear to be resolved with marathon
>>> 0.11.1, and the mesos-dns issue is not listed at all.
>>>
>>> Note, I tried mesos-dns 0.3.0 and that has the same problem as 0.4.0.
>>>
>>> On Wed, Oct 28, 2015 at 9:46 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> I will check out those issues and report back.
>>>>
>>>> On Wed, Oct 28, 2015 at 9:42 AM, craig w <codecr...@gmail.com> wrote:
>>>>
>>>>> I've had no issue with the following combination:
>>>>>
>>>>> MesosDNS 0.4.0
>>>>> Marathon 0.11.0
>>>>> Mesos 0.24.1
>>>>>
>>>>> I've been waiting to upgrade to Mesos 0.25.0 because of issues
>>>>> mentioned in the mesos mailing list regarding Marathon 0.11.x and Mesos
>>>>> 0.25.0
>>>>>
>>>>> On Wed, Oct 28, 2015 at 10:38 AM, John Omernik <j...@omernik.com>
>>>>> wrote:
>>>>>
>>>>>> Hey all -
>>>>>>
>>>>>> I am cross posting this because it's a number of moving parts that
>>>>>> could be at issue here (Mesos, Mesos-dns, and/or Marathon).
>>>>>>
>>>>>> Basically: At the version combination in Subject, the IP that is
>>>>>> registered in mesos-dns for Docker containers running in Marathon is the
>>>>>> internal (container) IP address of the docker (in bridged mode) not the
>>>>>> nodes. This obviously causes issues.  Note this doesn't happen when the
>>>>>> Marathon application is non-Docker.
>>>>>>
>>>>>> I was running Mesos-dns 0.4.0 on a cluster running Mesos 0.24.0 and
>>>>>> Marathon 0.10.0 and I upgraded to Mesos 0.25.0 and Marathon 0.11.1 and
>>>>>> noticed this behavior happening.
>>>>>>
>>>>>> I thought that was odd because I have another cluster that was
>>>>>> running Mesos 0.25.0 and Marathon 0.11.1 and it wasn't happening, until I
>>>>>> realized that I hadn't upgraded Mesos-dns lately, I upgraded to Mesos-dns
>>>>>> 0.4.0 and the problem started occurring.
>>>>>>
>>>>>> Is there a setting that I need to use the external IP of the
>>>>>> container? Is this issue known? Is there a workaround? This is pretty 
>>>>>> major
>>>>>> for Docker running on Marathon and using Mesos-dns for service discovery.
>>>>>>
>>>>>> John Omernik
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> https://github.com/mindscratch
>>>>> https://www.google.com/+CraigWickesser
>>>>> https://twitter.com/mind_scratch
>>>>> https://twitter.com/craig_links
>>>>>
>>>>>
>>>>
>>>>
>>>


Marathon 0.11.1 - Mesos 0.25 - Mesos-DNS 0.4.0

2015-10-28 Thread John Omernik
Hey all -

I am cross posting this because it's a number of moving parts that could be
at issue here (Mesos, Mesos-dns, and/or Marathon).

Basically: at the version combination in the subject, the IP that is
registered in mesos-dns for Docker containers running in Marathon is the
internal (container) IP address of the Docker container (in bridged mode),
not the node's. This obviously causes issues. Note this doesn't happen when
the Marathon application is non-Docker.

I was running Mesos-dns 0.4.0 on a cluster running Mesos 0.24.0 and
Marathon 0.10.0 and I upgraded to Mesos 0.25.0 and Marathon 0.11.1 and
noticed this behavior happening.

I thought that was odd because I have another cluster that was running Mesos
0.25.0 and Marathon 0.11.1 where it wasn't happening, until I realized that I
hadn't upgraded Mesos-dns on it lately; I upgraded to Mesos-dns 0.4.0 and the
problem started occurring.

Is there a setting that I need to use the external IP of the container? Is
this issue known? Is there a workaround? This is pretty major for Docker
running on Marathon and using Mesos-dns for service discovery.

John Omernik


Re: Marathon 0.11.1 - Mesos 0.25 - Mesos-DNS 0.4.0

2015-10-28 Thread John Omernik
If I roll back mesos-dns to v0.2.0 (on the releases page), then it pulls the
right IP address. (The mesos-dns version is the easiest of the three to
change.)

John

On Wed, Oct 28, 2015 at 9:52 AM, John Omernik <j...@omernik.com> wrote:

> So, the issues that are listed appear to be resolved with marathon 0.11.1,
> and the mesos-dns issue is not listed at all.
>
> Note, I tried mesos-dns 0.3.0 and that has the same problem as 0.4.0.
>
> On Wed, Oct 28, 2015 at 9:46 AM, John Omernik <j...@omernik.com> wrote:
>
>> I will check out those issues and report back.
>>
>> On Wed, Oct 28, 2015 at 9:42 AM, craig w <codecr...@gmail.com> wrote:
>>
>>> I've had no issue with the following combination:
>>>
>>> MesosDNS 0.4.0
>>> Marathon 0.11.0
>>> Mesos 0.24.1
>>>
>>> I've been waiting to upgrade to Mesos 0.25.0 because of issues mentioned
>>> in the mesos mailing list regarding Marathon 0.11.x and Mesos 0.25.0
>>>
>>> On Wed, Oct 28, 2015 at 10:38 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> Hey all -
>>>>
>>>> I am cross posting this because it's a number of moving parts that
>>>> could be at issue here (Mesos, Mesos-dns, and/or Marathon).
>>>>
>>>> Basically: At the version combination in Subject, the IP that is
>>>> registered in mesos-dns for Docker containers running in Marathon is the
>>>> internal (container) IP address of the docker (in bridged mode) not the
>>>> nodes. This obviously causes issues.  Note this doesn't happen when the
>>>> Marathon application is non-Docker.
>>>>
>>>> I was running Mesos-dns 0.4.0 on a cluster running Mesos 0.24.0 and
>>>> Marathon 0.10.0 and I upgraded to Mesos 0.25.0 and Marathon 0.11.1 and
>>>> noticed this behavior happening.
>>>>
>>>> I thought that was odd because I have another cluster that was running
>>>> Mesos 0.25.0 and Marathon 0.11.1 and it wasn't happening, until I realized
>>>> that I hadn't upgraded Mesos-dns lately, I upgraded to Mesos-dns 0.4.0 and
>>>> the problem started occurring.
>>>>
>>>> Is there a setting that I need to use the external IP of the container?
>>>> Is this issue known? Is there a workaround? This is pretty major for Docker
>>>> running on Marathon and using Mesos-dns for service discovery.
>>>>
>>>> John Omernik
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> https://github.com/mindscratch
>>> https://www.google.com/+CraigWickesser
>>> https://twitter.com/mind_scratch
>>> https://twitter.com/craig_links
>>>
>>>
>>
>>
>


Re: Marathon 0.11.1 - Mesos 0.25 - Mesos-DNS 0.4.0

2015-10-28 Thread John Omernik
I will check out those issues and report back.

On Wed, Oct 28, 2015 at 9:42 AM, craig w <codecr...@gmail.com> wrote:

> I've had no issue with the following combination:
>
> MesosDNS 0.4.0
> Marathon 0.11.0
> Mesos 0.24.1
>
> I've been waiting to upgrade to Mesos 0.25.0 because of issues mentioned
> in the mesos mailing list regarding Marathon 0.11.x and Mesos 0.25.0
>
> On Wed, Oct 28, 2015 at 10:38 AM, John Omernik <j...@omernik.com> wrote:
>
>> Hey all -
>>
>> I am cross posting this because it's a number of moving parts that could
>> be at issue here (Mesos, Mesos-dns, and/or Marathon).
>>
>> Basically: At the version combination in Subject, the IP that is
>> registered in mesos-dns for Docker containers running in Marathon is the
>> internal (container) IP address of the docker (in bridged mode) not the
>> nodes. This obviously causes issues.  Note this doesn't happen when the
>> Marathon application is non-Docker.
>>
>> I was running Mesos-dns 0.4.0 on a cluster running Mesos 0.24.0 and
>> Marathon 0.10.0 and I upgraded to Mesos 0.25.0 and Marathon 0.11.1 and
>> noticed this behavior happening.
>>
>> I thought that was odd because I have another cluster that was running
>> Mesos 0.25.0 and Marathon 0.11.1 and it wasn't happening, until I realized
>> that I hadn't upgraded Mesos-dns lately, I upgraded to Mesos-dns 0.4.0 and
>> the problem started occurring.
>>
>> Is there a setting that I need to use the external IP of the container?
>> Is this issue known? Is there a workaround? This is pretty major for Docker
>> running on Marathon and using Mesos-dns for service discovery.
>>
>> John Omernik
>>
>>
>>
>
>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>


Re: Odd Scenario with Mesos.

2015-10-21 Thread John Omernik
On it, it's in a weird PoC lab thing and I have to do some gyrations to get
logs off, it will be soon.

On Wed, Oct 21, 2015 at 2:46 PM, Vinod Kone <vinodk...@gmail.com> wrote:

> Logs please.
>
> On Wed, Oct 21, 2015 at 12:44 PM, John Omernik <j...@omernik.com> wrote:
>
>> I am running 0.24.
>>
>> I am running some tasks in marathon, and when they hit an OOM condition a
>> task is killed that is expected. Than I get a bunch of errors related to
>> "Failed to read "meory.limit_in_bytes', 'memory.max_usage_in_bytes' and
>> memory.stat.
>>
>> In addition the task tries to restart but keeps failing.
>>
>> A few notes, when the tasks fails, the sandbox becomes unavailable making
>> troubleshooting difficult. When this has occurred before, it seemed the
>> only way to get things working was to stop the slave, clear out the tmp
>> directory, and start it again. I'd like to understand why my task won't get
>> moving again.
>>
>> There are also lots of errors related to "failed to clean up isolator"
>> and invalid cgroups, I can get specific logs if people think it's needed.
>> I am thinking it's related to checkpointing or something like that? I.e. an
>> executor hit the OOM got killed, and it is trying to start back up, but
>> something isn't right?
>>
>> I know this is a jumped unorganized question, I can logs if needed.
>>
>>
>>
>


Odd Scenario with Mesos.

2015-10-21 Thread John Omernik
I am running 0.24.

I am running some tasks in Marathon, and when they hit an OOM condition a
task is killed, which is expected. Then I get a bunch of errors related to
"Failed to read" for 'memory.limit_in_bytes', 'memory.max_usage_in_bytes',
and 'memory.stat'.

In addition the task tries to restart but keeps failing.

A few notes: when the task fails, the sandbox becomes unavailable, making
troubleshooting difficult. When this has occurred before, it seemed the only
way to get things working was to stop the slave, clear out the tmp directory,
and start it again. I'd like to understand why my task won't get moving
again.

There are also lots of errors related to "failed to clean up isolator" and
invalid cgroups; I can get specific logs if people think they're needed. I am
thinking it's related to checkpointing or something like that? I.e. an
executor hit the OOM, got killed, and is trying to start back up, but
something isn't right?

I know this is a jumbled, unorganized question; I can send logs if needed.


Mesos-Dns Masters List

2015-10-14 Thread John Omernik
Hey all,

I was using mesos-dns, and I filled in my zk field based on the HA Mesos
cluster I have. Mesos-dns is up, but in the stderr I keep seeing
"generator.go:342 warning: leader "master@10.0.0.1:5050" is not in master
list".

I don't have a master list in my config.json; instead I am using the zk list.
But 10.0.0.1 is a valid master which is running properly. I am running Mesos
0.24 and mesos-dns 0.3.0. Any thoughts on this?
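
In case it matters, the relevant part of my config.json only has the zk
field; I am wondering if I also need to list the masters explicitly,
something like this (addresses are placeholders):

  {
    "zk": "zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos",
    "masters": ["10.0.0.1:5050", "10.0.0.2:5050", "10.0.0.3:5050"]
  }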

John


Mesos/Marathon/HAProxy Logging

2015-08-25 Thread John Omernik
I have been playing with an application that is a very simple app: a web
service running in Python. I've created a Docker container, the app runs in
the container, I set up Marathon to run it, and I use mesos-dns and haproxy,
and I can access the service just fine anywhere in the cluster.

First let me say this is VERY cool. The capabilities here are awesome.

Now the challenge: the security guy in me wants to take good logs from my
app. It was set up to do its own logging through a custom module, and I am
very happy with it. I set up the app in the container to mount a volume
that's in my MapRFS via NFS so I can log directly to a clustered filesystem.
This is awesome; I can read my logs in Apache Drill as they are written!!!

However, haproxy threw me for a loop. Once I started running the app in
Marathon with a service port and routed to it via haproxy, I realized
something: I had lost the source IPs in my logs.

Why?

Because once HAProxy takes over, it doesn't preserve the source IP; the next
hop only sees the IP of the previous connection. From a service discovery
perspective it works great, but with this setup I lose the original client.
Perhaps I could manually add something in haproxy to set an X-Forwarded-For
header, which would be nice; however, that only works for HTTP apps. What
about other TCP apps that are not HTTP?
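
For the HTTP case, the kind of hand edit I am picturing is just switching the
generated frontend/backend to http mode and turning on forwardfor, something
like this (names, ports, and addresses are made up):

  frontend myapp_in
      bind *:10000
      mode http
      option forwardfor          # adds X-Forwarded-For with the original client IP
      default_backend myapp_out

  backend myapp_out
      mode http
      server task1 10.0.0.21:31001 check

For plain TCP services I don't see an equivalent header trick; the closest
thing I know of is the PROXY protocol (send-proxy on the server line), but
the application itself has to understand it.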

This is an interesting problem, because apps should have good logging for
security, performance, and troubleshooting, and if I can't get the source IP
that could be a problem.

So, my question is this, anyone ran into this? How are you handling it?
Any brainstorms here we may be able to work off of?

One thing I thought was: why are we using HAProxy? Couldn't the same HAProxy
script actually put forwarding rules into iptables? This sounds messy, but
could it work? Has anyone explored that? If the traffic was forwarded, then
it wouldn't lose the IP information, and timeouts wouldn't be a concern
either (I think I posted before on how long-running TCP connections can be
closed down by HAProxy if they don't implement TCP keep-alives).

Other ideas?  This is interesting to me, and likely others.


Re: Mesos/Marathon/HAProxy Logging

2015-08-25 Thread John Omernik
So I agree that is how it should be done; however, the current implementation
on Mesos requires me to manually code something like that. In addition, this
only covers HTTP traffic, not TCP... what happens when the service running on
Mesos isn't HTTP? I was hoping for some discussion beyond just manually
editing the haproxy script to make it HTTP mode and add the headers...


On Tue, Aug 25, 2015 at 12:46 PM, Jeff Schroeder jeffschroe...@computer.org
 wrote:

 This is the header that should be passed:

 https://en.m.wikipedia.org/wiki/X-Forwarded-For

 Most of the modern internet routes through reverse proxies and this is how
 we log the actual source clients to solve similar auditing and compliance
 needs.


 On Tuesday, August 25, 2015, John Omernik j...@omernik.com wrote:

 I have been playing with an application that is a very simple app: A
 webservice running in Python. I've created a docker container, it runs in
 the container, I setup marathon to run it, I use mesos-dns and ha proxy and
 I can access the service just fine anywhere in the cluster.

 First let me say this is VERY cool. The capabilities here awesome.

 Now the challenge: the security guy in me wants to take good logs from my
 app.  It was setup to do it's own logging through a custom module. I am
 very happy with it.  I setup the app in the container to mount a volume
 that's in my MapRFS via NFS so I can log directly to a clustered
 filesystem. THis is awesome, I can read my logs in Apache Drill as they are
 written!!!

 However, the haproxy through me for a loop. Once I started running the
 app in Marathon with a service port and routed around via haproxy, I
 realized something:  I lost my source IPs in my logs?

 Why?

 Because once HAProxy takes over, it no longer needs to keep the source
 IP, and instead the next hop only sees the previous connection IP.  From a
 service discovery perspective it works great, but with this setup, I'd lose
 the previous hop. Perhaps I manually add something in haproxy to add an
 X-forwarded-for header, that would be nice, however, that only works for
 http apps, what about other TCP apps that are not HTTP?

 This is an interesting problem, because apps should have good logging,
 security, performance, troubleshooting, and if I can't get the source IP it
 could be a problem.

 So, my question is this, anyone ran into this? How are you handling it?
 Any brainstorms here we may be able to work off of?

 One thing I thought was why are we using HAproxy? Couldn't the same
 HAProxy script, actually put in forwarding rules in IPtables?  This sounds
 messy, but could it work? Has anyone explored that? If the data was
 forwarded, than it wouldn't lose the IP information (and timeouts wouldn't
 be a concern either (I think I posted before on how long running TCP
 connections can be closed down by HAProxy if they don't implement TCP Keep
 alives).

 Other ideas?  This is interesting to me, and likely others.



 --
 Text by Jeff, typos by iPhone



Re: Mesos Modifying User Group

2015-08-13 Thread John Omernik
I ran into this same issue.  For me it manifested as weird permission
denied in MapR's NFS implementation, running in bash, etc was fine. But
running in on Mesos, it didn't work (permission denied)(Also thank you to
MapR for helping me troubleshoot).  Good news, there is a patch.

https://issues.apache.org/jira/browse/MESOS-719

And it's fixed in Mesos 0.23.  I applied the patch and recompiled and it
worked great, and when I installed 0.23, it also worked great.

Good luck.

John

On Wed, Aug 12, 2015 at 5:28 PM, Nastooh Avessta (navesta) 
nave...@cisco.com wrote:

 Having a bit of a strange problem with Mesos 0.22, running Spark 1.4.0, on
 Docker 1.6 slaves. Part of my Spark program calls on a script that accesses
 a GPU. I am able to run this script:

 1.   As Bash

 2.   Via Marathon

 3.   As part of a Spark program running as a standalone master

 However, when I try to run the same Spark program with Mesos as master,
 i.e., spark-submit --master mesos://\`cat /etc/mesos/zk\` --deploy-mode
 client…, I am not able to access dri devices, e.g., mfx init:
 /dev/dri/renderD128 fd open failed. What seems to be happening is that the
 group membership of the default user, in this case “ubuntu” is modified by
 Mesos, i.e., whereas under cases 1-3, above, I get:



 $ id

 uid=1000(ubuntu) gid=1000(ubuntu)
 groups=1000(ubuntu),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),102(netdev),999(docker)

 In case of Mesos, I get:

 uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu),0(root)



 I am wondering if there are configuration parameters that can be passed to
 Mesos to prevent it from modifying user groups?



 Cheers,

 *Nastooh Avessta*
 ENGINEER.SOFTWARE ENGINEERING
 nave...@cisco.com
 Phone: +1 604 647 1527

 *Cisco Systems Limited*
 595 Burrard Street, Suite 2123 Three Bentall Centre, PO Box 49121
 VANCOUVER
 BRITISH COLUMBIA
 V7X 1J1
 CA
 Cisco.com http://www.cisco.com/









Re: Static Resource Reservation on mesos slave

2015-08-11 Thread John Omernik
I am reviving this thread as I am looking to do something similar but have
a question

 --resources=cpus(role1):1;mem(role1):2048;disk(role1):32768;cpus(*):11

What Vinod suggested works, but in reality I want (assuming a 12 vcore and
32768 MB of RAM node)

 --resources=cpus(role1):1;mem(role1):2048;cpus(*):12;mem(*):32768

In other words, role1 is allowed to use 1 CPU and 2 GB of memory on this
node, and all other roles are allowed to use the 12 CPUs and 32 GB of RAM (all
of the node's resources). However, Mesos adds these together, so the slave
registers thinking it has 13 available cores.  The desired behavior is
that dev is allowed to use a core if one is available on this node, and
production is allowed to use 12 cores if available. If prod is using 12
cores, then obviously the dev task needing one core won't run here; if
prod is using 11 cores, then a dev task using 1 core will be able to run.
How would I achieve that using the --resources flag?
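For reference, a hedged sketch of the syntax that at least keeps the advertised
totals equal to the hardware (reserve 1 CPU / 2 GB for role1 and leave the
remaining 11 CPUs / ~30 GB unreserved):

mesos-slave --master=zk://... \
  --resources='cpus(role1):1;mem(role1):2048;cpus(*):11;mem(*):30720'

Note the semantics, though: resources reserved for role1 are only offered to
frameworks registered as role1, while the unreserved (*) portion is offered to
every role, including role1. So static reservations set resources aside *for* a
role; they don't cap a role's total usage, which is what the
dev-limited-to-one-core behavior described above would need.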

Thanks!




On Fri, Jun 19, 2015 at 2:31 PM, Vinod Kone vinodk...@gmail.com wrote:


 On Fri, Jun 19, 2015 at 12:08 PM, Anindya Sinha anindya.si...@gmail.com
 wrote:

 On mesos-slave: I start mesos-slave with resources carved out for role1
 as:
   --resources=cpus(role1):1;mem(role1):2048;disk(role1):32768


 You need to add * (unreserved) resources to this flag. Assuming you have
 12 cpus on this box it would look like

  --resources=cpus(role1):1;mem(role1):2048;disk(role1):32768;cpus(*):11



--resources documentation

2015-08-04 Thread John Omernik
Hey all, I am looking to set my slave resources and was looking at the
documentation, and I was unclear on exactly the format used by memory (and/or
disk). I am going to assume, based on the numbers below, that 15360 is
in MB?  (15 GB of RAM seems like a reasonable example; if it were KB that
would mean only 15 MB of RAM in this example, and bytes would mean 15 KB of
RAM.)  If I am just missing the documentation that explains this, please point
me in that direction. I would prefer a solid answer rather than trying to
infer from code/examples.

Thanks!



http://open.mesosphere.com/reference/mesos-slave/

Total consumable resources per slave, in the form
'name(role):value;name(role):value...'. This value can be set to limit
resources per role, or to overstate the number of resources that are
available to the slave. --resources=cpus(*):8; mem(*):15360;
disk(*):710534; ports(*):[31000-32000]
--resources=cpus(prod):8; cpus(stage):2 mem(*):15360; disk(*):710534;
ports(*):[31000-32000]
All * roles will be detected, so you can specify only the resources that
are not all roles (*). --resources=cpus(prod):8; cpus(stage)


Docker on Marathon 0.9.0 on Mesos 0.23.0

2015-08-04 Thread John Omernik
I am finding that Docker containers won't start for me in the versions
above. The only information I am getting from the sandbox is below, and I am
not sure what the issue is, since the file is in the same location where
the previous version's files were...  Any help is appreciated.

John



mesos-docker-executor: error while loading shared libraries:
libmesos-0.23.0.so: cannot open shared object file: No such file or
directory
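One thing worth checking (an assumption on my part, based on the default
make install location): whether /usr/local/lib is actually on the dynamic
linker's search path on the slaves where the docker executor runs.

echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/mesos.conf
sudo ldconfig
ldconfig -p | grep libmesos    # should now list libmesos-0.23.0.so

Exporting LD_LIBRARY_PATH=/usr/local/lib in the slave's environment should have
the same effect.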


Re: --resources documentation

2015-08-04 Thread John Omernik
Perfect! Thanks!

On Tue, Aug 4, 2015 at 11:49 AM, Greg Mann g...@mesosphere.io wrote:

 Hi John,
 You are correct, memory  disk are specified in MB. This is documented at
 http://mesos.apache.org/documentation/attributes-resources/ in the
 section titled Predefined Uses  Conventions.

 Cheers,
 Greg

 On Tue, Aug 4, 2015 at 6:54 AM, John Omernik j...@omernik.com wrote:

 Hey, all, I am looking to set my slave resources and was looking at the
 documentation and was unclear exactly the format used by memory (and/or
 disk) I am going to assume based on the numbers below, that 15360 is like
 in MB?  (15 GB of ram seems like a good example, as opposed to saying it's
 in KB which would say there are 15 MB of RAM in this example, or bytes
 would say there are 15 KB of ram on this example).   If I am just missing
 the documentation that helps explain this, please point me in that
 direction. I would prefer a solid answer rather than trying to infer from
 code/examples.

 Thanks!



 http://open.mesosphere.com/reference/mesos-slave/

 Total consumable resources per slave, in the form
 'name(role):value;name(role):value...'. This value can be set to limit
 resources per role, or to overstate the number of resources that are
 available to the slave. --resources=cpus(*):8; mem(*):15360;
 disk(*):710534; ports(*):[31000-32000]
 --resources=cpus(prod):8; cpus(stage):2 mem(*):15360; disk(*):710534;
 ports(*):[31000-32000]
 All * roles will be detected, so you can specify only the resources that
 are not all roles (*). --resources=cpus(prod):8; cpus(stage)





Re: Build 0.23 gcc Version

2015-07-28 Thread John Omernik
So, I don't mean to sound like a newbie here, but given my current
setup, which has 4.6.3 (and where I tried to move to 4.8), how can I get Mesos
0.23 to compile? Is this something I need to change in certain files? In
certain steps? Is this something that Mesos should handle as a bug, to cope
with different versions? Is this a configuration issue? I'd love to learn more
about how this works, but would love some pointers here, and since my setup is
fairly vanilla, others may also benefit from getting this to work.
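A hedged checklist, based mostly on the (cached) lines in the configure output
later in this thread (configure probes c++/g++ rather than the gcc alternative,
and appears to be reusing answers from an earlier run):

g++ --version && c++ --version           # confirm which C++ compiler configure will actually find
rm -rf build && mkdir build && cd build  # a clean build dir so no cached configure results survive
CC=gcc-4.8 CXX=g++-4.8 ../configure      # point autoconf straight at the PPA's 4.8 binaries

If g++ --version still reports 4.6, the g++ alternative (or symlink) needs the
same treatment that was done for gcc.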

John

On Mon, Jul 27, 2015 at 10:56 AM, James Peach jor...@gmail.com wrote:


  On Jul 24, 2015, at 3:57 PM, Michael Park mcyp...@gmail.com wrote:
 
  Hi John,
 
  I would first suggest trying CC=gcc CXX=g++ ../configure, and if
 that works, try to find out what which cc and which c++ return and find out
 what they symlink to.
  I believe autotools uses cc and c++ rather than gcc and g++ by default,
 so I think there's probably something funky going on there.

 No, you explicitly tell autoconf to default to G++

 mesos.git jpeach$ grep AC_PROG_C configure.ac
 AC_PROG_CXX([g++])
 AC_PROG_CC([gcc])

 IMHO the correct invocation is something like:
 AC_PROG_CXX([c++ g++ clang++])

 since you should always default to the system default toolchain

 J




Re: Build 0.23 gcc Version

2015-07-27 Thread John Omernik
Output below, with the version and the command I ran.

Basically I had the standard gcc installed, then I added a PPA for gcc 4.8 and
did apt-get install. This likely left the distribution's official gcc in
place.  I am a little fresh when it comes to dealing with this stuff; is it as
simple as apt-get remove gcc?





configure: creating ./config.lt

config.lt: creating libtool

configure: Setting up build environment for x86_64 linux-gnu

checking whether we are using the GNU C++ compiler... (cached) yes

checking whether g++ accepts -g... (cached) yes

checking dependency style of g++... (cached) gcc3

checking whether we are using the GNU C compiler... (cached) yes

checking whether gcc accepts -g... (cached) yes

checking for gcc option to accept ISO C89... (cached) none needed

checking whether gcc understands -c and -o together... (cached) yes

checking dependency style of gcc... (cached) gcc3

checking for C++ compiler vendor... gnu

checking for C++ compiler version... 4.6.3

checking for C++ compiler vendor... (cached) gnu

configure: error: GCC 4.8 or higher required (found 4.6.3)

darkness@hadoopmapr1:/opt/mapr/mesos/mesos-0.23.0/build$ gcc --version

gcc (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1

Copyright (C) 2013 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


darkness@hadoopmapr1:/opt/mapr/mesos/mesos-0.23.0/build$ CC=gcc CXX=g++
../configure

On Fri, Jul 24, 2015 at 5:57 PM, Michael Park mcyp...@gmail.com wrote:

 Hi John,

 I would first suggest trying *CC=gcc CXX=g++ ../configure*, and if that
 works, try to find out what *which* *cc* and *which* *c++* return and
 find out what they symlink to.
 I believe autotools uses *cc* and *c++* rather than *gcc* and *g++* by
 default, so I think there's probably something funky going on there.

 MPark.

 On Fri, Jul 24, 2015 at 2:31 PM Benjamin Hindman 
 benjamin.hind...@gmail.com wrote:

 Hey John,

 It appears that we're finding gcc 4.6.3 on your machine. Is it possible
 that your autotools are hard coded to look for a gcc that is not the gcc
 that you've installed and is on your path?

 At least for me I use devtoolset-2 and Software Collections (scl) and I
 can get my machine into funky set ups where I've got a gcc 4.8 installed
 but using autotools it picks the wrong compiler.

 Ben.

 On Fri, Jul 24, 2015 at 2:02 PM John Omernik j...@omernik.com wrote:

 I am trying to build 0.23, I got the error below.  I already installed
 gcc-4.8 and set my alternatives to work with 4.8 as you can see gcc
 --version returns the right version, where is the configure script pulling
 that data? Are there flags I could use to help it through the process? :)

 John



 configure: error: GCC 4.8 or higher required (found 4.6.3)

 darkness@hadoopmapr1:/opt/mapr/mesos/mesos-0.23.0/build$ gcc --version

 gcc (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1

 Copyright (C) 2013 Free Software Foundation, Inc.

 This is free software; see the source for copying conditions.  There is
 NO

 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
 PURPOSE.




Build 0.23 gcc Version

2015-07-24 Thread John Omernik
I am trying to build 0.23 and got the error below.  I already installed
gcc-4.8 and set my alternatives to use 4.8; as you can see, gcc --version
returns the right version. Where is the configure script pulling that data
from? Are there flags I could use to help it through the process? :)

John



configure: error: GCC 4.8 or higher required (found 4.6.3)

darkness@hadoopmapr1:/opt/mapr/mesos/mesos-0.23.0/build$ gcc --version

gcc (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1

Copyright (C) 2013 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Mailing lists for specific frameworks

2015-06-24 Thread John Omernik
Hey all, many of the frameworks in development, such as mesos-kafka and
mesos-elasticsearch, look very promising!  In order to keep the Mesos users
group clean, I was wondering if there are mailing lists, Google groups, etc.
set up for some of these.

I think it could be beneficial for seeing which frameworks have interest, and
possibly getting more contributions from the community, as well as helping
focus the Mesos group.  I am not sure what sort of guidelines could be set up,
and I would encourage there not to be a don't post here, post there
mentality, but instead a helpful approach where we can try to direct users
asking questions about frameworks to the framework-specific groups.  This
would also help these frameworks in that people wouldn't use git issues for
questions.

Thoughts?  Mesosphere, is this something that lines up with your community
goals?

John


Re: Writing outside the sandbox

2015-05-12 Thread John Omernik
Root IS able to write to the share outside of Mesos. I am working with MapR
to understand the NFS component better.



On Tue, May 12, 2015 at 11:28 AM, Bjoern Metzdorf bjo...@metzdorf.de
wrote:

 Is there anything in the nfs server log files? Maybe it squashes root by
 default and the root group membership of darkness falls into that?

 Regards,
 Bjoern

 On May 12, 2015, at 5:53 AM, John Omernik j...@omernik.com wrote:

 So I tried su darkness and su - darkness and both allowed a file write
 with no issues.  On the group thing, while it is weird would that
 actually hurt ti to contain that group?  Even if I set the directory to 777
 I still get a failure. on a create within it.  I am guessing this is
 something more to do with MapRs NFS than Mesos at this point, but if anyone
 would have any other tips on troubleshooting to confirm that, I'd
 appreciate it.

 John

 On Mon, May 11, 2015 at 5:18 PM, Marco Massenzio ma...@mesosphere.io
 wrote:

 Looks to me that while 'uid' is 1000
 uid=1000(darkness) gid=1000(darkness) groups=1000(darkness),0(root)

 this is still root's env when run from Mesos (also, weird that groups
 contains 0(root)):
 USER=root

 again - not sure how we su to a different user, but this usually happens
 if one does `su darkness` (instead of `su - darkness`) from the shell, at
 any rate.

 *Marco Massenzio*
 *Distributed Systems Engineer*

 On Mon, May 11, 2015 at 6:54 AM, John Omernik j...@omernik.com wrote:

 Paul: I checked in multiple places and I don't see rootsquash being
 used. I am using the MapR NFS server, and I do not believe that is a common
 option in the default setup ( I will follow up closer on that).

 Adam and Maxime:  So I included the output of both id (instead of
 whoami) and env (as seen below) and I believe that your ideas may be
 getting somewhere.  There are a number of things that strike me as odd in
 the outputs, and I'd like your thoughts on them.  First of all, remember
 that the permissions on the folders are 775 right now, so with the primary
 group set (which it appears to be based on id) and the user set, it still
 should have write access.  That said, the SUed process doesn't have any of
 the other groups (which I want to test if any of those controls access,
 especially with MapR). At risk of exposing to much information about my
 test network in a public forum, I left all the details in the ENV to see if
 there are things other may see that could be causing me issues.

 Thanks for the replies so far!





 *New Script:*

 #!/bin/bash

  echo Writing id information to stderr for one stop logging 1>&2

  id 1>&2


  echo  1>&2


  echo Printing out the env command to std err for one stop loggins 1>&2

  env 1>&2


 mkdir /mapr/brewpot/mesos/storm/test/test1

 touch /mapr/brewpot/mesos/storm/test/test1/testing.go





 *Run within Mesos:*

 I0511 08:41:02.804448  8048 exec.cpp:132] Version: 0.21.0
 I0511 08:41:02.814324  8059 exec.cpp:206] Executor registered on slave
 20150505-145508-1644210368-5050-8608-S2
 Writing id information to stderr for one stop logging
 uid=1000(darkness) gid=1000(darkness) groups=1000(darkness),0(root)

 Printing out the env command to std err for one stop loggins
 LIBPROCESS_IP=192.168.0.98
 HOST=hadoopmapr3.brewingintel.com
 SHELL=/bin/bash
 TERM=unknown
 PORT_10005=31783

 MESOS_DIRECTORY=/tmp/mesos/slaves/20150505-145508-1644210368-5050-8608-S2/frameworks/20150302-094409-1644210368-5050-2134-0003/executors/permtest.5f822976-f7e3-11e4-a22d-56847afe9799/runs/e53dc010-dd3c-4993-8f39-f8b532e5cf8b
 PORT0=31783
 MESOS_TASK_ID=permtest.5f822976-f7e3-11e4-a22d-56847afe9799
 USER=root
 LD_LIBRARY_PATH=:/usr/local/lib
 SUDO_USER=darkness
 MESOS_EXECUTOR_ID=permtest.5f822976-f7e3-11e4-a22d-56847afe9799
 SUDO_UID=1000
 USERNAME=root

 PATH=/home/darkness:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 MAIL=/var/mail/root

 PWD=/opt/mapr/mesos/tmp/slave/slaves/20150505-145508-1644210368-5050-8608-S2/frameworks/20150302-094409-1644210368-5050-2134-0003/executors/permtest.5f822976-f7e3-11e4-a22d-56847afe9799/runs/e53dc010-dd3c-4993-8f39-f8b532e5cf8b
 MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos-0.21.0.so
 MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-0.21.0.so
 LANG=en_US.UTF-8
 PORTS=31783
 MESOS_SLAVE_PID=slave(1)@192.168.0.98:5051
 MESOS_FRAMEWORK_ID=20150302-094409-1644210368-5050-2134-0003
 MESOS_CHECKPOINT=1
 SUDO_COMMAND=/usr/local/bin/mesos daemon.sh mesos-slave --master=
 192.168.0.98:5050 --ip=192.168.0.98
 --log_dir=/opt/mapr/mesos/tmp/slave_log/ --containerizers=docker,mesos
 --gc_delay=600mins --disk_watch_interval=60secs
 HOME=/home/darkness
 SHLVL=2
 LIBPROCESS_PORT=0
 MARATHON_APP_ID=/permtest
 PYTHONPATH=:/usr/local/libexec/mesos/python
 MARATHON_APP_VERSION=2015-05-11T13:41:04.218Z
 LOGNAME=root
 MESOS_SLAVE_ID=20150505-145508-1644210368-5050-8608-S2
 PORT=31783
 SUDO_GID=1000
 MESOS_RECOVERY_TIMEOUT=15mins
 _=/usr/bin/env
 mkdir: cannot create directory `/mapr/brewpot/mesos/storm/test/test1':
 Permission denied

Re: Writing outside the sandbox

2015-05-12 Thread John Omernik
So I tried su darkness and su - darkness, and both allowed a file write with
no issues.  On the group thing: while it is weird, would that actually
hurt it to contain that group?  Even if I set the directory to 777, I still
get a failure on a create within it.  I am guessing this is something more to
do with MapR's NFS than with Mesos at this point, but if anyone has any other
tips on troubleshooting to confirm that, I'd appreciate it.
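One way to test that outside of Mesos (a rough sketch; the uid/gid of 1000 and
the path come from earlier in this thread) is to drop the supplementary groups
the way the pre-MESOS-719 slave effectively does, and then attempt the write:

sudo python -c '
import os
os.setgroups([])      # no supplementary groups, mimicking the task environment
os.setgid(1000)
os.setuid(1000)
open("/mapr/brewpot/mesos/storm/test/probe.txt", "w").close()
print "write ok"
'

If that also fails with permission denied, the missing supplementary groups
(rather than anything MapR-specific) are the likely culprit.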

John

On Mon, May 11, 2015 at 5:18 PM, Marco Massenzio ma...@mesosphere.io
wrote:

 Looks to me that while 'uid' is 1000
 uid=1000(darkness) gid=1000(darkness) groups=1000(darkness),0(root)

 this is still root's env when run from Mesos (also, weird that groups
 contains 0(root)):
 USER=root

 again - not sure how we su to a different user, but this usually happens
 if one does `su darkness` (instead of `su - darkness`) from the shell, at
 any rate.

 *Marco Massenzio*
 *Distributed Systems Engineer*

 On Mon, May 11, 2015 at 6:54 AM, John Omernik j...@omernik.com wrote:

 Paul: I checked in multiple places and I don't see rootsquash being used.
 I am using the MapR NFS server, and I do not believe that is a common
 option in the default setup ( I will follow up closer on that).

 Adam and Maxime:  So I included the output of both id (instead of whoami)
 and env (as seen below) and I believe that your ideas may be getting
 somewhere.  There are a number of things that strike me as odd in the
 outputs, and I'd like your thoughts on them.  First of all, remember that
 the permissions on the folders are 775 right now, so with the primary group
 set (which it appears to be based on id) and the user set, it still should
 have write access.  That said, the SUed process doesn't have any of the
 other groups (which I want to test if any of those controls access,
 especially with MapR). At risk of exposing to much information about my
 test network in a public forum, I left all the details in the ENV to see if
 there are things other may see that could be causing me issues.

 Thanks for the replies so far!





 *New Script:*

 #!/bin/bash

  echo Writing id information to stderr for one stop logging 1>&2

  id 1>&2


  echo  1>&2


  echo Printing out the env command to std err for one stop loggins 1>&2

  env 1>&2


 mkdir /mapr/brewpot/mesos/storm/test/test1

 touch /mapr/brewpot/mesos/storm/test/test1/testing.go





 *Run within Mesos:*

 I0511 08:41:02.804448  8048 exec.cpp:132] Version: 0.21.0
 I0511 08:41:02.814324  8059 exec.cpp:206] Executor registered on slave
 20150505-145508-1644210368-5050-8608-S2
 Writing id information to stderr for one stop logging
 uid=1000(darkness) gid=1000(darkness) groups=1000(darkness),0(root)

 Printing out the env command to std err for one stop loggins
 LIBPROCESS_IP=192.168.0.98
 HOST=hadoopmapr3.brewingintel.com
 SHELL=/bin/bash
 TERM=unknown
 PORT_10005=31783

 MESOS_DIRECTORY=/tmp/mesos/slaves/20150505-145508-1644210368-5050-8608-S2/frameworks/20150302-094409-1644210368-5050-2134-0003/executors/permtest.5f822976-f7e3-11e4-a22d-56847afe9799/runs/e53dc010-dd3c-4993-8f39-f8b532e5cf8b
 PORT0=31783
 MESOS_TASK_ID=permtest.5f822976-f7e3-11e4-a22d-56847afe9799
 USER=root
 LD_LIBRARY_PATH=:/usr/local/lib
 SUDO_USER=darkness
 MESOS_EXECUTOR_ID=permtest.5f822976-f7e3-11e4-a22d-56847afe9799
 SUDO_UID=1000
 USERNAME=root

 PATH=/home/darkness:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 MAIL=/var/mail/root

 PWD=/opt/mapr/mesos/tmp/slave/slaves/20150505-145508-1644210368-5050-8608-S2/frameworks/20150302-094409-1644210368-5050-2134-0003/executors/permtest.5f822976-f7e3-11e4-a22d-56847afe9799/runs/e53dc010-dd3c-4993-8f39-f8b532e5cf8b
 MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos-0.21.0.so
 MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-0.21.0.so
 LANG=en_US.UTF-8
 PORTS=31783
 MESOS_SLAVE_PID=slave(1)@192.168.0.98:5051
 MESOS_FRAMEWORK_ID=20150302-094409-1644210368-5050-2134-0003
 MESOS_CHECKPOINT=1
 SUDO_COMMAND=/usr/local/bin/mesos daemon.sh mesos-slave --master=
 192.168.0.98:5050 --ip=192.168.0.98
 --log_dir=/opt/mapr/mesos/tmp/slave_log/ --containerizers=docker,mesos
 --gc_delay=600mins --disk_watch_interval=60secs
 HOME=/home/darkness
 SHLVL=2
 LIBPROCESS_PORT=0
 MARATHON_APP_ID=/permtest
 PYTHONPATH=:/usr/local/libexec/mesos/python
 MARATHON_APP_VERSION=2015-05-11T13:41:04.218Z
 LOGNAME=root
 MESOS_SLAVE_ID=20150505-145508-1644210368-5050-8608-S2
 PORT=31783
 SUDO_GID=1000
 MESOS_RECOVERY_TIMEOUT=15mins
 _=/usr/bin/env
 mkdir: cannot create directory `/mapr/brewpot/mesos/storm/test/test1':
 Permission denied
 touch: cannot touch `/mapr/brewpot/mesos/storm/test/test1/testing.go': No
 such file or directory


 *Run from command line:*

 Writing id information to stderr for one stop logging
 uid=1000(darkness) gid=1000(darkness)
 groups=1000(darkness),4(adm),24(cdrom),27(sudo),30(dip),42(shadow),46(plugdev),111(lpadmin),112(sambashare),700(mapr),2000(brewclub),2001(lcusers)

 Printing out the env command to std err for one stop loggins

Re: Writing outside the sandbox

2015-05-11 Thread John Omernik
=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/scala/bin
MAIL=/var/mail/darkness
PWD=/mnt
LANG=en_US.UTF-8
NODE_PATH=/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
HOME=/home/darkness
SHLVL=2
LOGNAME=darkness
SSH_CONNECTION=192.168.0.186 57204 192.168.0.100 22
LESSOPEN=| /usr/bin/lesspipe %s
LESSCLOSE=/usr/bin/lesspipe %s %s
_=/usr/bin/env


On Mon, May 11, 2015 at 1:05 AM, Maxime Brugidou maxime.brugi...@gmail.com
wrote:

 Mesos does not set the groups of the process correctly. There is a JIRA
 ticket for that. It only set the gid. I believe that this could explain the
 issue if your user is in a specific NFS group to be able go write.

 See
 https://issues.apache.org/jira/plugins/servlet/mobile#issue/MESOS-719
  On May 11, 2015 3:51 AM, Paul Brett pbr...@twitter.com wrote:

 Can you check on the NFS server to see if the filesystem has been
 exported with the rootsquash option?  That's a commonly used option which
 converts root uid on NFS clients to nobody on the server.

 -- Paul Brett
 On May 10, 2015 5:15 PM, Adam Bordelon a...@mesosphere.io wrote:

 Go ahead and run `env` in your script too, and see if there are any
 interesting differences when run via Marathon vs. directly.
 Maybe you're running in a different shell?

 On Sun, May 10, 2015 at 2:21 PM, John Omernik j...@omernik.com wrote:

 I believe the slave IS running as root. FWIW when I ran the script from
 above as root, it did work as intended (created the files on the NFS
 share).

 On Sun, May 10, 2015 at 9:08 AM, Dick Davies d...@hellooperator.net
 wrote:

 Any idea what user mesos is running as? This could just be a
 filesystem permission
 thing (ISTR last time I used NFS mounts, they had a 'root squash'
 option that prevented
 local root from writing to the NFS mount).

 On 9 May 2015 at 22:13, John Omernik j...@omernik.com wrote:
  I am not specifying isolators. The Default? :)  Is that a per slave
 setting?
 
  On Sat, May 9, 2015 at 3:33 PM, James DeFelice 
 james.defel...@gmail.com
  wrote:
 
  What isolators are you using?
 
  On Sat, May 9, 2015 at 3:48 PM, John Omernik j...@omernik.com
 wrote:
 
  Marco... great idea... thank you.  I just tried it and it worked
 when I
  had a /mnt/permtesting with the same permissions.  So it appears
 something
  to do with NFS and Mesos (Remember I tested just NFS that worked
 fine, it's
  the combination that is causing this).
 
  On Sat, May 9, 2015 at 1:09 PM, Marco Massenzio 
 ma...@mesosphere.io
  wrote:
 
  Out of my own curiousity (sorry, I have no fresh insights into
 the issue
  here) did you try to run the script and write to a non-NFS mounted
  directory? (same ownership/permissions)
 
  This way we could at least find out whether it's something
 related to
  NFS, or a more general permission-related issue.
 
  Marco Massenzio
  Distributed Systems Engineer
 
  On Sat, May 9, 2015 at 5:10 AM, John Omernik j...@omernik.com
 wrote:
 
  Here is the testing I am doing. I used a simple script (run.sh)
 It
  writes the user it is running as to stderr (so it's the same log
 as the
  errors from file writing) and then tries to make a directory in
 nfs, and
  then touch a file in nfs.  Note: This script directly run  works
 on every
  node.  You can see the JSON I used in marathon, and in the
 sandbox results,
  you can see the user is indeed darkness and the directory cannot
 be created.
  However when directly run, it the script, with the same user,
 creates the
  directory with no issue.  Now,  I realize this COULD still be a
 NFS quirk
  here, however, this testing points at some restriction in how
 marathon kicks
  off the cmd.   Any thoughts on where to look would be very
 helpful!
 
  John
 
 
 
  Script:
 
  #!/bin/bash
   echo Writing whoami to stderr for one stop logging 1>&2
   whoami 1>&2
  mkdir /mapr/brewpot/mesos/storm/test/test1
  touch /mapr/brewpot/mesos/storm/test/test1/testing.go
 
 
 
  Run Via Marathon
 
 
  {
  cmd: /mapr/brewpot/mesos/storm/run.sh,
  cpus: 1.0,
  mem: 1024,
  id: permtest,
  user: darkness,
  instances: 1
  }
 
 
  I0509 07:02:52.457242  9562 exec.cpp:132] Version: 0.21.0
  I0509 07:02:52.462700  9570 exec.cpp:206] Executor registered on
 slave
  20150505-145508-1644210368-5050-8608-S0
  Writing whoami to stderr for one stop logging
  darkness
  mkdir: cannot create directory
 `/mapr/brewpot/mesos/storm/test/test1':
  Permission denied
  touch: cannot touch
 `/mapr/brewpot/mesos/storm/test/test1/testing.go':
  No such file or directory
 
 
  Run Via Shell:
 
 
  $ /mapr/brewpot/mesos/storm/run.sh
  Writing whoami to stderr

Re: Writing outside the sandbox

2015-05-09 Thread John Omernik
I am not specifying isolators. The Default? :)  Is that a per slave setting?

On Sat, May 9, 2015 at 3:33 PM, James DeFelice james.defel...@gmail.com
wrote:

 What isolators are you using?

 On Sat, May 9, 2015 at 3:48 PM, John Omernik j...@omernik.com wrote:

 Marco... great idea... thank you.  I just tried it and it worked when I
 had a /mnt/permtesting with the same permissions.  So it appears something
 to do with NFS and Mesos (Remember I tested just NFS that worked fine, it's
 the combination that is causing this).

 On Sat, May 9, 2015 at 1:09 PM, Marco Massenzio ma...@mesosphere.io
 wrote:

 Out of my own curiousity (sorry, I have no fresh insights into the issue
 here) did you try to run the script and write to a non-NFS mounted
 directory? (same ownership/permissions)

 This way we could at least find out whether it's something related to
 NFS, or a more general permission-related issue.

 *Marco Massenzio*
 *Distributed Systems Engineer*

 On Sat, May 9, 2015 at 5:10 AM, John Omernik j...@omernik.com wrote:

 Here is the testing I am doing. I used a simple script (run.sh)  It
 writes the user it is running as to stderr (so it's the same log as the
 errors from file writing) and then tries to make a directory in nfs, and
 then touch a file in nfs.  Note: This script directly run  works on every
 node.  You can see the JSON I used in marathon, and in the sandbox results,
 you can see the user is indeed darkness and the directory cannot be
 created. However when directly run, it the script, with the same user,
 creates the directory with no issue.  Now,  I realize this COULD still be a
 NFS quirk here, however, this testing points at some restriction in how
 marathon kicks off the cmd.   Any thoughts on where to look would be very
 helpful!

 John



 Script:

 #!/bin/bash
  echo Writing whoami to stderr for one stop logging 1>&2
  whoami 1>&2
 mkdir /mapr/brewpot/mesos/storm/test/test1
 touch /mapr/brewpot/mesos/storm/test/test1/testing.go



 Run Via Marathon


 {
 cmd: /mapr/brewpot/mesos/storm/run.sh,
 cpus: 1.0,
 mem: 1024,
 id: permtest,
 user: darkness,
 instances: 1
 }


 I0509 07:02:52.457242  9562 exec.cpp:132] Version: 0.21.0
 I0509 07:02:52.462700  9570 exec.cpp:206] Executor registered on slave
 20150505-145508-1644210368-5050-8608-S0
 Writing whoami to stderr for one stop logging
 darkness
 mkdir: cannot create directory `/mapr/brewpot/mesos/storm/test/test1':
 Permission denied
 touch: cannot touch `/mapr/brewpot/mesos/storm/test/test1/testing.go':
 No such file or directory


 Run Via Shell:


 $ /mapr/brewpot/mesos/storm/run.sh
 Writing whoami to stderr for one stop logging
 darkness
 darkness@hadoopmapr1:/mapr/brewpot/mesos/storm$ ls ./test/
 test1
 darkness@hadoopmapr1:/mapr/brewpot/mesos/storm$ ls ./test/test1/
 testing.go


 On Sat, May 9, 2015 at 3:14 AM, Adam Bordelon a...@mesosphere.io
 wrote:

 I don't know of anything inside of Mesos that would prevent you from
 writing to NFS. Maybe examine the environment variables set when running 
 as
 that user. Or are you running in a Docker container? Those can have
 additional restrictions.

 On Fri, May 8, 2015 at 4:44 PM, John Omernik j...@omernik.com wrote:

 I am doing something where people may recommend against my course of
 action. However, I am curious if there is a way basically I have a
 process being kicked off in marathon that is trying to write to a nfs
 location.  The permissions of the user running the task and the nfs
 location are good. So what component of mesos or marathon is keeping me
 from writing here ?  ( I am getting permission denied). Is this one of
 those things that is just not allowed, or is there an option to pass to
 marathon to allow this?  Thanks !

 --
 Sent from my iThing








 --
 James DeFelice
 585.241.9488 (voice)
 650.649.6071 (fax)



Writing outside the sandbox

2015-05-08 Thread John Omernik
I am doing something where people may recommend against my course of
action; however, I am curious if there is a way. Basically, I have a
process being kicked off in Marathon that is trying to write to an NFS
location.  The permissions of the user running the task and of the NFS
location are good. So what component of Mesos or Marathon is keeping me
from writing there?  (I am getting permission denied.) Is this one of
those things that is just not allowed, or is there an option to pass to
Marathon to allow this?  Thanks!

-- 
Sent from my iThing


Re: Storm Mesos Error

2015-04-29 Thread John Omernik
I used the bin/build-release.sh package

and it put it all in a folder named apache-storm-0.9.3... that's probably
my problem? :)

On Wed, Apr 29, 2015 at 1:30 PM, Tim Chen t...@mesosphere.io wrote:

 Hi John,

 Does your storm-mesos tar ball as a folder storm-mesos-0.9.3 in there?

 Tim

 On Wed, Apr 29, 2015 at 11:26 AM, John Omernik j...@omernik.com wrote:

 Greetings all,

 I got my storm nimbus running, but when I try to run a test topology, the
 task enters a lost state and  I get the below in my stderr on the
 sandbox. Note, the URL for the storm.yaml works fine, not sure why it's
 causing an issue on the cp.




 cp: cannot create regular file `storm-mesos*/conf': No such file or
 directory

 Full:

 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0429 13:20:08.873922  5061 fetcher.cpp:76] Fetching URI
 'file:///mapr/brewpot/mesos/storm-mesos-0.9.3.tgz'
 I0429 13:20:08.874048  5061 fetcher.cpp:179] Copying resource from
 '/mapr/brewpot/mesos/storm-mesos-0.9.3.tgz' to
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969'
 I0429 13:20:09.103682  5061 fetcher.cpp:64] Extracted resource
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969/storm-mesos-0.9.3.tgz'
 into
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969'
 I0429 13:20:09.109590  5061 fetcher.cpp:76] Fetching URI '
 http://hadoopmapr1.brewingintel.com:43938/conf/storm.yaml'
 I0429 13:20:09.109658  5061 fetcher.cpp:126] Downloading '
 http://hadoopmapr1.brewingintel.com:43938/conf/storm.yaml' to
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969/storm.yaml'
 cp: cannot create regular file `storm-mesos*/conf': No such file or
 directory





Storm Mesos Error

2015-04-29 Thread John Omernik
Greetings all,

I got my Storm nimbus running, but when I try to run a test topology, the
task enters a lost state and I get the error below in stderr in the
sandbox. Note: the URL for the storm.yaml works fine, so I am not sure why
it's causing an issue on the cp.




cp: cannot create regular file `storm-mesos*/conf': No such file or
directory

Full:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0429 13:20:08.873922  5061 fetcher.cpp:76] Fetching URI
'file:///mapr/brewpot/mesos/storm-mesos-0.9.3.tgz'
I0429 13:20:08.874048  5061 fetcher.cpp:179] Copying resource from
'/mapr/brewpot/mesos/storm-mesos-0.9.3.tgz' to
'/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969'
I0429 13:20:09.103682  5061 fetcher.cpp:64] Extracted resource
'/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969/storm-mesos-0.9.3.tgz'
into
'/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969'
I0429 13:20:09.109590  5061 fetcher.cpp:76] Fetching URI '
http://hadoopmapr1.brewingintel.com:43938/conf/storm.yaml'
I0429 13:20:09.109658  5061 fetcher.cpp:126] Downloading '
http://hadoopmapr1.brewingintel.com:43938/conf/storm.yaml' to
'/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969/storm.yaml'
cp: cannot create regular file `storm-mesos*/conf': No such file or
directory


Re: Storm Mesos Error

2015-04-29 Thread John Omernik
Thanks Tim, that fixed it.  I unpacked, renamed the folder to
storm-mesos-0.9.3, repacked, copied it to hdfs and executed, and all is
well... that was a bit unclear to us n00bs in the audience, so I've spelled it
out here to help anyone else who makes the same mistake I did.
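For anyone else hitting this, a rough sketch of the rename-and-repack step
(paths taken from this thread; adjust to wherever your build lands):

tar xzf apache-storm-0.9.3.tgz                 # whatever bin/build-release.sh produced
mv apache-storm-0.9.3 storm-mesos-0.9.3        # top-level folder must match what the executor expects
tar czf storm-mesos-0.9.3.tgz storm-mesos-0.9.3
cp storm-mesos-0.9.3.tgz /mapr/brewpot/mesos/  # the file:// URI shown in the fetcher log above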



On Wed, Apr 29, 2015 at 1:34 PM, John Omernik j...@omernik.com wrote:

 I used the bin/build-release.sh package

 and it put in all in a folder named apache-storm-0.9.3... that's probably
 my problem? :)

 On Wed, Apr 29, 2015 at 1:30 PM, Tim Chen t...@mesosphere.io wrote:

 Hi John,

 Does your storm-mesos tar ball as a folder storm-mesos-0.9.3 in there?

 Tim

 On Wed, Apr 29, 2015 at 11:26 AM, John Omernik j...@omernik.com wrote:

 Greetings all,

 I got my storm nimbus running, but when I try to run a test topology,
 the task enters a lost state and  I get the below in my stderr on the
 sandbox. Note, the URL for the storm.yaml works fine, not sure why it's
 causing an issue on the cp.




 cp: cannot create regular file `storm-mesos*/conf': No such file or
 directory

 Full:

 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0429 13:20:08.873922  5061 fetcher.cpp:76] Fetching URI
 'file:///mapr/brewpot/mesos/storm-mesos-0.9.3.tgz'
 I0429 13:20:08.874048  5061 fetcher.cpp:179] Copying resource from
 '/mapr/brewpot/mesos/storm-mesos-0.9.3.tgz' to
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969'
 I0429 13:20:09.103682  5061 fetcher.cpp:64] Extracted resource
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969/storm-mesos-0.9.3.tgz'
 into
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969'
 I0429 13:20:09.109590  5061 fetcher.cpp:76] Fetching URI '
 http://hadoopmapr1.brewingintel.com:43938/conf/storm.yaml'
 I0429 13:20:09.109658  5061 fetcher.cpp:126] Downloading '
 http://hadoopmapr1.brewingintel.com:43938/conf/storm.yaml' to
 '/tmp/mesos/slaves/20150402-121212-1644210368-5050-21808-S0/frameworks/20150402-121212-1644210368-5050-21808-0008/executors/myTopo-1-1430331608/runs/8ac24d9e-bb7f-49fe-9519-a07aa6a1d969/storm.yaml'
 cp: cannot create regular file `storm-mesos*/conf': No such file or
 directory






Re: Using mesos-dns in an enterprise

2015-04-02 Thread John Omernik
True :)


On Thu, Apr 2, 2015 at 3:37 PM, Tom Arnfeld t...@duedil.com wrote:

 Last time I checked haproxy didn't support UDP which would be key for
 mesos-dns.

 --

 Tom Arnfeld
 Developer // DueDil

 (+44) 7525940046
 25 Christopher Street, London, EC2A 2BS


 On Thu, Apr 2, 2015 at 3:53 PM, John Omernik j...@omernik.com wrote:

 That was my first response as well... I work at a bank, and the thought
 of changing dns servers on the clients everywhere made me roll my eyes :)

 John


 On Thu, Apr 2, 2015 at 9:39 AM, Tom Arnfeld t...@duedil.com wrote:

 This is great, thanks for sharing!

 It's nice to see other members of the community sharing more realistic
 implementations of DNS rather than just update your resolv conf and it
 works :-)

 --

 Tom Arnfeld
 Developer // DueDil

 (+44) 7525940046
 25 Christopher Street, London, EC2A 2BS


 On Thu, Apr 2, 2015 at 3:30 PM, John Omernik j...@omernik.com wrote:

 Based on my earlier emails about the state of service discovery.  I did
 some research and a little writeup on how to use mesos-dns as a forward
 lookup zone in a enterprise bind installation. I feel this is more secure,
 and more comfortable for an enterprise DNS team as opposed to changing the
 first resolver on every client that may interact with mesos to be the
 mesos-dns server.  Please feel free to modify/correct and include this in
 the mesos-dns documentation if you feel it's valuable.


 Goals/Thought Process
 - Run mesos-dns on a non-standard port. (such as 8053).  This allows
 you to run it as a non-root user.
 - While most DNS clients may not understand this (a different port), in
 an enterprise, most DNS servers will respect a forward lookup zone with a
 server using a different port.
 - Setup below for BIND9 allows you to keep all your mesos servers AND
 clients in an enterprise pointing their requests at your enterprise DNS
 server, rather than mesos-dns.
   - This is easier from an enterprise configuration standpoint. Make
 one change on your dns servers, rather than adding a resolver on all the
 clients.
   - This is more secure in that you can run mesos-dns as non-root (53
 is a privileged port, 8053 is not) no sudo required
   - For more security, you can limit connections to the mesos-dns
 server to only your enterprise dns servers. This could help mitigate any
 unknown vulnerabilities in mesos-dns.
   - This allows you to HA mesos-dns in that you can specify multiple
 resolvers for your bind configuration.




 Bind9 Config
 This was put into my named.conf.local It sets up the .mesos zone and
 forwards to mesos dns. All my mesos servers already pointed at this server,
 therefore no client changes required.


 #192.168.0.100 is my host running mesos DNS
 zone mesos {
 type forward;
 forward only;
 forwarders { 192.168.0.100 port 8053; };
 };




 config.json mesos-dns config file.
 I DID specify my internal DNS server in the resolvers (192.168.0.10)
 however, I am not sure if I need to do this.  Since only requests for
 .mesos will actually be sent to mesos-dns.

 {
   masters: [192.168.0.98:5050],
   refreshSeconds: 60,
   ttl: 60,
   domain: mesos,
   port: 8053,
   resolvers: [192.168.0.10],
   timeout: 5,
   listener: 0.0.0.0,
   email: root.mesos-dns.mesos
 }


 marathon start json
 Note the lack of sudo here. I also constrained it to one host for now,
 but that could change if needed.

 {
 cmd: /mapr/brewpot/mesos/mesos-dns/mesos-dns
 -config=/mapr/brewpot/mesos/mesos-dns/config.json,
 cpus: 1.0,
 mem: 1024,
 id: mesos-dns,
 instances: 1,
 constraints: [[hostname, CLUSTER, hadoopmapr1.brewingintel.com
 ]]
 }







Using mesos-dns in an enterprise

2015-04-02 Thread John Omernik
Following up on my earlier emails about the state of service discovery, I did
some research and a little writeup on how to use mesos-dns as a forward
lookup zone in an enterprise BIND installation. I feel this is more secure,
and more comfortable for an enterprise DNS team, than changing the
first resolver on every client that may interact with Mesos to be the
mesos-dns server.  Please feel free to modify/correct and include this in
the mesos-dns documentation if you feel it's valuable.


Goals/Thought Process
- Run mesos-dns on a non-standard port. (such as 8053).  This allows you to
run it as a non-root user.
- While most DNS clients may not understand this (a different port), in an
enterprise, most DNS servers will respect a forward lookup zone with a
server using a different port.
- Setup below for BIND9 allows you to keep all your mesos servers AND
clients in an enterprise pointing their requests at your enterprise DNS
server, rather than mesos-dns.
  - This is easier from an enterprise configuration standpoint. Make one
change on your dns servers, rather than adding a resolver on all the
clients.
  - This is more secure in that you can run mesos-dns as non-root (53 is a
privileged port, 8053 is not) no sudo required
  - For more security, you can limit connections to the mesos-dns server to
only your enterprise dns servers. This could help mitigate any unknown
vulnerabilities in mesos-dns.
  - This allows you to HA mesos-dns in that you can specify multiple
resolvers for your bind configuration.




Bind9 Config
This was put into my named.conf.local. It sets up the .mesos zone and
forwards it to mesos-dns. All my Mesos servers already pointed at this
server, so no client changes were required.


#192.168.0.100 is my host running mesos DNS
zone "mesos" {
type forward;
forward only;
forwarders { 192.168.0.100 port 8053; };
};




config.json mesos-dns config file.
I DID specify my internal DNS server in the resolvers (192.168.0.10);
however, I am not sure if I need to do this, since only requests for
.mesos will actually be sent to mesos-dns.

{
  "masters": ["192.168.0.98:5050"],
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "port": 8053,
  "resolvers": ["192.168.0.10"],
  "timeout": 5,
  "listener": "0.0.0.0",
  "email": "root.mesos-dns.mesos"
}


marathon start json
Note the lack of sudo here. I also constrained it to one host for now, but
that could change if needed.

{
"cmd": "/mapr/brewpot/mesos/mesos-dns/mesos-dns -config=/mapr/brewpot/mesos/mesos-dns/config.json",
"cpus": 1.0,
"mem": 1024,
"id": "mesos-dns",
"instances": 1,
"constraints": [["hostname", "CLUSTER", "hadoopmapr1.brewingintel.com"]]
}
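A couple of sanity checks once both pieces are up (the IPs are the ones from
the configs above; the exact record names depend on your framework and task
names):

dig @192.168.0.100 -p 8053 leader.mesos A     # ask mesos-dns directly on its 8053 listener
dig @192.168.0.10 leader.mesos A              # ask the enterprise BIND, which should forward
dig @192.168.0.10 mesos-dns.marathon.mesos A  # the task launched above, via task.framework.domain

If the first works and the second doesn't, the forward zone is the place to look.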


Re: Using mesos-dns in an enterprise

2015-04-02 Thread John Omernik
I wonder: if you registered mesos-dns's port in Marathon like you do for
Docker containers, could you use the marathon-ha-proxy bridge in
conjunction with it, to allow mesos-dns to show up anywhere...

On Thu, Apr 2, 2015 at 11:08 AM, James DeFelice
james.defel...@gmail.com wrote:
 This is roughly how we've integrated consul dns at client sites. Bind config
 still needs updating if/when mesos dns relocates.

 --sent from my phone

 On Apr 2, 2015 10:30 AM, John Omernik j...@omernik.com wrote:

 Based on my earlier emails about the state of service discovery.  I did
 some research and a little writeup on how to use mesos-dns as a forward
 lookup zone in a enterprise bind installation. I feel this is more secure,
 and more comfortable for an enterprise DNS team as opposed to changing the
 first resolver on every client that may interact with mesos to be the
 mesos-dns server.  Please feel free to modify/correct and include this in
 the mesos-dns documentation if you feel it's valuable.


 Goals/Thought Process
 - Run mesos-dns on a non-standard port. (such as 8053).  This allows you
 to run it as a non-root user.
 - While most DNS clients may not understand this (a different port), in an
 enterprise, most DNS servers will respect a forward lookup zone with a
 server using a different port.
 - Setup below for BIND9 allows you to keep all your mesos servers AND
 clients in an enterprise pointing their requests at your enterprise DNS
 server, rather than mesos-dns.
   - This is easier from an enterprise configuration standpoint. Make one
 change on your dns servers, rather than adding a resolver on all the
 clients.
   - This is more secure in that you can run mesos-dns as non-root (53 is a
 privileged port, 8053 is not) no sudo required
   - For more security, you can limit connections to the mesos-dns server
 to only your enterprise dns servers. This could help mitigate any unknown
 vulnerabilities in mesos-dns.
   - This allows you to HA mesos-dns in that you can specify multiple
 resolvers for your bind configuration.




 Bind9 Config
 This was put into my named.conf.local It sets up the .mesos zone and
 forwards to mesos dns. All my mesos servers already pointed at this server,
 therefore no client changes required.


 #192.168.0.100 is my host running mesos DNS
 zone mesos {
 type forward;
 forward only;
 forwarders { 192.168.0.100 port 8053; };
 };




 config.json mesos-dns config file.
 I DID specify my internal DNS server in the resolvers (192.168.0.10)
 however, I am not sure if I need to do this.  Since only requests for .mesos
 will actually be sent to mesos-dns.

 {
   masters: [192.168.0.98:5050],
   refreshSeconds: 60,
   ttl: 60,
   domain: mesos,
   port: 8053,
   resolvers: [192.168.0.10],
   timeout: 5,
   listener: 0.0.0.0,
   email: root.mesos-dns.mesos
 }


 marathon start json
 Note the lack of sudo here. I also constrained it to one host for now, but
 that could change if needed.

 {
 cmd: /mapr/brewpot/mesos/mesos-dns/mesos-dns
 -config=/mapr/brewpot/mesos/mesos-dns/config.json,
 cpus: 1.0,
 mem: 1024,
 id: mesos-dns,
 instances: 1,
 constraints: [[hostname, CLUSTER, hadoopmapr1.brewingintel.com]]
 }


Current State of Service Discovery

2015-04-01 Thread John Omernik
I have been researching service discovery on Mesos quite a bit lately, and
due to my background I may be making assumptions that don't apply to a Mesos
datacenter. I've read through the docs and have come up with two main
approaches to service discovery, both of which appear to have strengths and
weaknesses. I wanted to describe what I've seen here, as well as the
challenges as I understand them, so that any misconceptions I may have can be
corrected.

Basically, I see two main approaches to service discovery on Mesos. You
have the mesos-dns (https://github.com/mesosphere/mesos-dns) package, which
is DNS-based service discovery, and then you have HAProxy-based discovery
(which can be represented by both the haproxy-marathon-bridge (
https://github.com/mesosphere/marathon/blob/master/bin/haproxy-marathon-bridge)
script and the Bamboo project (https://github.com/QubitProducts/bamboo)).

HAProxy

With the HAProxy method, as I see it, you basically install HAProxy on
every node. The two above-mentioned projects query Marathon to determine
where the services are running, and then rewrite the HAProxy config on
every node so that every node listens on a specific port; from there,
connections to that port are forwarded, round robin, to the actual
node/port combinations where the services are running.

So, let's use the example of a Hive Thrift server running in a Docker
container on port 10000.  Let's say you have a 5 node cluster: node1, node2,
etc. You spin that container up with instances = 3 in Marathon, and
Marathon/Docker run the container on node2, node3 and another on node2.
There is a bridged port to 10000 inside the container that is tied to an
available port on the physical node. Perhaps one instance on node2 gets
30000 and the other instance gets 30001, and node3's instance is tied to port
30000.  So now you have 3 instances that are exposed at

node2:30000 - dockercontainer:10000
node2:30001 - dockercontainer:10000
node3:30000 - dockercontainer:10000

With the HAProxy setup, each node would get this in its local haproxy
config:

listen hivethrift-10000
  bind 0.0.0.0:10000
  mode tcp
  option tcplog
  balance leastconn
  server hivethrift-3 node2:30000 check
  server hivethrift-2 node2:30001 check
  server hivethrift-1 node3:30000 check

This would allow you to connect to any node in your cluster on port 10000
and be served by one of the three containers running your Hive Thrift server.

Pretty neat? However, there are some challenges here:

1. You now have a total of 65536 ports for your data center. This method is
port only, basically your whole cluster listens on a port and it's
dedicated to one service.  This actually makes sense in some ways because
if you think of Mesos as a cluster operating system, the limitations of
TCP/UDP are such that each kernel has that many ports.  There isn't a
cluster TCP or UDP, just TCP and UDP.  That still is a lot of ports,
however, you do need to be aware of the limitation and manage your ports.
Especially since that number isn't really the total number of available
ports. There are ports in that 65536 that are reserved for cluster
operations, and/or stuff like hdfs.

2. You are now essentially adding a hop to your traffic, which could affect
sensitive applications.  At least with the haproxy-marathon-bridge script,
the settings for each application are static, coming from the script (an
update here would be to allow timeout settings and other HAProxy options to
be set per application and managed somewhere; I think that may be what Bamboo
offers, I just haven't dug in yet).  The glaring issue I found was
specifically with the Hive Thrift service.  You connect, you run some
queries, all is well. However, if you submit a long query
(longer than the default 50000 ms timeout), there may not be any packets
actually transferred in that time.  The client is OK with this, the server
is OK with this; however, HAProxy sees no packets within its timeout period,
decides the connection is dead, closes it, and then you get problems.
I would imagine Thrift isn't the only service that may run into situations
like this.  I need to do more research on how to get around this; there
may be some hope in Hive 1.1.0 with Thrift keep-alives, however, not every
application service will have that option in the pipeline.
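For what it's worth, the knobs that would need to be per-application do exist
in HAProxy itself; a hedged sketch of what a Thrift-friendly listener could
look like (the directives are real HAProxy options, the values are guesses):

listen hivethrift-10000
  bind 0.0.0.0:10000
  mode tcp
  option tcplog
  option clitcpka              # TCP keepalives toward the client
  option srvtcpka              # TCP keepalives toward the Thrift server
  timeout client 1h            # idle timeouts at least as long as your longest query
  timeout server 1h
  balance leastconn
  server hivethrift-1 node3:30000 check

The hard part is getting the bridge script (or Bamboo) to emit settings like
these per application instead of one global template.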

Mesos-DNS

This project came to my attention this week, and I am looking to get it
installed today to have hands-on time with it.  Basically, it's a binary
that queries the mesos-master and generates A records that are hostnames
based on the framework names, and SRV records based on the assigned ports.

This is where I get confused. I can see the A records being useful;
however, you would have to have your entire network able to use
mesos-dns (including non-Mesos systems).  Otherwise, how would a client know
to connect to a .mesos domain name? Perhaps there should be a way to
integrate mesos-dns as the authoritative zone for .mesos in your existing
DNS infrastructure.

HAProxy for Hive Thrift Server on Mesos

2015-03-21 Thread John Omernik
I have a nice setup with a Hive Thrift server running in a Docker
container on Mesos. It works pretty well, but something, I believe in
how HAProxy handles the connection, is causing the Thrift server
connection to die after a time.  Basically, I can run a few queries,
but after 2 or 3, or specifically after a longer query, I get the
error below indicating end of file on the connection.  Then no more
connections work until I re-establish the connection to the Thrift
server. I've tried looking in logs: the Thrift server stderr logs
show no issues. I guess I need to dig into the HAProxy logs, but I am
not seeing any issues in syslog so far.  I'd love any pointers on how
to troubleshoot this.  By the way, I have MySQL, the Hive metastore, and
a Minecraft server all running on Mesos/Docker with no issues; I'm not
sure why the Thrift server is so sensitive.

:)
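One way to confirm HAProxy is the one cutting the connection (a hedged sketch;
the listener name is illustrative) is to turn its logging on and read the
termination-state flags in each log line; cD or sD there means a client- or
server-side timeout expired during the data phase, which is exactly the
long-idle-connection case:

# global section of haproxy.cfg
global
  log /dev/log local0

# the thrift listener
listen hivethrift-10000
  mode tcp
  option tcplog
  log global

Then grep syslog for the listener name and check the flags on the log lines
that line up with the dropped sessions.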


pyhs2 connection string:

hs2 = 
pyhs2.connect(host='marathonmaster',port=10000,authMechanism='PLAIN',user='bestuser',password='removed',database='default')


Error

/usr/local/lib/python2.7/dist-packages/thrift/transport/TSocket.pyc in
read(self, sz)
118 if len(buff) == 0:
119   raise TTransportException(type=TTransportException.END_OF_FILE,
-- 120 message='TSocket read 0 bytes')
121 return buff
122

TTransportException: TSocket read 0 bytes


Running Spark on Mesos

2015-01-06 Thread John Omernik
I have Spark 1.2 running nicely, with both the Spark SQL Thrift Server
and an interactive IPython session.

My question is this: I am running on Mesos in fine-grained mode; what
is the appropriate way to manage the two instances? Should I run
coarse-grained mode for the Spark SQL Thrift Server so that RDDs can
persist?  Should I run both as separate Spark instances in fine-grained
mode (I'd have to change the port on one of them)?  Is there a way to
have one Spark driver serve both things, so I only use resources for
one driver?  How would you run this in a production environment?
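One pattern that seems to fit (a hedged sketch, Spark 1.2-era options; the
master URL and ports are placeholders): run the Thrift Server as its own
long-lived coarse-grained app so its executors, and therefore cached RDDs,
stick around, and leave the interactive IPython/pyspark session fine-grained
so it releases cores between queries. Each one is still its own driver.

./sbin/start-thriftserver.sh \
  --master mesos://zk://zk1:2181/mesos \
  --conf spark.mesos.coarse=true \
  --conf spark.cores.max=8 \
  --hiveconf hive.server2.thrift.port=10001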

Thanks!

John


Passing -D Java Options to hadoop-mesos

2014-11-21 Thread John Omernik
I'd like to pass some -D options to my java instance running hadoop on
hadoop-mesos. Where can I set that up to be properly passed through Mesos?
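In case it helps, the standard Hadoop 1.x mechanisms (nothing hadoop-mesos
specific; treat this as an assumption about what you're after) are HADOOP_OPTS
in hadoop-env.sh for the daemons themselves, and mapred.child.java.opts for
the per-task child JVMs:

# hadoop-env.sh, which ends up inside the executor tarball the TaskTrackers run from
export HADOOP_OPTS="$HADOOP_OPTS -Dcom.example.flag=value"

# mapred-site.xml equivalent for the task JVMs:
#   <property>
#     <name>mapred.child.java.opts</name>
#     <value>-Xmx1024m -Dcom.example.flag=value</value>
#   </property>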

Thanks!

John


Re: hadoop-mesos error

2014-11-18 Thread John Omernik
Are there specific things you are looking for? That's a lot of information
you are asking about. (I can post it all, it's just heavy.) One thing I
did do was shift from Java 6 to Java 7, and I wonder if that is playing a
part here.  I can't go back either :(.  I've tried updating and recompiling
my hadoop-mesos against the version of Hadoop I am using, with no luck.
Still getting the same error. For the JT config, are you looking for the
whole mapred-site?  Thanks!



On Tue, Nov 18, 2014 at 2:32 PM, Tom Arnfeld t...@duedil.com wrote:

 Hi John,

 Could you paste your JT configuration and the configuration that gets
 printed out by the executor?

 Also, what version of Hadoop are you running, and what revision of the
 framework?

 Cheers,

 Tom.

 --

 Tom Arnfeld
 Developer // DueDil

 (+44) 7525940046
 25 Christopher Street, London, EC2A 2BS


 On Tue, Nov 18, 2014 at 8:27 PM, John Omernik j...@omernik.com wrote:

 Hey all, I updated some things on my cluster and it broke.  :)

 That said, I am at a loss: the JT spins up, however tasks fail right
 after the configuration listing with the error below, and I am not sure how
 to get the debug information to troubleshoot this. Any pointers would be
 appreciated.

 Thanks!

  14/11/18 14:19:42 INFO mapred.TaskTracker: /tmp is not tmpfs or ramfs. Java 
 Hotspot Instrumentation will be disabled by default
 14/11/18 14:19:42 INFO mapred.TaskTracker: Cleaning up config files from the 
 job history folder
 java.lang.NumberFormatException: null
  at java.lang.Integer.parseInt(Integer.java:454)
  at java.lang.Integer.valueOf(Integer.java:582)
  at 
 org.apache.hadoop.mapred.TaskTracker.getResourceInfo(TaskTracker.java:2965)
  at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:2108)
  at 
 org.apache.hadoop.mapred.MesosExecutor.launchTask(MesosExecutor.java:80)
 Exception in thread Thread-1 I1118 14:19:42.040426  8716 exec.cpp:413] 
 Deactivating the executor libprocess






Re: Untaring Framework tgzs: Can we customize?

2014-09-13 Thread John Omernik
I would, but I am out of pocket for a week. If you put the ticket in, I'll
buy you a drink of your choice at the next MesosCon. (I don't want this
to get lost as I am traveling) :)
On Sep 12, 2014 9:06 PM, Vinod Kone vinodk...@gmail.com wrote:

 Having a skip chown option sounds good to me. We'll add the option to
 CommandInfo.URI so that frameworks can override the default if desired.
 Mind filing a ticket?

 On Thu, Sep 11, 2014 at 5:00 AM, John Omernik j...@omernik.com wrote:

 Vinod -

 I believe this is EXACTLY the issue.  I also understand why in most cases
 this is ok. If a user is provided, then a fair assumption would be to chown
 the extracted archive as that user.  (Assuming the untar is happening as
 root in all cases)  So that leads to three components we may want to make
 customizable by the framework:

 1. Who untars the archive. Right now, it appears root untars the archive
 (otherwise, I would imagine that the chown would be unneeded, if the user
 untared the archive, the user would already have permissions, thus the
 chown would not be needed).  If it is root, perhaps this is ok to leave
 as is?  Another option may be to set the untar user separate from the
 running user, but I am not sure we'd need to if root always untars.

 2. Pass a flag from framework that allow a skipping of the chown. For
 compatibility sakes, the flag would default to off so that it wouldn't
 break existing things, but if the framework wanted, they could tell the
 slave that the permissions are fine how they are set, and there is no need
 to chown.  I am not sure I understand the architecture of Mesos well enough
 yet to comment on the best way to do this.  Should it be a framework
 variable? (Frameworks would have to be updated to make use of this)  A
 string in the filename (could this be abused?)  Etc.

 3.  The user that runs the executor.  This is already passed, and I am
 not sure we need to change anything here. As long as A. Root untars the
 archive, and B. We have the ability to skip the chown, the user stuff
 should be perfectly ok as is.  This way, in my case, root would untar the
 archive, I could set the skip on chown, and then I'd have the user hadoop
 run the framework.  In this model, the LinuxTaskController should work.

 Thanks for looking into this, I welcome more thoughts on the subject.

 John



 On Wed, Sep 10, 2014 at 4:39 PM, Vinod Kone vinodk...@gmail.com wrote:

 IanD: Mind helping John out here?

 My hunch here is that this is because the slave does chown() after
 extracting (
 https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L258
 )?

 From POSIX standard, it looks like chown() when invoked by root doesn't
 clear the setuid bit for ordinary files but clears them for other types
 (e.g., binary).


 http://unix.stackexchange.com/questions/53665/chown-removes-sticky-bit-bug-or-feature
 http://pubs.opengroup.org/onlinepubs/009695399/utilities/chown.html


 On Wed, Sep 10, 2014 at 2:17 PM, John Omernik j...@omernik.com wrote:

 I am wondering about the process of fetching the tgz files and running
 them on slaves. Basically, I am trying to run hadoop-mesos, but still use
 the LinuxTaskController (
 http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html for details).

 When I am using hadoop, I have to swich to the defaultTaskController
 because when Mesos untars the tgz, it loses the setuid bit on the binary.
 I've done a bit of testing around this, and I am unsure why it loses it
 (even if the running process is root) but it does.

 Basically, tar by itself works like this: If the user is a super user,
 tar maintain all permissions that are in the tgz. (I've tested this, when I
 manually untar with tar zxf myhadoop.tgz it untars properly, including
 permissions and setuid on the Linux Task Controller.)

 When I untar as a non-super user, the permissions all get moved to the
 user that untared it, and the setuid bit is lost. It makes sense from a
 security point of view.

 So how does this work in mesos and hadoop?

 Well, if I run the jobtracker as user hadoop, hadoop is not a super
 user, all the files in the untared hadooop folder are owned by
 hadoop:hadoop, and the setuid bit is lost.

 Ok, next test, well, let's run jobtracker as root, and see what
 happens.  (remember, when I untared as root, the setuid and all permissions
 were preserved).   So, when we run JT as root, all the files become
 root:root, and the setuid bit is lost.  That's weird?  What happened here?
 (This is where I get lost, perhaps the untar/gzipping isn't using the tar
 command thus permissions are not preserved like I would expect)

 Either way, when using the LinuxTaskController, tasktrackers WILL NOT
 RUN if the setuid bit is not set.  That's a pain, the LinuxTaskController
 is really nice from an impersonation/security setup with hadoop jobs.  I
 CAN run  my hadoop framework as hadoop:hadoop, but then I am limited in how
 things are setup and I get strange permissions issues when trying

Untaring Framework tgzs: Can we customize?

2014-09-10 Thread John Omernik
I am wondering about the process of fetching the tgz files and running them
on slaves. Basically, I am trying to run hadoop-mesos, but still use the
LinuxTaskController (http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html
for details).

When I am using Hadoop, I have to switch to the DefaultTaskController
because when Mesos untars the tgz, it loses the setuid bit on the binary.
I've done a bit of testing around this, and I am unsure why it loses it
(even if the running process is root), but it does.

Basically, tar by itself works like this: if the user is a superuser, tar
maintains all permissions that are in the tgz. (I've tested this: when I
manually untar with tar zxf myhadoop.tgz it untars properly, including
permissions and the setuid bit on the LinuxTaskController binary.)

When I untar as a non-superuser, the permissions all get reassigned to the user
that untarred it, and the setuid bit is lost. That makes sense from a security
point of view.

So how does this work in mesos and hadoop?

Well, if I run the JobTracker as user hadoop, hadoop is not a superuser,
so all the files in the untarred hadoop folder end up owned by hadoop:hadoop, and
the setuid bit is lost.

Ok, next test: let's run the JobTracker as root and see what happens
(remember, when I untarred as root, the setuid and all permissions were
preserved). So, when we run the JT as root, all the files become root:root,
and the setuid bit is still lost. That's weird. What happened here? (This is
where I get lost; perhaps the untar/gzipping isn't using the tar command,
and thus permissions are not preserved as I would expect.)
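
(Here's the quickest repro I can think of for what I believe is happening,
assuming the slave untars as root and then chowns the tree over to the
framework user; this is plain tar/chown on a Linux box:)

    # as root, same tarball the framework ships
    tar zxf hadoop-0.20.2-mapr-4.0.0.tgz
    find hadoop-0.20.2-mapr-4.0.0 -name task-controller -exec ls -l {} \;
    #   -rwsr-x--- ... task-controller   <- setuid survives the untar
    chown -R hadoop:hadoop hadoop-0.20.2-mapr-4.0.0
    find hadoop-0.20.2-mapr-4.0.0 -name task-controller -exec ls -l {} \;
    #   -rwxr-x--- ... task-controller   <- on Linux, chown on a regular file clears setuid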

Either way, when using the LinuxTaskController, TaskTrackers WILL NOT RUN
if the setuid bit is not set.  That's a pain; the LinuxTaskController is
really nice from an impersonation/security standpoint for Hadoop jobs.  I CAN
run my Hadoop framework as hadoop:hadoop, but then I am limited in how
things are set up, and I get strange permissions issues when trying to run
certain jobs as other users.

The fix?

I am hoping we can have a discussion around this.  As I see it, the slaves
are running as root, so they have the power to untar however we need them to.
Ideally, I'd like to see the untarring happen with permissions preserved,
i.e. the archives fetched for Mesos would, at the very least, have the
OPTION to preserve the permissions stored in the tgz. If we could do this
as an option somehow, it would be a win.

Also, ideally I don't want to run the framework as root, just untar the tgz
as root, preserving permissions.  There is a difference between the action
of untarring and the execution of the framework, and the security nerd in
me would like to ensure that while the slave COULD run the framework as root,
we avoid it if possible.

I am not sure exactly how Mesos untars things, nor am I aware how hard it
would be to do this, but I think that, from a security perspective, the
flexibility that untarring with preserved permissions (especially the setuid
bit) would bring Mesos would warrant the dev time.

Thoughts?


Re: Sandbox Log Links

2014-09-07 Thread John Omernik
Ya, just confirmed: when I set --work_dir=anything (anything being even
the default /tmp/mesos/slave/), there are no sandbox logs, yet when I
leave it off on the slave, it shows the sandbox.  Any thoughts?
Anyone able to reproduce?
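
(For anyone trying to reproduce, the only thing I'm toggling between the two
runs is the flag below; the master address is a placeholder:)

    # sandbox links show stdout/stderr:
    mesos-slave --master=zk://zkhost:2181/mesos
    # sandbox links come up empty:
    mesos-slave --master=zk://zkhost:2181/mesos --work_dir=/tmp/mesos/slave/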




On Thu, Sep 4, 2014 at 7:23 PM, John Omernik j...@omernik.com wrote:

 No firewalls.  When I changed the slave work dir, it fixed it... I wonder if
 its a permissions thing?
 On Sep 4, 2014 5:38 PM, Dick Davies d...@hellooperator.net wrote:

 I don't think that's the issue - i have a custom work_dir too and can
 see the logs fine.

 Don't they still get served up from the slaves themselves (port 5051)?
 Maybe you've got
 a firewall blocking that from where you're viewing the mesos ui?

 On 4 September 2014 23:58, John Omernik j...@omernik.com wrote:
  Thanks Tim. Some testing showed that when I moved to 0.20, I setup the
  slaves to use a specific log directory rather than just default to /tmp.
  Basically, if you specify a customer work_dir for the slave, the master
  doesn't know (I am guessing?) where to find to logs? This seems like
  something that should work (if you change the work_dir, it should
 update the
  master with where to look for logs in the gui).  Thoughts?
 
 
  On Thu, Sep 4, 2014 at 5:34 PM, Tim Chen t...@mesosphere.io wrote:
 
  Hi John,
 
  Take a look at the slave log and see if your task failed, what was the
  failure message that was part of your task failure.
 
  Tim
 
 
  On Thu, Sep 4, 2014 at 3:24 PM, John Omernik j...@omernik.com wrote:
 
  Hey all, I upgraded to 0.20 and when I click on sandbox, the link is
  good, but there are not futher links for logs (i.e. standard err, out
 etc)
  like there was in 0.19. I have changed my log location, but it should
 still
  work... Curious on what I can look at to troubleshoot.
 
  Thanks!
 
  John
 
 
 




Re: Sandbox Log Links

2014-09-04 Thread John Omernik
Thanks Tim. Some testing showed that when I moved to 0.20, I set up the
slaves to use a specific log directory rather than just defaulting to /tmp.
Basically, if you specify a custom work_dir for the slave, the master
doesn't know (I am guessing?) where to find the logs. This seems like
something that should work (if you change the work_dir, it should update
the master with where to look for logs in the GUI).  Thoughts?


On Thu, Sep 4, 2014 at 5:34 PM, Tim Chen t...@mesosphere.io wrote:

 Hi John,

 Take a look at the slave log and see if your task failed, what was the
 failure message that was part of your task failure.

 Tim


 On Thu, Sep 4, 2014 at 3:24 PM, John Omernik j...@omernik.com wrote:

 Hey all, I upgraded to 0.20 and when I click on sandbox, the link is
 good, but there are no further links for logs (i.e. stderr, stdout, etc.)
 like there were in 0.19. I have changed my log location, but it should still
 work... Curious what I can look at to troubleshoot.

 Thanks!

 John





Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-22 Thread John Omernik
Just to keep everyone updated:

The issue is that MapR uses the LinuxTaskController by default. I went back to
the DefaultTaskController, and this fixed my issue!
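
(For the archives, the exact knob is the task-controller class in
mapred-site.xml; if I'm remembering the property name right, it's:)

    <property>
      <name>mapred.task.tracker.task-controller</name>
      <value>org.apache.hadoop.mapred.DefaultTaskController</value>
    </property>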

Thanks!

http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html


On Tue, Aug 19, 2014 at 1:47 PM, John Omernik j...@omernik.com wrote:

 Well yes, I know they are set correctly, I am curious on how your mesos
 unpacks the hadoop tarball.  That file doesn't exist though? Interesting...
 in your hadoop tar, where does the task-controller binary live?


 On Tue, Aug 19, 2014 at 12:00 PM, Brenden Matthews 
 brenden.matth...@airbedandbreakfast.com wrote:

 I'm not sure which file you're referring to (since that doesn't exist),
 but I can assure you the permissions are set correctly.


 On Tue, Aug 19, 2014 at 9:14 AM, John Omernik j...@omernik.com wrote:

 Can you do me a favor? on one of your running tasks, or recently
 completed tasks, in the Mesos task, click on it, go to the logs (it shows
 the stderr and stdout) and then drill into the extracted hadoop package to
 /hadoop-version/bin/Linux-amd64-64/bin and let me know what the
 owner/permissions of task-controller are for you?  I would be interested to
 know if it maintains setuid for you.







Data Center Awareness - Mesos

2014-08-21 Thread John Omernik
I was wondering... does Mesos have any concept of datacenter awareness?
I.e., if you have two primary data centers, can nodes be flagged as such,
and then certain frameworks be localized to a datacenter or, if the
framework allows, be distributed across high-latency links?  Or is this all
just crazy talk? :)
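
(The closest thing I've found so far is tagging slaves with attributes and
letting frameworks that understand them constrain placement, e.g. the line
below with made-up names; that still leaves the cross-datacenter scheduling
question open, though:)

    mesos-slave --master=zk://zkhost:2181/mesos --attributes="datacenter:dc1;rack:r42"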


Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-19 Thread John Omernik
Yes

On Monday, August 18, 2014, Vinod Kone vinodk...@gmail.com wrote:


 On Sat, Aug 16, 2014 at 4:26 AM, John Omernik j...@omernik.com
 javascript:_e(%7B%7D,'cvml','j...@omernik.com'); wrote:

 I've confirmed on the package I am using that when I untar it using tar
 zxf as root, the task-controller does NOT lose the setuid bit.  But on
 the lost tasks in Mesos I get the error below.  What's interesting is that
 if I drill down to the directory, the owner is root:root, but just the
 setuid bit is missing.


 What user is the slave running as? root?



-- 
Sent from my iThing


Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-19 Thread John Omernik
Can you do me a favor? On one of your running tasks, or recently completed
tasks, click on it in the Mesos UI, go to the logs (it shows the stderr
and stdout), and then drill into the extracted Hadoop package to
/hadoop-version/bin/Linux-amd64-64/bin and let me know what the
owner/permissions of task-controller are for you?  I would be interested to
know if it maintains the setuid bit for you.


Re: Alternate HDFS Filesystems + Hadoop on Mesos

2014-08-18 Thread John Omernik
Adam - I am new to using Jira properly. (I couldn't find the JIRA for the
Tachyon change as an example, so I linked to the code... is that ok?)

I created

https://issues.apache.org/jira/browse/MESOS-1711

If you wouldn't mind taking a quick look to make sure I filled things out
correctly to get it addressed, I'd appreciate it. If you want to hit me up
off-list with any recommendations on what I could do better in the
future, I'd appreciate that as well.

Thanks!

John



On Mon, Aug 18, 2014 at 4:43 AM, Adam Bordelon a...@mesosphere.io wrote:

 Okay, I guess MapRFS is protocol compatible with HDFS, but not
 uri-compatible. I know the MapR guys have gotten MapR on Mesos working.
 They may have more answers for you on how they accomplished this.

  why hard code the file prefixes?
 We allow any uri, so we need to have handlers coded for each type of
 protocol group, which so far includes hdfs/hftp/s3/s3n which use
 hdfs::copyToLocal, or http/https/ftp/ftps which use net::download, or
 file:// or an absolute/relative path for files pre-populated on the machine
 (uses 'cp'). MapRFS (and Tachyon) would probably fit into the
 hdfs::copyToLocal group so easily that it would be a one-line fix each.

  I really think the hdfs vs other prefixes should be looked at
 I agree. Could you file a JIRA with your request? It should be an easy
 enough change for us to pick up. I would also like to see Tachyon as a
 possible filesystem for the fetcher.


 On Fri, Aug 15, 2014 at 5:16 PM, John Omernik j...@omernik.com wrote:

 I tried hdfs:/// and hdfs://cldbnode:7222/ Neither worked (examples
 below) I really think the hdfs vs other prefixes should be looked at. Like
 I said above, the tachyon project just added a env variable to address
 this.



 hdfs://cldbnode:7222/

 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from 
 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
 files.
 -copyToLocal: Wrong FS: 
 maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, expected: 
 hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)




 hdfs:///





 I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
 I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from 
 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' 
 '/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
 WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please 
 use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties 
 files.
 -copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz, 
 expected: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc] [-crc] 
 src ... localdst
 Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
 Failed to synchronize with slave (it's probably exited)



 On Fri, Aug 15, 2014 at 5:38 PM, John Omernik j...@omernik.com wrote:

 I am away from my cluster right now. I tried doing a hadoop fs -ls
 maprfs:// and that worked.   When I tried hadoop fs -ls hdfs:/// it failed
 with a wrong fs type error.  With that error I didn't try it in the mapred-site.  I
 will try it.  Still...why hard code the file prefixes? I guess I am curious
 on how glusterfs would work, or others

Re: Alternate HDFS Filesystems + Hadoop on Mesos

2014-08-15 Thread John Omernik
I tried hdfs:/// and hdfs://cldbnode:7222/; neither worked (examples below).
I really think the hdfs vs. other prefixes should be looked at. Like I said
above, the Tachyon project just added an env variable to address this.



hdfs://cldbnode:7222/

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0815 19:14:17.101666 22022 fetcher.cpp:76] Fetching URI
'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
I0815 19:14:17.101780 22022 fetcher.cpp:105] Downloading resource from
'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to
'/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
E0815 19:14:17.778833 22022 fetcher.cpp:109] HDFS copyToLocal failed:
hadoop fs -copyToLocal
'hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
'/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0003/executors/executor_Task_Tracker_5/runs/b3174e72-75ea-48be-bbb8-a9a6cc605018/hadoop-0.20.2-mapr-4.0.0.tgz'
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files.
-copyToLocal: Wrong FS:
maprfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz,
expected: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc]
[-crc] src ... localdst
Failed to fetch: hdfs://hadoopmapr1:7222/mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Failed to synchronize with slave (it's probably exited)




hdfs:///


I0815 19:10:45.006803 21508 fetcher.cpp:76] Fetching URI
'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
I0815 19:10:45.007099 21508 fetcher.cpp:105] Downloading resource from
'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz' to
'/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
E0815 19:10:45.681922 21508 fetcher.cpp:109] HDFS copyToLocal failed:
hadoop fs -copyToLocal 'hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz'
'/tmp/mesos/slaves/20140815-103603-1677764800-5050-24315-2/frameworks/20140815-154511-1677764800-5050-7162-0002/executors/executor_Task_Tracker_2/runs/22689054-aff6-4f7c-9746-a068a11ff000/hadoop-0.20.2-mapr-4.0.0.tgz'
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
Please use org.apache.hadoop.log.metrics.EventCounter in all the
log4j.properties files.
-copyToLocal: Wrong FS: maprfs:/mesos/hadoop-0.20.2-mapr-4.0.0.tgz,
expected: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Usage: hadoop fs [generic options] -copyToLocal [-p] [-ignoreCrc]
[-crc] src ... localdst
Failed to fetch: hdfs:///mesos/hadoop-0.20.2-mapr-4.0.0.tgz
Failed to synchronize with slave (it's probably exited)
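
(Next thing I'm going to try is sidestepping the fetcher entirely and pointing
the executor URI at the MapR NFS mount, since a plain local path is one of the
cases the fetcher handles. Just a sketch: the cluster name and path are made
up, and I'm assuming mapred.mesos.executor.uri is the right property name in
the hadoop-mesos framework config:)

    <property>
      <name>mapred.mesos.executor.uri</name>
      <value>/mapr/my.cluster.com/mesos/hadoop-0.20.2-mapr-4.0.0.tgz</value>
    </property>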



On Fri, Aug 15, 2014 at 5:38 PM, John Omernik j...@omernik.com wrote:

 I am away from my cluster right now. I tried doing a hadoop fs -ls
 maprfs:// and that worked.   When I tried hadoop fs -ls hdfs:/// it failed
 with a wrong fs type error.  With that error I didn't try it in the mapred-site.  I
 will try it.  Still... why hard-code the file prefixes? I guess I am curious
 how glusterfs would work, or others as they pop up.
  On Aug 15, 2014 5:04 PM, Adam Bordelon a...@mesosphere.io wrote:

 Can't you just use the hdfs:// protocol for maprfs? That should work just
 fine.


 On Fri, Aug 15, 2014 at 2:50 PM, John Omernik j...@omernik.com wrote:

 Thanks all.

 I realized MapR has a work around for me that I will try soon in that I
 have MapR fs NFS mounted on each node, I.e. I should be able to get the tar
 from there.

 That said, perhaps someone with better coding skills than me could
 provide an env variable where a user could provide the HDFS prefixes to
 try. I know we did that with the tachyon project and it works well for
 other HDFS compatible fs implementations, perhaps that would work here?
 Hard coding a pluggable system seems like a long term issue that will keep
 coming up.
  On Aug 15, 2014 4:02 PM, Tim St Clair tstcl...@redhat.com wrote:

 The uri doesn't currently start with any of the known types (at least
 on 1st grok).
 You could redirect via a proxy that does the job for you.

 | if you had some fuse mount that would work too.

 Cheers,
 Tim

 --

 *From: *John Omernik j...@omernik.com
 *To: *user@mesos.apache.org
 *Sent: *Friday, August 15, 2014 3:55:02 PM
 *Subject: *Alternate HDFS Filesystems + Hadoop on Mesos

 I am on a wonderful journey trying to get hadoop on Mesos working with
 MapR.   I feel like I am close, but when the slaves try to run the packaged
 Hadoop, I get the error below.  The odd thing is,  I KNOW I got Spark
 running on Mesos pulling both data and the packages from MapRFS.  So I am
  confused why there is an issue

Re: Does Mesos support Hadoop MR V2

2014-07-27 Thread John Omernik
So excuse my naivety in this space, but my ignorance has never really
stopped me from asking questions:

I see YARN (Yet Another Resource Negotiator) as very similar to Mesos, i.e.
something to manage resources on a cluster of machines. So when I hear talk
of running YARN on Mesos it seems very redundant indeed, and I ask
myself: what are we actually getting out of this setup?

So, going to the map/reduce question, I see MapReduce V1 and MapReduce
V2 like this: MapReduce V2 is an application that runs on YARN, i.e. if
you run a job, it creates an application master, that application master
requests resources, and the job gets run.  It differs from MapReduce V1 in
that there is no long-running JobTracker (other than the YARN ResourceManager,
but that is managing resources for all applications, not just MapReduce
applications).  Ok, so why can't there be a Mesos application that
is similar to a MapReduce V2 application in YARN?  Why do we need to run
YARN on Mesos? That doesn't really make sense.  Basically, for M/R V2 vs.
M/R V1, the only difference is that to mimic M/R V1 we need TaskTrackers and
JobTrackers running as Mesos applications (which we have).  So in M/R V2,
we just need the equivalent of a YARN application master,
requesting resources across the cluster.

Fundamentally, YARN is confusing because I think they coupled running MapReduce
jobs with the resource manager and called it Hadoop v2.  By
coupling the two, people look at YARN as MapReduce V2, but it's not
really.  It's a way of running jobs on a cluster of machines (a la Mesos)
with an application that is the equivalent of MapReduce V1.   The names
being used seem confusing to me; they make people who have invested
in Hadoop (MapReduce V1) very interested in YARN because it's called
Hadoop V2, while Mesos is seen as the other.


Just for my own sake I summarized a TL;DR form, so if someone wants to correct
my understanding they can:

Mesos = tool to manage resources

YARN = tool to manage resources; it's also called Hadoop v2

MapReduce V1 = JobTrackers/TaskTrackers, it's what we know. It can run on
Hadoop clusters, and on Mesos.  It's also called Hadoop v1

MapReduce V2 = application that runs on YARN and mimics MapReduce V1
on a YARN cluster. This + YARN has been called Hadoop v2.

On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou maxime.brugi...@gmail.com
wrote:

 When I said that running yarn over mesos did not make sense I meant that
 running a resource manager in a resource manager was very sub-optimal. You
 will eventually do static allocation of resources for the Yarn framework in
 Mesos or have complex logic to determine how much resource should be given
 to yarn. You will also have the same burden of managing 2 different
 clusters instead of one, even if yarn is sort of hidden as mesos framework.

 However yes I believe its easier to run yarn on mesos than to run mrv2 on
 top of mesos. The solution I was discussing was obviously ideal and I
 looked at the MRAppMaster since and it discouraged me :)
  On Jul 27, 2014 12:41 AM, Rick Richardson rick.richard...@gmail.com
 wrote:

 FWIW I also think the fastest approach here is porting Yarn onto
 Mesos.

 In a perfect world, writing an implementation layer for the Yarn
 Interface on Mesos would certainly be the optimal approach, but looking at
 the MRv2 code, it is very very coupled to many Yarn modules.

 If someone wanted to take on the project of making a generic resource
 scheduler interface for MRv2, that would be amazing :)
 On Jul 26, 2014 6:19 PM, Jie Yu yujie@gmail.com wrote:

 I am interested in investigating the idea of YARN on top of Mesos. One
 of the benefits I can think of is that we can get rid of the static
 resource allocation between YARN and Mesos clusters. In that way, Mesos can
 allocate those resources that are not used by YARN to other Mesos
 frameworks like Aurora, Marathon, etc, to increase the resource utilization
 of the entire data center. Also, we could avoid running each MRv2 job as a
 framework which I think might cause some maintenance complexity (e.g. for
 framework rate limiting, etc). Finally, YARN currently does not have a good
 isolation support. It only supports cpu isolation right now (using
 cgroups). By porting YARN on top of Mesos, we might be able to leverage the
 existing Mesos containerizer strategy to provide better isolation between
 tasks. Maxime, I am curious why do you think it does not make sense to run
 YARN over Mesos? Since I am not super familar with YARN, I might be missing
 something.

 I have been thinking of making ResourceManager in YARN a Mesos framework
 and making NodeManager a Mesos executor. The NodeManager will launch
 containers using primitives provided by Mesos so that we have a consistent
 containerizer layer. I haven't fully figured out how this could be done yet
 (e.g., nested containers, communication between NodeManager and
 ResourceManager, etc.), but I would