Re: Myriad talk link for MesosCon?

2016-03-23 Thread Adam Bordelon
Based on the count, that looks like all ~150 submissions.
I couldn't find any Myriad-specific ones, so I guess none of us submitted a
Myriad talk to MesosCon.
Found this though:
Experiences running HPC & Big Data frameworks on Cray Analytics Platform
"... Cray ..  leverage Apache Mesos and create interfaces to allow other
resources managers (Slurm and YARN) to dynamically acquire and release
resources from Mesos."
Not sure if they tried Myriad or used their own thing.


On Wed, Mar 23, 2016 at 10:17 AM, Darin Johnson 
wrote:

> Yeah I didn't see one either.
>
> Darin
>
> On Wed, Mar 23, 2016 at 1:10 PM, Sarjeet Singh 
> wrote:
>
> > I couldn't find a link to any Myriad talk in the MesosCon voting.
> > Anyone?
> >
> > I did find these proposal docs, though:
> >
> > Developers: http://bit.ly/1RpZPvj
> > Users: http://bit.ly/1Mspaxp
> >
> >
> > *It seems the deadline for the proposal voting is today, March 23 2016.*
> >
> > -Sarjeet
> >
>


Re: Myriad talk link for MesosCon?

2016-03-23 Thread Darin Johnson
Yeah I didn't see one either.

Darin

On Wed, Mar 23, 2016 at 1:10 PM, Sarjeet Singh 
wrote:

> I couldn't find a link to any Myriad talk in the MesosCon voting.
> Anyone?
>
> I did find these proposal docs, though:
>
> Developers: http://bit.ly/1RpZPvj
> Users: http://bit.ly/1Mspaxp
>
>
> *It seems the deadline for the proposal voting is today, March 23 2016.*
>
> -Sarjeet
>


Myriad talk link for MesosCon?

2016-03-23 Thread Sarjeet Singh
I couldn't find a link to any Myriad talk in the MesosCon voting.
Anyone?

I did find these proposal docs, though:

Developers: http://bit.ly/1RpZPvj
Users: http://bit.ly/1Mspaxp


*It seems the deadline for the proposal voting is today, March 23 2016.*

-Sarjeet


[jira] [Created] (MYRIAD-192) Better Support Cgroups

2016-03-23 Thread DarinJ (JIRA)
DarinJ created MYRIAD-192:
-

 Summary: Better Support Cgroups
 Key: MYRIAD-192
 URL: https://issues.apache.org/jira/browse/MYRIAD-192
 Project: Myriad
  Issue Type: Bug
  Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: DarinJ


Currently, many of the options for cgroups are hard coded into Myriad.  These 
should be configurable.  In addition, we should no longer chown the sandbox 
directory to yarn in `DownloadNMExecutorCLGenImpl.java`.
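
As a minimal sketch of what "configurable" could mean here: each cgroup option is read from the parsed configuration, with the former constant kept as a fallback. The key names (`cgroupsEnabled`, `cgroupMountPath`, `cgroupHierarchy`) and the default values are hypothetical, chosen only for illustration; they are not Myriad's actual configuration keys.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical fallbacks standing in for values that are hard coded today.
DEFAULT_MOUNT_PATH = "/sys/fs/cgroup"
DEFAULT_HIERARCHY = "mesos"


@dataclass(frozen=True)
class CgroupSettings:
    enabled: bool
    mount_path: str
    hierarchy: str


def load_cgroup_settings(config: Mapping[str, Any]) -> CgroupSettings:
    """Build cgroup settings from a parsed config mapping.

    Every key is optional; a missing key falls back to the old constant,
    so a deployment that sets nothing keeps its current behaviour.
    """
    return CgroupSettings(
        enabled=bool(config.get("cgroupsEnabled", False)),
        mount_path=str(config.get("cgroupMountPath", DEFAULT_MOUNT_PATH)),
        hierarchy=str(config.get("cgroupHierarchy", DEFAULT_HIERARCHY)),
    )
```

Keeping the old values as defaults means only sites that need a different hierarchy or mount path have to touch their configuration.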



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MYRIAD-188) Zero sized node managers can cause the Resource Manager to crash with an NPE

2016-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MYRIAD-188:
--
Fix Version/s: Myriad 0.1.1

> Zero sized node managers can cause the Resource Manager to crash with an NPE
> 
>
> Key: MYRIAD-188
> URL: https://issues.apache.org/jira/browse/MYRIAD-188
> Project: Myriad
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: Myriad 0.1.0
>Reporter: DarinJ
>Assignee: DarinJ
> Fix For: Myriad 0.2.0, Myriad 0.1.1
>
>






[jira] [Updated] (MYRIAD-153) Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.

2016-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MYRIAD-153:
--
Fix Version/s: Myriad 0.1.1

> Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
> -
>
> Key: MYRIAD-153
> URL: https://issues.apache.org/jira/browse/MYRIAD-153
> Project: Myriad
>  Issue Type: Bug
>Reporter: Sarjeet Singh
>Assignee: DarinJ
> Fix For: Myriad 0.2.0, Myriad 0.1.1
>
> Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched on FGS are still in 
> the RUNNING state on Mesos. These container tasks are not cleaned up properly 
> after the job has finished completely.
> See the attached screenshot of the Mesos UI with placeholder tasks still running.





Re: 0.2.0 release

2016-03-23 Thread Darin Johnson
Swapnil,

I concur and want to keep both options for Mesos and Docker networking
available, and putting the configuration for both in should be a priority.
However, one has to be careful with this, as the NMs register with the RM
via heartbeats using their container port (not the host port). This isn't an
issue if the NM and RM are in the same Docker network, via Weave or Kubernetes,
but it is with simple bridged networking. We also have to be careful because
Myriad currently doesn't run HDFS itself, so we'd lose data locality.  My idea
was to start with host networking so we could make Myriad easier to deploy, but
leave room to add additional networking options: basically exposing all the
protobuf options for Docker Parameters (used to configure Docker
networking) and NetworkInfo (used to configure Mesos networking).
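
A minimal sketch of the abstraction discussed in this thread, under stated assumptions: the address a NM advertises sits behind an interface, and the implementation is picked from configuration, with host networking as the default. All class names and config keys here (`NetworkingStrategy`, `networkMode`, `containerIp`, `containerPort`) are hypothetical and are not part of Myriad.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Endpoint:
    address: str
    port: int


class NetworkingStrategy(ABC):
    """Decides which address and port a NM should advertise to the RM."""

    @abstractmethod
    def nm_endpoint(self, host_ip: str, offered_ports: List[int]) -> Endpoint:
        ...


class HostNetworking(NetworkingStrategy):
    """Host networking: advertise the agent's IP and a port from the offer."""

    def nm_endpoint(self, host_ip: str, offered_ports: List[int]) -> Endpoint:
        if not offered_ports:
            raise ValueError("offer contains no ports")
        return Endpoint(host_ip, offered_ports[0])


class IpPerContainerNetworking(NetworkingStrategy):
    """IP per container: advertise the container's own address and port."""

    def __init__(self, container_ip: str, container_port: int) -> None:
        self.container_ip = container_ip
        self.container_port = container_port

    def nm_endpoint(self, host_ip: str, offered_ports: List[int]) -> Endpoint:
        return Endpoint(self.container_ip, self.container_port)


def strategy_from_config(config: Mapping[str, str]) -> NetworkingStrategy:
    """Pick an implementation from a (hypothetical) 'networkMode' key."""
    mode = config.get("networkMode", "host")
    if mode == "host":
        return HostNetworking()
    if mode == "ip-per-container":
        return IpPerContainerNetworking(
            config["containerIp"], int(config["containerPort"])
        )
    raise ValueError(f"unknown networkMode: {mode!r}")
```

With this shape, switching from host networking to another mode is a configuration change, and the caveat about bridged networking becomes a property of one implementation rather than of the scheduler as a whole.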

Darin

On Tue, Mar 22, 2016 at 2:48 PM, Swapnil Daingade 
wrote:

> Hi Darin,
>
> I feel Docker networking is something we should spend time thinking
> through.
> A user should be able to use multiple options provided by Mesos, Docker,
> 3rd parties, etc.
>
> It would be great if we can abstract the specific implementation to provide
> container ip addresses behind interfaces. User should be able to switch
> implementations by making simple changes in configuration files.
>
> Regards
> Swapnil
>
>
> On Tue, Mar 22, 2016 at 8:20 AM, Darin Johnson 
> wrote:
>
> > Swapnil,
> >
> > Any help would be appreciated.  I'll try to write up what I'm working on
> > tomorrow.  But essentially the ideas are:
> > 1. Ability to launch the resource manager and node managers in docker
> > containers
> > 2. Use host networking for now (Ports configured to be pulled from mesos -
> > ability to use ports reserved by role), but leave hooks to easily add IP
> > per container.
> > 3. Ability to get configuration files from a URI
> > 4. Ability to mount local volumes for local directories in the shuffle
> > phase etc (though will require more config).
> >
> > Darin
> >
>


Re: NM does not start with cgroups enabled

2016-03-23 Thread Darin Johnson
Hey Bjorn, sorry for the delay. Looking at the difference between the
exceptions, and from my own experience, I believe you left some cgroup configs
in the yarn-site.xml of the node manager.
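
One way to check for leftovers, as a rough sketch only: parse the yarn-site.xml text and list every property under the `yarn.nodemanager.linux-container-executor.` prefix, plus any property whose value refers to itself, such as the `${yarn.nodemanager.linux-container-executor.group}` value quoted below. The helper names are made up for this example, and the check only inspects the file; it does not reproduce how Hadoop resolves values.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

LCE_PREFIX = "yarn.nodemanager.linux-container-executor."


def read_properties(xml_text: str) -> Dict[str, str]:
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def leftover_lce_settings(props: Dict[str, str]) -> List[str]:
    """Names of linux-container-executor settings still present."""
    return sorted(n for n in props if n.startswith(LCE_PREFIX))


def self_referential(props: Dict[str, str]) -> List[str]:
    """Names whose value contains a ${...} reference to the same name."""
    return sorted(n for n, v in props.items() if "${" + n + "}" in v)
```

Running both helpers over the node manager's yarn-site.xml shows at a glance which executor settings are still set when cgroups are meant to be disabled.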
On Mar 18, 2016 2:58 AM, "Björn Hagemeier" 
wrote:

> Hi Darin,
>
> thanks a lot for this. But what about the other case below, when cgroups
> is disabled?
>
>
> Björn
>
> Am 18.03.2016 um 00:25 schrieb Darin Johnson:
> > Hey Bjorn,
> >
> > I think I figured out the issue.  Some of the values for cgroups are still
> > hardcoded in Myriad.  I'll add a JIRA ticket; hopefully we can get an update
> > for 0.2.0.  I'll also respond to this thread after a pull request is
> > submitted in case you'd like to test it.
> >
> > Darin
> > Hi all,
> >
> > I have trouble starting the NM on the slave nodes. Apparently, it does
> > not find its configuration, or something is wrong with the configuration.
> >
> > With cgroups enabled, the NM does not start. The logs contain the error
> > below, indicating that there is something wrong in the configuration. However,
> > yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
> > value used to be "${yarn.nodemanager.linux-container-executor.group}", as
> > indicated by the installation documentation; however, I'm uncertain
> > whether this recursion is the correct approach.
> >
> >
> > ==
> > 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> >     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> > Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
> >     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> >     ... 3 more
> > Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
> >
> >     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >     at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> >     ... 4 more
> > ==
> >
> >
> > I have given it another try with cgroups disabled (in
> > myriad-config-default.yml). I seem to get a little further, but am still
> > stuck at running YARN jobs:
> >
> > ==
> > 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_01 transitioned from LOCALIZED to RUNNING
> > 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_01 is : 1
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_01 and exit code: 1
> > ExitCodeException exitCode=1:
> >     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >     at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> >     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> >     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
> > 16/03/14 10:56:34