Re: Myriad talk link for MesosCon?
Based on the count, that looks like all ~150 submissions. I couldn't find any
Myriad-specific ones, so I guess none of us submitted a Myriad talk to MesosCon.

Found this though:
Experiences running HPC & Big Data frameworks on Cray Analytics Platform
"... Cray .. leverage Apache Mesos and create interfaces to allow other
resources managers (Slurm and YARN) to dynamically acquire and release
resources from Mesos."
Not sure if they tried Mesos or used their own thing.

On Wed, Mar 23, 2016 at 10:17 AM, Darin Johnson wrote:
> Yeah I didn't see one either.
>
> Darin
>
> On Wed, Mar 23, 2016 at 1:10 PM, Sarjeet Singh wrote:
>
> > I couldn't find any associated link of myriad talk for MesosCon voting.
> > Anyone?
> >
> > Though, I found these proposal doc:
> >
> > Developers: http://bit.ly/1RpZPvj
> > Users: http://bit.ly/1Mspaxp
> >
> > *It seems the deadline for the proposal voting is today, March 23 2016.*
> >
> > -Sarjeet
Re: Myriad talk link for MesosCon?
Yeah I didn't see one either.

Darin

On Wed, Mar 23, 2016 at 1:10 PM, Sarjeet Singh wrote:
> I couldn't find any associated link of myriad talk for MesosCon voting.
> Anyone?
>
> Though, I found these proposal doc:
>
> Developers: http://bit.ly/1RpZPvj
> Users: http://bit.ly/1Mspaxp
>
> *It seems the deadline for the proposal voting is today, March 23 2016.*
>
> -Sarjeet
Myriad talk link for MesosCon?
I couldn't find any link for a Myriad talk in the MesosCon voting. Anyone?

Though, I found these proposal docs:

Developers: http://bit.ly/1RpZPvj
Users: http://bit.ly/1Mspaxp

*It seems the deadline for the proposal voting is today, March 23 2016.*

-Sarjeet
[jira] [Created] (MYRIAD-192) Better Support Cgroups
DarinJ created MYRIAD-192:
--------------------------

             Summary: Better Support Cgroups
                 Key: MYRIAD-192
                 URL: https://issues.apache.org/jira/browse/MYRIAD-192
             Project: Myriad
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: Myriad 0.1.0
            Reporter: DarinJ

Currently many of the options for cgroups are hard-coded into Myriad. These
should be configurable. In addition, we should no longer chown the sandbox
directory to yarn in `DownloadNMExecutorCLGenImpl.java`.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
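
To illustrate what "configurable" could mean for MYRIAD-192, here is a minimal sketch that overlays user-supplied values on a set of defaults and renders them as `-D` options for the NodeManager JVM. The property names are standard YARN settings, but the default values, the function names (`cgroup_opts`, `as_java_opts`), and the idea of passing them as JVM options are assumptions for illustration. The thread does not show which values Myriad actually hard-codes.

```python
# Illustrative defaults only (assumption): the thread does not list the
# values that Myriad hard-codes today.
DEFAULT_CGROUP_OPTS = {
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.resources-handler.class":
        "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler",
    "yarn.nodemanager.linux-container-executor.cgroups.hierarchy": "/hadoop-yarn",
    "yarn.nodemanager.linux-container-executor.cgroups.mount": "false",
    "yarn.nodemanager.linux-container-executor.group": "yarn",
}


def cgroup_opts(overrides=None):
    """Return the defaults with any user-supplied overrides applied on top."""
    merged = dict(DEFAULT_CGROUP_OPTS)
    merged.update(overrides or {})
    return merged


def as_java_opts(opts):
    """Render the options as space-separated -Dkey=value JVM arguments."""
    return " ".join(f"-D{key}={value}" for key, value in sorted(opts.items()))


if __name__ == "__main__":
    # Hypothetical override: a site that uses a different group and hierarchy.
    site = {
        "yarn.nodemanager.linux-container-executor.group": "hadoop",
        "yarn.nodemanager.linux-container-executor.cgroups.hierarchy": "/mesos",
    }
    print(as_java_opts(cgroup_opts(site)))
```

The point of the overlay is that a site only has to state the values that differ from the defaults, and nothing is fixed in code.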
[jira] [Updated] (MYRIAD-188) Zero sized node managers can cause the Resource Manager to crash with an NPE
     [ https://issues.apache.org/jira/browse/MYRIAD-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-188:
--------------------------
    Fix Version/s: Myriad 0.1.1

> Zero sized node managers can cause the Resource Manager to crash with an NPE
> ----------------------------------------------------------------------------
>
>                 Key: MYRIAD-188
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-188
>             Project: Myriad
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: Myriad 0.1.0
>            Reporter: DarinJ
>            Assignee: DarinJ
>             Fix For: Myriad 0.2.0, Myriad 0.1.1
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MYRIAD-153) Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
     [ https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-153:
--------------------------
    Fix Version/s: Myriad 0.1.1

> Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
> -----------------------------------------------------------------------------
>
>                 Key: MYRIAD-153
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-153
>             Project: Myriad
>          Issue Type: Bug
>            Reporter: Sarjeet Singh
>            Assignee: DarinJ
>             Fix For: Myriad 0.2.0, Myriad 0.1.1
>
>         Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched on FGS are still
> in RUNNING state on Mesos. These container tasks are not cleaned up properly
> after the job has finished completely.
> See the attached screenshot of the Mesos UI with placeholder tasks still running.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: 0.2.0 release
Swapnil,

I concur and want to keep both options for Mesos and Docker networking
available, and putting the configuration for both in should be a priority.
However, one has to be careful with this, as the NMs register with the RM via
heartbeats using their container port (not the host port). This isn't an issue
if the NM and RM are in the same Docker network, via Weave or Kubernetes, but
it is with simple bridged networking. We also have to be careful because Myriad
currently doesn't run HDFS itself, so we'd lose data locality.

My idea was to start with host networking so we could make Myriad easier to
deploy, but leave room to add additional networking options: basically
exposing all the protobuf options for Docker Parameters (used to configure
Docker networking) and NetworkInfo (used to configure Mesos networking).

Darin

On Tue, Mar 22, 2016 at 2:48 PM, Swapnil Daingade wrote:
> Hi Darin,
>
> I feel docker networking is something we should spend time to think
> through.
> A user should be able to use multiple options provided by Mesos, Docker,
> 3rd party etc
>
> It would be great if we can abstract the specific implementation to provide
> container ip addresses behind interfaces. User should be able to switch
> implementations by making simple changes in configuration files.
>
> Regards
> Swapnil
>
>
> On Tue, Mar 22, 2016 at 8:20 AM, Darin Johnson wrote:
>
> > Swapnil,
> >
> > Any help would be appreciated. I'll try to write up what I'm working on
> > tomorrow. But essentially the ideas are:
> > 1. Ability to launch the resource manager and node managers in docker
> > containers
> > 2. Use host networking for now (Ports configured to be pulled from mesos -
> > ability to use ports reserved by role), but leave hooks to easily add IP
> > per container.
> > 3. Ability to get configuration files from a URI
> > 4. Ability to mount local volumes for local directories in the shuffle
> > phase etc (though will require more config).
> >
> > Darin
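
The plan in the message above (host networking first, with Docker parameters and local volume mounts left open for later) can be sketched as a small builder that produces the JSON form of a Mesos `ContainerInfo` message. This is only a sketch under stated assumptions: the function name `docker_container_info`, the image name, the example parameter, and the paths are hypothetical, and it does not reflect how Myriad itself constructs the protobuf. The field names (`docker.image`, `docker.network`, `docker.parameters`, `volumes`) follow the Mesos protobuf definitions.

```python
import json

# Network modes of Mesos' DockerInfo that this sketch accepts.
ALLOWED_NETWORKS = {"HOST", "BRIDGE", "NONE"}


def docker_container_info(image, network="HOST", parameters=None, volumes=None):
    """Build a dict mirroring the JSON form of a Mesos ContainerInfo message.

    parameters: list of (key, value) pairs passed through as Docker parameters.
    volumes:    list of (host_path, container_path) pairs mounted read-write.
    """
    if network not in ALLOWED_NETWORKS:
        raise ValueError(f"unsupported network mode: {network}")
    info = {
        "type": "DOCKER",
        "docker": {
            "image": image,
            "network": network,
            "parameters": [
                {"key": key, "value": value} for key, value in (parameters or [])
            ],
        },
    }
    if volumes:
        info["volumes"] = [
            {"host_path": host, "container_path": container, "mode": "RW"}
            for host, container in volumes
        ]
    return info


if __name__ == "__main__":
    # Hypothetical image and paths, for illustration only.
    nm = docker_container_info(
        "example/myriad-nodemanager",
        volumes=[("/data/yarn/local", "/var/lib/hadoop-yarn")],
    )
    print(json.dumps(nm, indent=2))
```

Defaulting `network` to `HOST` matches the "host networking for now" step, while the `parameters` list is the hook through which other Docker networking options could be supplied later.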
Re: NM does not start with cgroups enabled
Hey Bjorn, sorry for the delay. Looking at the difference between the
exceptions, and from my own experience, I believe you left some cgroup configs
in the node manager's yarn-site.xml.

On Mar 18, 2016 2:58 AM, "Björn Hagemeier" wrote:
> Hi Darin,
>
> thanks a lot for this. But what about the other case below, when cgroups
> is disabled?
>
>
> Björn
>
> On 18.03.2016 at 00:25, Darin Johnson wrote:
> > Hey Bjorn,
> >
> > I think I figured out the issue. Some of the values for cgroups are still
> > hardcoded in myriad. I'll add a JIRA Ticket hopefully we can get an update
> > for 0.2.0. I'll also respond to this thread after a pull request is
> > submitted in case you'd like to test it.
> >
> > Darin
> >
> > Hi all,
> >
> > I have trouble starting the NM on the slave nodes. Apparently, it does
> > not find its configuration, or something is wrong with the configuration.
> >
> > With cgroups enabled, the NM does not start; the logs contain the
> > following, indicating that something is wrong in the configuration. However,
> > yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
> > value used to be "${yarn.nodemanager.linux-container-executor.group}" as
> > indicated by the installation documentation, however I'm uncertain
> > whether this recursion is the correct approach.
> >
> > ==
> > 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> >     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> > Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
> >     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> >     ... 3 more
> > Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
> >
> >     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >     at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> >     ... 4 more
> > ==
> >
> > I have given it another try with cgroups disabled (in
> > myriad-config-default.yml). I seem to get a little further, but am still
> > stuck at running Yarn jobs:
> >
> > ==
> > 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_01 transitioned from LOCALIZED to RUNNING
> > 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_01 is : 1
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_01 and exit code: 1
> > ExitCodeException exitCode=1:
> >     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >     at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> >     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> >     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
> > 16/03/14 10:56:34