
Re: NM does not start with cgroups enabled

2016-05-05 Thread Darin Johnson

Bjorn, I don't know if you're still experimenting with Myriad, but I
believe I've got a fix for your issue.  I'm going to try to get it into our
next release, so any feedback would be great.  I verified it on a couple
of small systems.

https://github.com/apache/incubator-myriad/pull/69


Re: NM does not start with cgroups enabled

2016-03-23 Thread Darin Johnson

Hey Bjorn, sorry for the delay. Looking at the difference between the
exceptions, and going by my own experience, I believe you left some cgroup
configs in the yarn-site.xml of the node manager.
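
One way to look for leftover settings of this kind is to list every
yarn-site.xml property whose name or value mentions the container executor
or cgroups. The following is only a minimal sketch: the marker list and the
sample XML are assumptions made for illustration, not the configuration
attached to this thread.

```python
import xml.etree.ElementTree as ET

# Substrings treated as "cgroup-related" here. This list is an assumption
# chosen for illustration; adjust it to the properties you care about.
SUSPECT_MARKERS = ("cgroups", "linux-container-executor", "LinuxContainerExecutor")


def find_cgroup_properties(xml_text):
    """Return (name, value) pairs whose name or value contains a marker."""
    root = ET.fromstring(xml_text)
    hits = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if any(m in name or m in value for m in SUSPECT_MARKERS):
            hits.append((name, value))
    return hits


# Hypothetical yarn-site.xml content, used only to exercise the function.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
</configuration>"""

for name, value in find_cgroup_properties(SAMPLE):
    print(f"{name} = {value}")
```

Running this against the real yarn-site.xml of a node manager gives a short
list to review by hand; it does not decide which entries are wrong.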

Re: NM does not start with cgroups enabled

2016-03-20 Thread Darin Johnson

Hey Bjorn,

I think I figured out the issue.  Some of the values for cgroups are still
hardcoded in Myriad.  I'll add a JIRA ticket; hopefully we can get an update
into 0.2.0.  I'll also respond to this thread after a pull request is
submitted in case you'd like to test it.

Darin

On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier wrote:

Hi all,

I have trouble starting the NM on the slave nodes. Apparently, it does
not find its configuration, or something is wrong with the configuration.

With cgroups enabled, the NM does not start. The logs contain the output
below, indicating that something is wrong in the configuration. However,
yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
value used to be "${yarn.nodemanager.linux-container-executor.group}", as
indicated by the installation documentation, but I'm uncertain
whether this recursion is the correct approach.


==
16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
    ... 3 more
Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
    ... 4 more
==
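
Regarding the "${...}" value mentioned before the log: a quick way to spot
a property that refers to itself is to scan the name/value pairs for a
placeholder naming its own key. This is only a sketch under assumptions:
the properties are assumed to be already loaded into a dict, and the sample
entries are hypothetical. It does not establish whether such a value is
what triggers exitCode=24.

```python
import re

# Matches ${some.property.name} placeholders inside a value.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def self_referencing(props):
    """Return the names whose value contains a ${...} placeholder naming itself."""
    return sorted(
        name for name, value in props.items()
        if name in PLACEHOLDER.findall(value)
    )


# Hypothetical entries: the first mirrors the value quoted in the message,
# the second refers to a different property and is therefore not flagged.
props = {
    "yarn.nodemanager.linux-container-executor.group":
        "${yarn.nodemanager.linux-container-executor.group}",
    "yarn.nodemanager.local-dirs": "${hadoop.tmp.dir}/nm-local-dir",
}
print(self_referencing(props))
```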


I gave it another try with cgroups disabled (in
myriad-config-default.yml). I seem to get a little further, but I am
still stuck when running YARN jobs:

==
16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_01 transitioned from LOCALIZED to RUNNING
16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_01 is : 1
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_01 and exit code: 1
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_01
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
==

Unfortunately, the directory
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
is empty; the log indicates that it is deleted after the failed
attempt.

Again, any hint would be useful. Also regarding the activation of cgroups.


Best regards,
Björn

--
Dipl.-Inform. Björn Hagemeier
Federated Systems and Data
Juelich Supercomputing Centre
Institute for Advanced Simulation

Phone: +49 2461 61 1584
Fax  : +49 2461 61 6656
Email: b.hageme...@fz-juelich.de
Skype: bhagemeier
WWW  : http://www.fz-juelich.de/jsc


Re: NM does not start with cgroups enabled

2016-03-19 Thread Björn Hagemeier

Hi Darin,

container-executor.cfg on the master node:
==
yarn.nodemanager.linux-container-executor.group=yarn #configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=bjoernh,yarn##comma separated list of system users who CAN run applications
==

On the slave nodes:
==
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=hfds,yarn,mapred,bin
#Prevent other super-users
min.user.id=99
#comma separated list of system users who CAN run applications
allowed.system.users=
==

The difference comes from not having defined the installation of the NM in
Puppet on the master node. I have already played with different values for
allowed.system.users, but have had no success so far.
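
To make the difference between the two files easier to see, here is a
small sketch that parses key=value lines and flags values that still
contain a "#", as in the master node file above. It assumes, purely for
illustration, that everything after the first "=" is taken as the value
and that only lines starting with "#" are comments; whether the real
container-executor binary handles trailing comments this way is not
established in this thread.

```python
def parse_cfg(text):
    """Parse key=value lines; blank lines and lines starting with '#' are skipped."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value
    return entries


def suspicious(entries):
    """Return keys whose value still contains a '#', i.e. a possible inline comment."""
    return sorted(k for k, v in entries.items() if "#" in v)


# The two files as quoted in the message above.
MASTER = """\
yarn.nodemanager.linux-container-executor.group=yarn #configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=bjoernh,yarn##comma separated list of system users who CAN run applications
"""

SLAVE = """\
#configured value of yarn.nodemanager.linux-container-executor.group
yarn.nodemanager.linux-container-executor.group=yarn
#comma separated list of users who can not run applications
banned.users=hfds,yarn,mapred,bin
#Prevent other super-users
min.user.id=99
#comma separated list of system users who CAN run applications
allowed.system.users=
"""

print("master:", suspicious(parse_cfg(MASTER)))
print("slave: ", suspicious(parse_cfg(SLAVE)))
```

Under these assumptions every key in the master file is flagged and none in
the slave file, which matches the visual difference between the two.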

Is it correct that the container-executor.cfg is only relevant on the NM
nodes?


Best regards and thanks for your efforts,
Björn


Re: NM does not start with cgroups enabled

2016-03-19 Thread Björn Hagemeier

Hi Darin,

thanks a lot for this. But what about the other case from my original
message, when cgroups is disabled?


Björn


Re: NM does not start with cgroups enabled

2016-03-15 Thread Darin Johnson

What does your container-executor.cfg look like?  It seems like
yarn.nodemanager.linux-container-executor.group isn't set, or possibly
banned.users= hasn't been set (on some distros).


Re: NM does not start with cgroups enabled

2016-03-15 Thread Darin Johnson

Bjorn,

Your isolation configuration is correct, I was going from memory.  I'll
take a look at your configs a little later on my test environment and see
what I can come up with.

Darin


Re: NM does not start with cgroups enabled

2016-03-15 Thread Björn Hagemeier

Dear Darin,

thanks for your response.

The precise content of /etc/mesos-slave/isolation is:

==
cgroups/cpu,cgroups/mem
==

I took this from some documentation; it may have been that of the
Puppet module I'm using [1]. Should the values be different? Your string
looks a bit different: "cpu/cgroups,memory/cgroups".
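
As a quick sanity check, the isolation value can be compared against the
isolator names one expects. This is only a minimal sketch: the expected set
(cgroups/cpu and cgroups/mem) is an assumption based on the Mesos agent
documentation, and the helper names are made up for illustration.

```python
# Sketch: compare a mesos-slave isolation string with the expected isolators.
# ASSUMPTION: the cgroups isolators are named "cgroups/cpu" and "cgroups/mem".
EXPECTED = {"cgroups/cpu", "cgroups/mem"}


def missing_isolators(isolation: str) -> set:
    """Return the expected isolators that are absent from the given value."""
    configured = {item.strip() for item in isolation.split(",") if item.strip()}
    return EXPECTED - configured


def check_isolation_file(path: str = "/etc/mesos-slave/isolation") -> set:
    """Read the isolation file (path as used in this thread) and check it."""
    with open(path, encoding="utf-8") as handle:
        return missing_isolators(handle.read())
```

An empty result means both expected isolators are present; anything else
names the ones that are missing or spelled differently.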

Please find my yarn-site.xml and myriad-config-default.yml attached. I
don't think they contain any sensitive information.


Best regards,
Björn

[1] https://github.com/deric/puppet-mesos

Am 15.03.2016 um 16:46 schrieb Darin Johnson:
> Hey Bjorn,
> 
> Can you copy and paste the relevant parts of the Myriad config and
> yarn-site.xml? Also, can you ensure you are running the mesos-slave with
> --isolation="cpu/cgroups,memory/cgroups"?
> 
> I'll try to recreate the problem and/or tell you what's missing in the
> config.
> 
> Darin
> 
> On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier 
> wrote:
> 
>> Hi all,
>>
>> I have trouble starting the NM on the slave nodes. Apparently, it either
>> does not find its configuration or something is wrong with the configuration.
>>
>> With cgroups enabled, the NM does not start; the logs contain the error
>> below, indicating that something is wrong in the configuration. However,
>> yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
>> value used to be "${yarn.nodemanager.linux-container-executor.group}" as
>> indicated by the installation documentation; however, I'm uncertain
>> whether this recursion is the correct approach.
>>
>>
>> ==
>> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
>> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
>>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>>     ... 3 more
>> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
>>
>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>>     ... 4 more
>> ==
>>
>>
>> I gave it another try with cgroups disabled (in
>> myriad-config-default.yml). I seem to get a little further, but I am
>> still stuck at running YARN jobs:
>>
>> ==
>> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_01 transitioned from LOCALIZED to RUNNING
>> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
>> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_01 is : 1
>> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_01 and exit code: 1
>> ExitCodeException exitCode=1:
>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(T

Re: NM does not start with cgroups enabled

2016-03-15 Thread Darin Johnson

Hey Bjorn,

Can you copy and paste the relevant parts of the Myriad config and
yarn-site.xml? Also, can you ensure you are running the mesos-slave with
--isolation="cpu/cgroups,memory/cgroups"?

I'll try to recreate the problem and/or tell you what's missing in the
config.

Darin

On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier 
wrote:

> Hi all,
>
> I have trouble starting the NM on the slave nodes. Apparently, it either
> does not find its configuration or something is wrong with the configuration.
>
> With cgroups enabled, the NM does not start; the logs contain the error
> below, indicating that something is wrong in the configuration. However,
> yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
> value used to be "${yarn.nodemanager.linux-container-executor.group}" as
> indicated by the installation documentation; however, I'm uncertain
> whether this recursion is the correct approach.
>
>
> ==
> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>     ... 3 more
> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
>
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>     ... 4 more
> ==
>
>
> I gave it another try with cgroups disabled (in
> myriad-config-default.yml). I seem to get a little further, but I am
> still stuck at running YARN jobs:
>
> ==
> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_01 transitioned from LOCALIZED to RUNNING
> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_01 is : 1
> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_01 and exit code: 1
> ExitCodeException exitCode=1:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_01
> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
> ==
>
> Unfortunately, the directory
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
> is empty; the log indicates that it is deleted after the failed
> attempt.
>
> Again, any hint would be useful, also regarding the activation of cgroups.
>
>
> Best regards,
> Björn
>
> --
> Dipl.-Inform. Björn Hagemeier
> Federated Systems and Data
> J

NM does not start with cgroups enabled

2016-03-14 Thread Björn Hagemeier

Hi all,

I have trouble starting the NM on the slave nodes. Apparently, it either
does not find its configuration or something is wrong with the configuration.

With cgroups enabled, the NM does not start; the logs contain the error
below, indicating that something is wrong in the configuration. However,
yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
value used to be "${yarn.nodemanager.linux-container-executor.group}" as
indicated by the installation documentation; however, I'm uncertain
whether this recursion is the correct approach.
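
To see which properties in yarn-site.xml use a placeholder that refers to
the property's own name, something like the following sketch could be used.
It only lists such entries; it does not decide whether they are correct,
since that presumably depends on whether the value is supplied from outside
(for example as a Java system property) when the NM is launched. The
function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def self_referencing_properties(xml_text: str) -> list:
    """List property names whose value contains a ${...} reference to itself."""
    root = ET.fromstring(xml_text)
    found = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # A value such as ${a.b} inside property a.b needs an external source.
        if name and "${" + name + "}" in value:
            found.append(name)
    return found
```

Running it over the contents of yarn-site.xml gives the list of names that
have to be provided by something other than the file itself.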


==
16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
    ... 3 more
Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
    ... 4 more
==
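
One thing that may be worth checking, assuming the exitCode=24 message comes
from the native container-executor binary and that the binary reads its group
from container-executor.cfg rather than from yarn-site.xml, is whether that
file actually carries the key. A minimal sketch follows; the key=value format
and the comment handling are assumptions about the cfg file, and the function
names are made up for illustration.

```python
GROUP_KEY = "yarn.nodemanager.linux-container-executor.group"


def parse_cfg(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def configured_group(text: str):
    """Return the configured group, or None if the key is missing or empty."""
    return parse_cfg(text).get(GROUP_KEY) or None
```

If configured_group() returns None for the cfg file on the slave node, that
would at least be consistent with the "Can't get configured value" message.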


I gave it another try with cgroups disabled (in
myriad-config-default.yml). I seem to get a little further, but I am
still stuck at running YARN jobs:

==
16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_01 transitioned from LOCALIZED to RUNNING
16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_01 is : 1
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_01 and exit code: 1
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_01
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
==

Unfortunately, the directory
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
is empty; the log indicates that it is deleted after the failed
attempt.
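
To keep the container directory around for inspection, YARN has the
yarn.nodemanager.delete.debug-delay-sec property, which delays the cleanup
of an application's local directories after it finishes. A minimal sketch
that renders the property element for yarn-site.xml; the helper name and
the 600-second value are only illustrative.

```python
import xml.etree.ElementTree as ET


def debug_delay_property(seconds: int) -> str:
    """Render a <property> element that delays NM cleanup by `seconds`."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "yarn.nodemanager.delete.debug-delay-sec"
    ET.SubElement(prop, "value").text = str(seconds)
    return ET.tostring(prop, encoding="unicode")


# Illustrative value: keep the directories for ten minutes.
print(debug_delay_property(600))
```

With the delay in place, default_container_executor.sh and the container's
working directory should still be there after a failed launch.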

Again, any hint would be useful, also regarding the activation of cgroups.


Best regards,
Björn

-- 
Dipl.-Inform. Björn Hagemeier
Federated Systems and Data
Juelich Supercomputing Centre
Institute for Advanced Simulation

Phone: +49 2461 61 1584
Fax  : +49 2461 61 6656
Email: b.hageme...@fz-juelich.de
Skype: bhagemeier
WWW  : http://www.fz-juelich.de/jsc

JSC is the coordinator of the
John von Neumann Institute for Computing
and member of the
Gauss Centre for Supercomputing

-
-
Forschungszentrum Juelich GmbH
52425 Juelic
