[jira] [Commented] (MYRIAD-252) chown: cannot access '/sys/fs/cgroup/cpu/mesos/330e5ca9-b8ba-4e30-822a-817f7c905891': No such file or directory

2017-03-16 Thread Sarjeet Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928476#comment-15928476
 ] 

Sarjeet Singh commented on MYRIAD-252:
--

>> E0316 15:32:03.124871 12039 shell.hpp:107] Command 'hadoop version 2>&1' 
>> failed; this is the output:
sh: hadoop: command not found
Failed to fetch 'hdfs://csv-dcos02:9000/dist/hadoop-myriad.tgz': Failed to 
create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
either not found or exited with a non-zero exit status: 127

@Hongtaosun, can you check the following:
1. Is hadoop installed on all Mesos agents?
2. Can you run hadoop commands from the agent machines? (Also try the 
"/bin/sh hadoop" command.)
3. Is HADOOP_HOME set, or is the hadoop bin directory in the PATH 
environment variable?
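
The three checks above can be sketched as a small shell script to run on each agent. This is only an illustration: the function names `check_cmd` and `check_hadoop_env` are made up here, and the environment of the Mesos agent process may differ from an interactive login shell, so a pass here does not guarantee that the fetcher sees the same PATH.

```shell
#!/bin/sh
# check_cmd NAME
# Prints the resolved location of NAME and returns 0 if the shell can find it,
# otherwise prints nothing and returns non-zero.
check_cmd() {
    command -v "$1" 2>/dev/null
}

# check_hadoop_env (hypothetical helper)
# Reports whether `hadoop` is reachable through PATH and whether HADOOP_HOME
# is set in the current environment.
check_hadoop_env() {
    if hadoop_bin=$(check_cmd hadoop); then
        echo "hadoop found at: $hadoop_bin"
    else
        echo "hadoop not found in PATH"
    fi
    if [ -n "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME=$HADOOP_HOME"
    else
        echo "HADOOP_HOME is not set"
    fi
}

check_hadoop_env
```

If this prints "hadoop not found in PATH" on an agent, that would be consistent with the exit status 127 in the fetcher error quoted above.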

> chown: cannot access 
> '/sys/fs/cgroup/cpu/mesos/330e5ca9-b8ba-4e30-822a-817f7c905891': No such file 
> or directory
> ---
>
> Key: MYRIAD-252
> URL: https://issues.apache.org/jira/browse/MYRIAD-252
> Project: Myriad
> Issue Type: Bug
> Reporter: yangjunfeng
>
> I have been configuring Myriad recently and ran into the problems shown below:
> 14:49:45.216470 23824 fetcher.cpp:409] Fetching URI 
> 'hdfs://188.188.0.189:9000/usr/hadoop-2.7.3.tgz'
> I0123 14:49:45.216486 23824 fetcher.cpp:250] Fetching directly into the 
> sandbox directory
> I0123 14:49:45.216532 23824 fetcher.cpp:187] Fetching URI 
> 'hdfs://188.188.0.189:9000/usr/hadoop-2.7.3.tgz'
> I0123 14:49:45.851863 23824 fetcher.cpp:109] Downloading resource with Hadoop 
> client from 'hdfs://188.188.0.189:9000/usr/hadoop-2.7.3.tgz' to 
> '/var/lib/mesos/slaves/3337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/frameworks/1ff6c07d-a86c-400e-abac-e9618a5504db-0014/executors/myriad_executor1ff6c07d-a86c-400e-abac-e9618a5504db-00141ff6c07d-a86c-400e-abac-e9618a5504db-O93123337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/runs/330e5ca9-b8ba-4e30-822a-817f7c905891/hadoop-2.7.3.tgz'
> I0123 14:49:50.121711 23824 fetcher.cpp:547] Fetched 
> 'hdfs://188.188.0.189:9000/usr/hadoop-2.7.3.tgz' to 
> '/var/lib/mesos/slaves/3337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/frameworks/1ff6c07d-a86c-400e-abac-e9618a5504db-0014/executors/myriad_executor1ff6c07d-a86c-400e-abac-e9618a5504db-00141ff6c07d-a86c-400e-abac-e9618a5504db-O93123337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/runs/330e5ca9-b8ba-4e30-822a-817f7c905891/hadoop-2.7.3.tgz'
> I0123 14:49:50.121835 23824 fetcher.cpp:409] Fetching URI 
> 'http://188.188.0.189:8088/conf'
> I0123 14:49:50.121855 23824 fetcher.cpp:250] Fetching directly into the 
> sandbox directory
> I0123 14:49:50.121888 23824 fetcher.cpp:187] Fetching URI 
> 'http://188.188.0.189:8088/conf'
> I0123 14:49:50.121911 23824 fetcher.cpp:134] Downloading resource from 
> 'http://188.188.0.189:8088/conf' to 
> '/var/lib/mesos/slaves/3337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/frameworks/1ff6c07d-a86c-400e-abac-e9618a5504db-0014/executors/myriad_executor1ff6c07d-a86c-400e-abac-e9618a5504db-00141ff6c07d-a86c-400e-abac-e9618a5504db-O93123337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/runs/330e5ca9-b8ba-4e30-822a-817f7c905891/conf'
> W0123 14:49:50.139974 23824 fetcher.cpp:289] Copying instead of extracting 
> resource from URI with 'extract' flag, because it does not seem to be an 
> archive: http://188.188.0.189:8088/conf
> I0123 14:49:50.140010 23824 fetcher.cpp:547] Fetched 
> 'http://188.188.0.189:8088/conf' to 
> '/var/lib/mesos/slaves/3337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/frameworks/1ff6c07d-a86c-400e-abac-e9618a5504db-0014/executors/myriad_executor1ff6c07d-a86c-400e-abac-e9618a5504db-00141ff6c07d-a86c-400e-abac-e9618a5504db-O93123337a77d-41ff-4d1c-b9ce-6b0c971e7ba1-S2/runs/330e5ca9-b8ba-4e30-822a-817f7c905891/conf'
> chown: cannot access 
> '/sys/fs/cgroup/cpu/mesos/330e5ca9-b8ba-4e30-822a-817f7c905891': No such file 
> or directory
> env: /bin/yarn: No such file or directory
> How can I fix this problem?
> Thanks very much!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MYRIAD-250) Should shutdown mesos framework when stop resourcemanager

2016-11-29 Thread Sarjeet Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707851#comment-15707851
 ] 

Sarjeet Singh commented on MYRIAD-250:
--

Tao, does the issue happen only for zero-profile NMs or for any NM launched? I 
think I've tried this before and it worked without any issues.

> Should shutdown mesos framework when stop resourcemanager
> -
>
> Key: MYRIAD-250
> URL: https://issues.apache.org/jira/browse/MYRIAD-250
> Project: Myriad
> Issue Type: Bug
> Affects Versions: Myriad 0.2.0
> Reporter: Tao Jie
>
> When I started the ResourceManager and flexed up nodes, NodeManagers were 
> launched as Mesos tasks in the framework created by the RM.
> I stopped the ResourceManager; the framework became inactive, but the 
> NodeManagers kept running as active tasks. Then I restarted the 
> ResourceManager, which created another framework. The old NodeManagers 
> reported to the new ResourceManager, and I could not kill them by flexing 
> down nodes.
> It seems that the framework should be shut down once the ResourceManager is 
> stopped.





Re: Website is updated, 0.2.0 is official!

2016-06-29 Thread Sarjeet Singh
Great! Good job everyone and special thanks to Darin for the release :)

-Sarjeet

On Wed, Jun 29, 2016 at 2:05 PM, Darin Johnson 
wrote:

> http://myriad.apache.org/
>
> Tell your friends!
>


[jira] [Commented] (MYRIAD-228) Duplicated NM opts

2016-06-28 Thread Sarjeet Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353696#comment-15353696
 ] 

Sarjeet Singh commented on MYRIAD-228:
--

I filed MYRIAD-125 when I first saw this issue, but I couldn't reproduce it 
later. Anyway, thanks for the fix, [~darinj].

> Duplicated NM opts
> --
>
> Key: MYRIAD-228
> URL: https://issues.apache.org/jira/browse/MYRIAD-228
> Project: Myriad
> Issue Type: Bug
> Reporter: Klaus Ma
>
> {{NMExecutorCLGenImpl.java:addYarnNodemanagerOpt}} keeps appending NM opts, 
> which makes the argument list too long.
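
The general idea of appending an option only once can be sketched in shell as below. This is not the actual Myriad fix (which lives in Java code not shown in this thread); `append_opt_once` is a hypothetical helper used purely to illustrate the deduplication.

```shell
# append_opt_once CURRENT_OPTS NEW_OPT   (hypothetical helper)
# Prints CURRENT_OPTS with NEW_OPT appended, unless NEW_OPT is already present
# as a whole space-separated word, in which case CURRENT_OPTS is printed as-is.
append_opt_once() {
    current=$1
    new=$2
    case " $current " in
        *" $new "*)
            # Already present: leave the option string untouched.
            printf '%s\n' "$current"
            ;;
        *)
            if [ -n "$current" ]; then
                printf '%s %s\n' "$current" "$new"
            else
                printf '%s\n' "$new"
            fi
            ;;
    esac
}
```

Calling it repeatedly with the same option leaves the string unchanged after the first call, so the argument list cannot grow without bound.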





Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 4)

2016-05-31 Thread sarjeet singh
+1 (Non-binding)

Verified md5 and sha512 checksums.
Downloaded myriad-0.2.0-incubating-rc4.tar.gz, then compiled and deployed it
on a 1-node MapR cluster.
Tried FGS/CGS flex up/down, and ran long- and short-running M/R jobs.
Tried framework shutdown from the UI/API, and tried re-launching Myriad.
Tried cgroups and was able to launch NMs with cgroups enabled.
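
For anyone repeating the checksum step, a minimal sketch follows. It assumes either GNU `sha512sum` or the `shasum` tool is installed; `verify_sha512` is a made-up helper name, and the expected digest has to be taken from the checksum file published next to the release artifact.

```shell
# verify_sha512 FILE EXPECTED_HEX   (hypothetical helper)
# Returns 0 when the SHA-512 digest of FILE equals EXPECTED_HEX
# (compared case-insensitively), non-zero otherwise.
verify_sha512() {
    file=$1
    expected=$2
    if command -v sha512sum >/dev/null 2>&1; then
        actual=$(sha512sum "$file" | awk '{print $1}')
    else
        # Fallback for systems that ship shasum instead of sha512sum.
        actual=$(shasum -a 512 "$file" | awk '{print $1}')
    fi
    actual_lc=$(printf '%s' "$actual" | tr 'A-F' 'a-f')
    expected_lc=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ "$actual_lc" = "$expected_lc" ]
}
```

A call would look like `verify_sha512 myriad-0.2.0-incubating-rc4.tar.gz "$expected" && echo OK`, where `$expected` holds the published digest.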

- Sarjeet Singh

On Fri, May 27, 2016 at 3:36 PM, Santosh Marella <smare...@maprtech.com>
wrote:

> +1 (Binding).
>
> - Verified signature
> - Verified MD5 and SHA512 hashes
> - Builds from source tar ball.
> - Ran Apache RAT. Verified that all the sources have license headers.
> - Verified CGS/FGS behaviors with MapReduce jobs on a 4 node Mesos/Yarn
> cluster.
>
> Thanks,
> Santosh
>
> On Tue, May 24, 2016 at 7:46 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > I'm voting +1 (Binding)
> >
> > Verified md5/sha hashes.  Compiled with gradle build, gradle buildRMDocker
> > (on OSX with docker-machine).
> >
> > Ran remote distribution (with cgroups) on a 4 node cluster (Ubuntu,
> > hadoop-2.6.0, hadoop 2.7.0) with one CGS NM and 3 FGS NM.  Ran 8
> > simultaneous jobs.  Shut down Framework.  Restarted NodeManager, ran an
> > additional 3 jobs.
> >
> > Ran the same with docker (minus cgroups).
> >
> > Darin
> >
> > On Tue, May 24, 2016 at 10:40 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I have created a source tar ball for Apache Myriad 0.2.0-incubating,
> > > release candidate 4, based on the feedback received from release
> > > candidates 1, 2 & 3.  Thanks Sarjeet for a very thorough review!
> > >
> > > Here are the release notes:
> > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > >
> > > The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc4"
> > > and is available here:
> > >
> > > https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc4
> > >
> > > The artifacts to be voted upon are located below. Please note that this
> > > is a source release:
> > >
> > > https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc4/
> > >
> > > Release artifacts are signed with the following key:
> > > https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
> > >
> > > **Please note that the release tar ball does not include the gradlew
> > > script to build. You need to install gradle in order to build.**
> > >
> > > Please try out the release candidate and vote. The vote is open for a
> > > minimum of 3 business days (Friday May 27) or until the necessary number
> > > of votes (3 binding +1s) is reached.
> > >
> > > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > > permission to release RC4 as Apache Myriad 0.2.0 (incubating).
> > >
> > > [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > [ ] -1 Do not release this package because...
> > >
> > > Thanks,
> > > Darin
> > >
> >
>


Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 3)

2016-05-24 Thread Sarjeet Singh
>> Specifically, this corrected some documentation and a minor typo

Darin, RC3 is missing the PR#75 changes. I downloaded the tarball and checked
manually, and the changes weren't there.

-Sarjeet

On Mon, May 23, 2016 at 9:15 PM, Darin Johnson 
wrote:

> Hi All,
>
> I have created a source tar ball for Apache Myriad 0.2.0-incubating,
> release candidate 3, based on the feedback received from release candidates
> 1 & 2.  Specifically, this corrected some documentation and a minor typo.
>
> Here are the release notes:
> https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
>
> The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc3"
> and is available here:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc3
>
> The artifacts to be voted upon are located below. Please note that this is
> a source release:
>
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc3/
>
> Release artifacts are signed with the following key:
> https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
>
> **Please note that the release tar ball does not include the gradlew script
> to build. You need to install gradle in order to build.**
>
> Please try out the release candidate and vote. The vote is open for a
> minimum of 3 business days (Friday May 27) or until the necessary number of
> votes (3 binding +1s)
> is reached.
>
> If/when this vote succeeds, I will call for a vote with IPMC seeking
> permission to release RC3 as Apache Myriad 0.2.0 (incubating).
>
> [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
> Thanks,
> Darin
>


Re: Issues related to RM docker container on Myriad RC2.

2016-05-23 Thread Sarjeet Singh
AFAIK, I may have a few configuration and setup issues behind the docker
problems. I'll look into it more once I have some more time. But
overall, RC2 looks good and the docker issue is not a blocker as such.

Let me know if anyone else has been able to try the docker setup successfully
and is willing to share the story :).

-Sarjeet

On Sun, May 22, 2016 at 8:54 PM, sarjeet singh <ssarjeetsi...@gmail.com>
wrote:

> I tried the following to test docker with myriad rc2, but couldn't get past
> the point where the RM is launched, and was not able to launch NMs.
>
> Here is the formatted output for the RM docker launch:
>
> root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker run
> --net=host -v $PWD/dist -v $PWD/config:/usr/local/hadoop/etc/hadoop
> --name='myriad-resourcemanager' -t sarjeet/myriad
>
> 2016-05-23 03:38:21,431 INFO  [main] myriad.Main
> (Main.java:initHealthChecks(140)) - Initializing HealthChecks
>
> 2016-05-23 03:38:21,445 INFO  [main] myriad.Main
> (Main.java:initProfiles(148)) - Initializing Profiles
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile zero with CPU: 0.0
> and Memory: 0.0
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile small with CPU: 2.0
> and Memory: 2048.0
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile medium with CPU: 4.0
> and Memory: 4096.0
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile large with CPU: 10.0
> and Memory: 12288.0
>
> 2016-05-23 03:38:21,451 INFO  [main] myriad.Main
> (Main.java:validateNMInstances(175)) - Validating nmInstances..
>
> 2016-05-23 03:38:21,451 INFO  [main] myriad.Main
> (Main.java:initServiceConfigurations(238)) - Initializing
> initServiceConfigurations
>
> 2016-05-23 03:38:21,534 INFO  [main] myriad.Main
> (Main.java:startMesosDriver(119)) - starting mesosDriver..
>
> 2016-05-23 03:38:21,534 INFO  [main] scheduler.MyriadDriverManager
> (MyriadDriverManager.java:startDriver(51)) - Starting driver...
>
> 2016-05-23 03:38:21,534 INFO  [main] scheduler.MyriadDriver
> (MyriadDriver.java:start(49)) - Starting driver
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@716: Client
> environment:host.name=qa101-139
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.13.0-57-generic
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@725: Client
> environment:os.version=#95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015
>
> I0523 03:38:21.53564624 sched.cpp:222] Version: 0.28.1
>
> 2016-05-23 03:38:21,535 INFO  [main] scheduler.MyriadDriver
> (MyriadDriver.java:start(51)) - Driver started with status: DRIVER_RUNNING
>
> 2016-05-23 03:38:21,536 INFO  [main] scheduler.MyriadDriverManager
> (MyriadDriverManager.java:startDriver(53)) - Driver started with status:
> DRIVER_RUNNING
>
> 2016-05-23 03:38:21,536 INFO  [main] myriad.Main
> (Main.java:startMesosDriver(121)) - started mesosDriver..
>
> 2016-05-23 03:38:21,536:7(0x7f4e8d685700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.101.139:5181]
>
> 2016-05-23 03:38:21,536 INFO  [main] interceptor.CompositeInterceptor
> (CompositeInterceptor.java:register(74)) - Registered
> org.apache.myriad.policy.LeastAMNodesFirstPolicy into the registry.
>
> 2016-05-23 03:38:21,539 INFO  [main] myriad.Main
> (Main.java:startNMInstances(226)) - Launching 1 NM(s) with profile medium
>
> 2016-05-23 03:38:21,540 INFO  [main] scheduler.MyriadOperations
> (MyriadOperations.java:flexUpCluster(80)) - Adding 1 NM instances to cluster
>
> 2016-05-23 03:38:21,555:7(0x7f4e8d685700):ZOO_INFO@check_events@1750:
> session establishment complete on server [10.10.101.139:5181],
> sessionId=0x15314ddb816d02b, negotiated timeout=1
>
> I0523 03:38:21.55587197 group.cpp:349] Group process (group(1)@
> 10.10.101.139:57196) connected to ZooKeeper
>
> I0523 03:38:21.55594597 group.cpp:831] Syncing group operations: queue
> size (joins, cancels, datas) = (0, 0, 0)
>
> I0523 03:38:21.55597997 group.cpp:427] Trying to create path '/mesos'
> in ZooKeeper
>
> I0523 03:38:21.55707398 detector.cpp:152] Detected a new leader:
> (id='1')
>
> I0523 03:38:21.55723575 group.cpp:700] Trying to get
> 

Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 2)

2016-05-23 Thread Sarjeet Singh
+1 (Non-binding)

Verified md5 and sha512 checksums.
Downloaded myriad-0.2.0-incubating-rc2.tar.gz, then compiled and deployed it
on a 1-node MapR cluster.
Tried FGS/CGS flex up/down, and ran long- and short-running M/R jobs.
Tried framework shutdown from the UI/API, and tried re-launching Myriad.
Tried cgroups and was able to launch NMs with cgroups enabled.

-Sarjeet

On Thu, May 19, 2016 at 8:21 PM, Darin Johnson 
wrote:

> I'm voting +1.
> Build, ran multiple map/reduce jobs, a few spark and flink jobs.
>
> Darin
>
> On Tue, May 17, 2016 at 9:24 PM, Darin Johnson 
> wrote:
>
> > Hi All,
> >
> > I have created a source tar ball for Apache Myriad 0.2.0-incubating,
> > release candidate 2, based on the feedback received from release
> > candidate 1.  Specifically, the NOTICE file has been updated to 2016 and
> > the framework properly shuts down when using the web UI.
> >
> > Here are the release notes:
> > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> >
> > The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc2"
> > and is available here:
> >
> > https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc2
> >
> > The artifacts to be voted upon are located below. Please note that this
> > is a source release:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc2/
> >
> > Release artifacts are signed with the following key:
> > https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
> >
> > **Please note that the release tar ball does not include the gradlew
> > script to build. You need to install gradle in order to build.**
> >
> > Please try out the release candidate and vote. The vote is open for a
> > minimum of 3 business days (Friday May 20) or until the necessary number
> > of votes (3 binding +1s) is reached.
> >
> > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > permission to release RC2 as Apache Myriad 0.2.0 (incubating).
> >
> > [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > Thanks,
> > Darin
> >
>


Issues related to RM docker container on Myriad RC2.

2016-05-22 Thread sarjeet singh
k:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED



To debug the above failure, I checked the NM task stdout/stderr but failed to
get any logs (see the attached screenshot).

Then, on attaching to the running container, I found the following:

root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker exec -it
8f730dc2de1a /bin/bash

yarn@qa101-139:/$

yarn@qa101-139:/$ ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD
yarn         1     0  0 03:38 ?        00:00:00 /bin/sh -c /usr/local/hadoop/bin/yarn resourcemanager
yarn         7     1 14 03:38 ?        00:00:14 /usr/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/local/hadoop/logs -Dyarn.log.dir=/usr/l
yarn       296     0  0 03:39 ?        00:00:00 /bin/bash
yarn       302     0  0 03:39 ?        00:00:00 /bin/bash
yarn       309   302  0 03:39 ?        00:00:00 ps -ef

yarn@qa101-139:/$ ls -l /usr/local/hadoop/

total 52

-rw-r--r-- 1 root root 15429 Apr 10  2015 LICENSE.txt

-rw-r--r-- 1 root root   101 Apr 10  2015 NOTICE.txt

-rw-r--r-- 1 root root  1366 Apr 10  2015 README.txt

drwxr-xr-x 2 root root  4096 May 22 19:22 bin

drwxr-xr-x 6 root root  4096 May 22 20:42 etc

drwxr-xr-x 2 root root  4096 May 22 19:22 include

drwxr-xr-x 4 root root  4096 May 22 19:22 lib

drwxr-xr-x 2 root root  4096 May 22 19:22 libexec

drwxr-xr-x 2 root root  4096 May 22 19:22 sbin

drwxr-xr-x 7 root root  4096 May 22 20:42 share

yarn@qa101-139:/$ ls -l /usr/local/hadoop/etc/hadoop/

total 16

-rw-r--r-- 1 root root 1340 May 22 20:02 mapred-site.xml

-rw-r--r-- 1 root root 3395 May 23 03:38 myriad-config-default.yml

-rw-r--r-- 1 root root 4207 May 23 00:27 yarn-site.xml

yarn@qa101-139:/$ cat
/usr/local/hadoop/etc/hadoop/myriad-config-default.yml

mesosMaster: zk://10.10.101.139:5181/mesos   ->> (Running on the host outside of the container)
#Container information for the node managers
containerInfo:
  type: DOCKER
  dockerInfo:
    image: sarjeet/myriad
  volumes:
    -
      containerPath: /tmp
      hostPath: /tmp
checkpoint: false
frameworkFailoverTimeout: 4320
frameworkName: MyriadAlpha
frameworkRole:
frameworkUser: mapr
                          # running the resource manager.
frameworkSuperUser: root  # To be depricated, currently permissions need set by a superuser due to Mesos-1790.  Must be
                          # root or have passwordless sudo. Required if nodeManagerURI set, ignored otherwise.
nativeLibrary: /usr/local/lib/libmesos.so
zkServers: 10.10.101.139:5181
zkTimeout: 2
restApiPort: 8192
profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
    spindles: 0
  small:
    cpu: 2
    mem: 2048
    spindles: 1
  medium:
    cpu: 4
    mem: 4096
    spindles: 2
  large:
    cpu: 10
    mem: 12288
    spindles: 4
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # 
rebalancer: false
haEnabled: true
servedConfigPath: /dist/config.tgz
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: true
executor:
  jvmMaxMemoryMB: 256
  configUri: http://172.17.0.1:8192/api/config.tgz
  path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
  #The following should be used for a remotely distributed URI, hdfs assumed but other URI types valid.
  #nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz
  #path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
yarnEnvironment:
  YARN_NODEMANAGER_OPTS: -Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0
  YARN_HOME: /opt/mapr/hadoop/hadoop-2.7.0
  HADOOP_CONF_DIR: /mnt/mesos/sandbox/config
  HADOOP_TMP_DIR: /tmp
  HADOOP_LOG_DIR: /mnt/mesos/sandbox
  #JAVA_HOME: /usr/lib/jvm/java-default #System dependent, but sometimes necessary
mesosAuthenticationPrincipal:
mesosAuthenticationSecretFilename:

yarn@qa101-139:/$ netstat -anlp | grep 8088

tcp6       0      0 10.10.101.139:8088      :::*                    LISTEN      7/java

yarn@qa101-139:/$ netstat -anlp | grep 8192

tcp6       0      0 :::8192                 :::*                    LISTEN      7/java

yarn@qa101-139:/$



Reference: https://github.com/apache/incubator-myriad/tree/master/docker

I might not have looked deeply enough to see whether there was a configuration
issue when launching the docker RM, but if there is a trivial fix or a config
setting I missed, I can give this another try. Let me know if I missed
anything.
- Sarjeet Singh


gradle Issue when building RM docker on MacOSX

2016-05-22 Thread sarjeet singh
)

at
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:424)

at
org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:679)

at
org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:435)

at
org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:338)

at
com.github.dockerjava.jaxrs.async.POSTCallbackNotifier.response(POSTCallbackNotifier.java:29)

at
com.github.dockerjava.jaxrs.async.AbstractCallbackNotifier.call(AbstractCallbackNotifier.java:45)

at
com.github.dockerjava.jaxrs.async.AbstractCallbackNotifier.call(AbstractCallbackNotifier.java:22)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

:docker:buildRMDocker FAILED


FAILURE: Build failed with an exception.


* What went wrong:

Execution failed for task ':docker:buildRMDocker'.

> Could not build image


* Try:

Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output.


BUILD FAILED


Total time: 44.089 secs

=

The above seems to be caused by the following setting in build.gradle: url =
'unix:///var/run/docker.sock'

I tried applying a few workarounds but nothing worked for me. I am reporting
this in case others hit the same issue, or in case there is a workaround that
resolves it.

Note: I am able to build the docker images fine on an Ubuntu node. The above
issue is specific to MacOSX.
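
One way to see which Docker endpoint a machine actually exposes is sketched below. With docker-machine, the daemon is normally reached through the `DOCKER_HOST` variable exported by `docker-machine env` rather than through a local unix socket. Whether the gradle docker plugin used by the build honors that variable is not something this thread confirms, so treat this only as a diagnostic; `docker_url` is a made-up helper name.

```shell
# docker_url [SOCKET_PATH]   (hypothetical helper)
# Prints the endpoint a Docker client on this machine would most likely need:
#   - the value of DOCKER_HOST when it is set, or
#   - unix://SOCKET_PATH when that socket exists
#     (SOCKET_PATH defaults to /var/run/docker.sock).
# Returns 1 when neither is available.
docker_url() {
    socket=${1:-/var/run/docker.sock}
    if [ -n "${DOCKER_HOST:-}" ]; then
        printf '%s\n' "$DOCKER_HOST"
    elif [ -S "$socket" ]; then
        printf 'unix://%s\n' "$socket"
    else
        return 1
    fi
}
```

If this prints a `tcp://` address on the Mac, the hard-coded `unix:///var/run/docker.sock` URL in build.gradle would not match it, which would be consistent with the failure above.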

- Sarjeet Singh


Need help with cgroup troubleshooting or setup issue with NM launch.

2016-05-21 Thread Sarjeet Singh
When trying cgroups with the myriad-0.2 RC on a single-node MapR cluster, I am
getting the following issue:

1. The errors below occur when launching a NodeManager with cgroups enabled:

*stdout*:

export TASK_DIR=afe954c5-79dc-4238-af84-14855090df34&& sudo chown mapr
/sys/fs/cgroup/cpu/mesos/afe954c5-79dc-4238-af84-14855090df34 && export
YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0; env
YARN_NODEMANAGER_OPTS=-Dcluster.name.prefix=/cluster1
-Dnodemanager.resource.io-spindles=4.0
-Dyarn.nodemanager.linux-container-executor.cgroups.hierarchy=mesos/afe954c5-79dc-4238-af84-14855090df34
-Dyarn.home=/opt/mapr/hadoop/hadoop-2.7.0
-Dnodemanager.resource.cpu-vcores=4 -Dnodemanager.resource.memory-mb=4096
-Dmyriad.yarn.nodemanager.address=0.0.0.0:31847
-Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31132
-Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31181
-Dmyriad.mapreduce.shuffle.port=31166
YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0
/opt/mapr/hadoop/hadoop-2.7.0/bin/yarn nodemanager


*stderr*:

16/05/21 01:43:13 INFO service.AbstractService: Service NodeManager failed
in state INITED; cause:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)

Caused by: java.io.IOException: Not able to enforce cpu weights; cannot
write to cgroup at: /sys/fs/cgroup/cpu

at
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)

at
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)

at
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)

at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)

... 3 more

16/05/21 01:43:13 WARN service.AbstractService: When stopping the service
NodeManager : java.lang.NullPointerException

java.lang.NullPointerException

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:164)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:276)

at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)

at
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)

at
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)

16/05/21 01:43:13 FATAL nodemanager.NodeManager: Error starting NodeManager

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)

Caused by: java.io.IOException: Not able to enforce cpu weights; cannot
write to cgroup at: /sys/fs/cgroup/cpu

at
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)

at
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)

at
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)

at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)

at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)

... 3 more

16/05/21 01:43:13 INFO nodemanager.NodeManager: SHUTDOWN_MSG:

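/************************************************************

SHUTDOWN_MSG: Shutting down NodeManager at qa101-139/10.10.101.139

************************************************************/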
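
Since the root cause reported above is "cannot write to cgroup at: /sys/fs/cgroup/cpu", a quick permission check can help narrow things down. The sketch below is only a diagnostic under the assumption that it is run as the same user the NodeManager runs as (mapr in this setup); `check_cgroup_dir` is a hypothetical helper name.

```shell
# check_cgroup_dir DIR   (hypothetical helper)
# Returns 0 and prints "ok" when DIR exists and is writable by the current
# user, 1 when DIR does not exist, 2 when it exists but is not writable.
check_cgroup_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable by $(id -un): $dir"
        return 2
    fi
    echo "ok: $dir"
}
```

For the task in this log, the directories to check would be /sys/fs/cgroup/cpu and the per-task directory under /sys/fs/cgroup/cpu/mesos/ that the launch command tries to chown.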

Here is the yarn-site.xml configuration:



  

  

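  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>testrm.marathon.mesos</value>
    <description>host is the hostname of the resourcemanager</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>RM Recovery Enabled</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>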


Myriad talk link for MesosCon?

2016-03-23 Thread Sarjeet Singh
I couldn't find a link to the Myriad talk for MesosCon voting.
Anyone?

I did find these proposal docs, though:

Developers: http://bit.ly/1RpZPvj
Users: http://bit.ly/1Mspaxp


*It seems the deadline for the proposal voting is today, March 23 2016.*

-Sarjeet


[jira] [Commented] (MYRIAD-188) Zero sized node managers can cause the Resource Manager to crash with an NPE

2016-03-02 Thread Sarjeet Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177186#comment-15177186
 ] 

Sarjeet Singh commented on MYRIAD-188:
--

Could this be the same as, or related to, MYRIAD-156?

> Zero sized node managers can cause the Resource Manager to crash with an NPE
> 
>
> Key: MYRIAD-188
> URL: https://issues.apache.org/jira/browse/MYRIAD-188
> Project: Myriad
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: Myriad 0.1.0
> Reporter: DarinJ
>






Re: Mesos and Myriad Kerberos Support

2016-02-11 Thread Sarjeet Singh
Thanks Miguel for summarizing this. I missed the hangout yesterday, though
:(

-Sarjeet

On Wed, Feb 10, 2016 at 5:56 PM,  wrote:

> Hello guys,
>
> I wanted to follow up a little further on today’s Hangouts call about
> Kerberos. For everyone else who may not have been on the call, the idea is:
> if you have Spark, Myriad and some task running on top of Mesos and it needs
> access to some third-party service like HDFS that requires Kerberos
> credentials, how will that work?
>
> Adam has mentioned one solution he’s seen. This was to have credentials
> cached on the master, which will then intercept the calls and annotate the
> task with its credentials, and wrap the calls with something that unwraps
> the credentials and puts them into place to authenticate. This will require
> updating the TGTs as they expire.
>
> Adam, you’ve mentioned that Mesosphere is working in this space as well; do
> you know if that is specific to Kerberos or something else? Any other
> suggestions would be helpful!
>
> Thanks!
>
>
> *Known JIRAs regarding adding Kerberos support for Mesos
>
> https://issues.apache.org/jira/browse/MESOS-907
>
> > Miguel Bernadin Accenture Technology Labs – System Engineering
> Contact: W (408) 817-2742 | M (631) 835-6345 |
> miguel.berna...@accenture.com
>
> 
>
>


Re: Sync today?

2016-01-27 Thread Sarjeet Singh
same here..

On Wed, Jan 27, 2016 at 9:06 AM, Darin Johnson 
wrote:

> Couldn't get in.
>


Re: Myriad Vagrant Setup Issue

2016-01-19 Thread sarjeet singh
Glad to know it finally worked, Matt.

You're always welcome to add anything that is missing from documentation,
and anything useful that may help troubleshoot mesos/myriad issues for new
users too.

- Sarjeet Singh

On Mon, Jan 18, 2016 at 10:31 PM, Matthew J. Loppatto <
mloppa...@keywcorp.com> wrote:

> I was looking through more of my mesos slave stderr logs and found a
> message in one of them that said there was no class defined for
> myriad_executor.  Adding the following to my yarn-site.xml file resolved
> this issue:
>
> <property>
>   <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
>   <value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
> </property>
>
> Now my node manager is in a running state and appears to be staying there
> :)
>
> I'm not sure if this is related to the stderr message below but all seems
> to be working now.
>
> As a first time user of Myriad, I was planning on adding info to the setup
> guide to clarify things that weren't immediately obvious to me while
> setting it up.  Let me know if there would be any interest in this.
>
> Thanks everyone for your help!
> Matt
>
> 
> From: Matthew J. Loppatto [mloppa...@keywcorp.com]
> Sent: Monday, January 18, 2016 7:53 AM
> To: dev@myriad.incubator.apache.org
> Subject: RE: Myriad Vagrant Setup Issue
>
> Hi all,
>
> My latest stdout logs are empty.  My latest stderr logs in
> /tmp/mesos/slaves/... show only the following:
>
> ABORT:
> (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177):
> Failed to os::execvpe in childMain: Argument list too long*** Aborted at
> 1452885649 (unix time) try "date -d @1452885649" if you are using GNU date
> ***
> PC: @ 0x7f670062fcc9 (unknown)
> *** SIGABRT (@0x41a4) received by PID 16804 (TID 0x7f66f7119700) from PID
> 16804; stack trace: ***
> @ 0x7f67009ce340 (unknown)
> @ 0x7f670062fcc9 (unknown)
> @ 0x7f67006330d8 (unknown)
> @   0x40ac42 _Abort()
> @   0x40ac7c _Abort()
> @ 0x7f670234a1ed process::childMain()
> @ 0x7f670234c23d std::_Function_handler<>::_M_invoke()
> @ 0x7f67006f347d (unknown)
>
>
> 
> From: Swapnil Daingade [sdaing...@maprtech.com]
> Sent: Friday, January 15, 2016 5:03 PM
> To: dev@myriad.incubator.apache.org
> Subject: Re: Myriad Vagrant Setup Issue
>
> Hi Matt,
>
> Looks like the Mesos slave now launches the NodeManager mesos task.
> However, the NodeManager seems to be dying after a while.
>
> The log files for the NodeManager Mesos task (that Darin mentioned) below
> should help figure out why the NodeManager died.
> Could you please post those files as well.
>
> Regards
> Swapnil
>
>
> On Fri, Jan 15, 2016 at 11:42 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > Matt, if you can't access the UI, on the slave you should still be able
> to
> > access stderr and stdout going to:
> >
> > /tmp/mesos/slaves/<slaveID>/frameworks/<frameworkID>/executors/myriad_executor/runs/latest/stderr
> >
> > /tmp/mesos/slaves/<slaveID>/frameworks/<frameworkID>/executors/myriad_executor/runs/latest/stdout
> > Replace /tmp/mesos/ with your workdir (likely /var/run/mesos/ or
> > /tmp/mesos).  The error messages here are usually informative.
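The lookup described above can be sketched as a small script that lists the stderr and stdout files of the most recent executor run. This is only an illustration: the function name `find_executor_logs` is hypothetical, the default work directory is the /tmp/mesos value mentioned in the thread, and the trailing wildcard on the executor name is an assumption to cover executor directories that carry an ID suffix.

```python
import glob
import os


def find_executor_logs(work_dir="/tmp/mesos", executor_prefix="myriad_executor"):
    """Return stderr/stdout paths under runs/latest for matching executors.

    The slave ID and framework ID directories are matched with wildcards,
    so they do not need to be known in advance.
    """
    pattern = os.path.join(
        work_dir, "slaves", "*", "frameworks", "*",
        "executors", executor_prefix + "*", "runs", "latest", "std*",
    )
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for path in find_executor_logs():
        print(path)
```

Pass a different `work_dir` (for example /var/run/mesos) if the slave was started with another work directory.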
> >
> > On Fri, Jan 15, 2016 at 11:13 AM, Matthew J. Loppatto <
> > mloppa...@keywcorp.com> wrote:
> >
> > > Hey Darin,
> > >
> > > For some reason my Mesos UI hangs when loading the logs but I posted
> the
> > > contents of my mesos slave logs in /var/log/mesos to this public Gist:
> > > https://gist.github.com/FearTheParrot/b00aa7eee9ae169498d3
> > >
> > > Matt
> > >
> > > -Original Message-
> > > From: Darin Johnson [mailto:dbjohnson1...@gmail.com]
> > > Sent: Friday, January 15, 2016 10:55 AM
> > > To: Dev
> > > Subject: Re: Myriad Vagrant Setup Issue
> > >
> > > Hey Matt, if you look at the mesos ui is there any information in the
> > > stderr or stdout of the Slave Host it's staging on?
> > >
> > > Darin
> > >
> > > On Fri, Jan 15, 2016 at 10:36 AM, Matthew J. Loppatto <
> > > mloppa...@keywcorp.com> wrote:
> > >
> > > > I've gotten a little farther on this issue by increasing the mesos
> > > > slave memory to 4 GB from 2GB.  The node manager task gets launched
> and
> > > > sits in the STAGING state for a minute and then the mesos-slave.INFO
> > log
> > 

Re: Next dev sync hangout will be on 1/6/2016

2016-01-12 Thread Sarjeet Singh
+1.

Can we confirm whether this is happening tomorrow?

-Sarjeet

On Fri, Jan 8, 2016 at 6:53 PM, Darin Johnson 
wrote:

> Sounds good to me.  I think we might have another possible east coast
> contributor join.
>
> Darin
>
> On Thu, Jan 7, 2016 at 9:09 PM, Adam Bordelon  wrote:
>
> > Sorry, this slipped off my calendar, so I didn't even try to attend. I
> > guess we'll pick up again next week?
> >
> > On Wed, Jan 6, 2016 at 11:22 AM, Paul Curtis 
> wrote:
> >
> > > I attempted to join as well; it seems that no one was in the
> > > hangout. I gave up after about 15 minutes.
> > >
> > > paul
> > >
> > > On Wed, Jan 6, 2016 at 12:19 PM, Darin Johnson <
> dbjohnson1...@gmail.com>
> > > wrote:
> > > > Can't seem to join...
> > > > On Jan 6, 2016 12:16 PM, "Darin Johnson" 
> > > wrote:
> > > >
> > > >> Trying to join
> > > >> On Jan 6, 2016 12:06 PM, "yuliya Feldman"
>  > >
> > > >> wrote:
> > > >>
> > > >>> Do we have a sync today?
> > > >>>
> > > >>>
> > > >>>   From: Santosh Marella 
> > > >>>  To: dev@myriad.incubator.apache.org
> > > >>>  Sent: Wednesday, December 16, 2015 9:47 AM
> > > >>>  Subject: Next dev sync hangout will be on 1/6/2016
> > > >>>
> > > >>> We have decided to hold the next dev sync on 1/6/2016 (instead of
> > > >>> 12/30/2015).
> > > >>>
> > > >>> Meeting notes from today's hangout:
> > > >>>
> > > >>>
> > >
> >
> https://docs.google.com/document/d/1JGmJrgeg98bHw_0_sSRmyX6WiAe13OdErcFlaz6Aa04/edit#
> > > >>>
> > > >>> Thanks,
> > > >>> Santosh
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > >
> > >
> > >
> > > --
> > > Paul Curtis - Senior Product Technologist
> > > O: +1 203-660-0015 - M: +1 203-539-9705
> > >
> > > Now Available - Free Hadoop On-Demand Training
> > >
> >
>


Re: Myriad is 0.1.0

2015-12-10 Thread Sarjeet Singh
Great job, Everyone!

Definitely good news to start the day with :)

Sent from my iPhone

> On Dec 10, 2015, at 5:38 AM, Brandon Gulla  wrote:
> 
> Wooo! Great job everyone!
> 
>> On Thu, Dec 10, 2015 at 4:42 AM, Adam Bordelon  wrote:
>> 
>> Hooray! Time for champagne!
>> 
>> 
>> 
>> On Thu, Dec 10, 2015 at 1:13 AM, Santosh Marella 
>> wrote:
>> 
>>> Hi All,
>>> 
>>>  Congratulations on the first Apache Myriad release..! Kudos to
>> everyone
>>> involved for making this happen.
>>> 
>>>  As we now have IPMC's approval, there are a few things that I did to
>> wrap
>>> up the release:
>>>  - Make 0.1.0 artifacts available from the release SVN repo [1].
>>>  - Git tag the voted RC as the "0.1.0" release [2].
>>>  - Delete the previously marked git RC tags.
>>>  - Closed the remaining JIRAs marked for 0.1.0 version and marked the
>>> 0.1.0 version as "released" [3].
>>>  - Submitted a PR [4] with scripts to prepare a RC and release a RC
>>> (automates the above git and svn steps)
>>>  - Updated the release guide [5] with voting links to help future
>> release
>>> managers.
>>> 
>>>  Here are a couple of things still remaining:
>>>  - Update the downloads page on Myriad's website with links to 0.1.0
>>> artifacts on svn, git tag, release notes.
>>>  - Do an announcement blog post. Here is a draft [6]. Please suggest any
>>> changes.
>>> 
>>>  If I am missing something for 0.1.0, I'd appreciate it if you could bring
>>> it to my notice.
>>> 
>>>   1.
>> https://dist.apache.org/repos/dist/release/incubator/myriad/myriad-0.1.0-incubating/
>>>   2.
>> https://github.com/apache/incubator-myriad/releases/tag/myriad-0.1.0-incubating
>>>   3.
>> https://issues.apache.org/jira/browse/MYRIAD/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel
>>>   4. https://github.com/apache/incubator-myriad/pull/53
>>>   5. https://cwiki.apache.org/confluence/display/MYRIAD/Release+Guide
>>>   6.
>> https://docs.google.com/document/d/1zCXnDlqzNhj0BL_CqRz5-poCap9QFah7R3tKkHdspYg/edit
>>> 
>>> Thanks,
>>> Santosh
> 
> 
> 
> -- 
> Brandon


Re: [TESTERS NEEDED] ResourceManager Docker

2015-12-04 Thread Sarjeet Singh
Nice, Brandon!!

I can give it a try today, probably. Send me any more details if required
or necessary, or I'll PM you if I run into any issues. Hope this is ok to
PM you directly (Don't want to spam myriad mailing list).

-Sarjeet

On Fri, Dec 4, 2015 at 7:17 AM, Brandon Gulla 
wrote:

> I have built a new ResourceManager Docker that needs some testing love.
> Please test it out and let me know what bugs you may find. This is built
> for Apache Hadoop 2.7.0 and has not been tested with MapR distributions.
> You no longer have to do anything fancy, just mount the directory with your
> yarn-site.xml and myriad-config and you should be good to go. Once we feel
> good about it I can push it to the official mesos docker repo (Thanks again
> Ken).
>
> https://hub.docker.com/r/bgulla/myriad/
>
> Also, I hooked the docker build into gradle for future building goodness
> but will wait to submit that PR until after we cut a release.
>
> Thanks!
>
>
> --
> Brandon
>


Re: Help !! - Myriad mesos subprocess erroring with Argument list too long

2015-12-04 Thread Sarjeet Singh
Prabhu,

Can you paste/send your mesos-slave, mesos-master log file, if this is OK?

P.S., We might have seen this when frameworkUser was not set correctly in
myriad-config-default.yml. Can you double check if all configurations are
correct and the permissions are OK as well?

-Sarjeet

On Fri, Dec 4, 2015 at 10:53 AM, Prabhu Inbarajan <
inbarajan.pra...@gmail.com> wrote:

> I followed the Myriad setup instructions and was able to get the resource
> manager to invoke the Myriad scheduler and talk to the Mesos master. But I
> see the following error in the Mesos slave logs and my YARN submissions are
> stuck.
>
> My setup is as follows:
> 1. Hadoop 2.7.1
> 2. Jdk8
> 3. Mesos Version: 0.25.0
> 4. 1 master + 2 slaves
> 5. ubuntu 14.04 + Kernel Linux master.dev 3.19.0-33-generic
> #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64
> GNU/Linux
>
> Given that this team is running with this setup, it is hard for me to presume
> this is an argument overflow issue that would require some kind of kernel
> recompile: http://www.linuxjournal.com/article/6060?page=0,0. I am also
> considering recompiling Mesos for better diagnostics. subprocess.cpp seems to
> have better logging in master:
>
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/subprocess.cpp
> than in 0.25.0.
>
>
>
> ABORT:
> (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177):
> Failed to os::execvpe in childMain: Argument list too long*** Aborted
> at 1449220361 (unix time) try "date -d @1449220361" if you are using
> GNU date ***
> PC: @ 0x7fbfd2c66cc9 (unknown)
> *** SIGABRT (@0x231d) received by PID 8989 (TID 0x7fbfc944a700) from
> PID 8989; stack trace: ***
> @ 0x7fbfd3005340 (unknown)
> @ 0x7fbfd2c66cc9 (unknown)
> @ 0x7fbfd2c6a0d8 (unknown)
> @   0x40a902 _Abort()
> @   0x40a93c _Abort()
> @ 0x7fbfd477ac3b process::childMain()
> @ 0x7fbfd477cc6d std::_Function_handler<>::_M_invoke()
> @ 0x7fbfd2d2a47d (unknown)
>
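"Argument list too long" is the message for the E2BIG error, which exec calls return when the arguments and environment passed to the new process exceed a kernel limit. The thread does not establish that the environment is the cause here, so the following is only a diagnostic sketch: it measures the environment of the process it runs in and compares it with the ARG_MAX value reported by the system. The function name `exec_env_usage` is hypothetical.

```python
import os


def exec_env_usage(env=None):
    """Return (total_env_bytes, longest_entry_bytes, arg_max) for an environment."""
    env = dict(os.environ) if env is None else env
    # Each variable reaches the new process as a "KEY=value" string plus a NUL byte.
    sizes = [len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()]
    total = sum(sizes)
    longest = max(sizes, default=0)
    # Combined limit for argument and environment strings, as reported by the OS.
    arg_max = os.sysconf("SC_ARG_MAX")
    return total, longest, arg_max


if __name__ == "__main__":
    total, longest, arg_max = exec_env_usage()
    print(f"environment: {total} bytes, longest entry: {longest} bytes, "
          f"ARG_MAX: {arg_max}")
```

On Linux a single argument or environment string is also capped (32 pages, which is 128 KiB with 4 KiB pages), so the longest entry is worth checking as well as the total. To be meaningful, the script would need to run with the same user and environment as the Mesos slave.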


[jira] [Created] (MYRIAD-175) Change Destroy myriad REST api method from 'GET' to 'POST/PUT'.

2015-11-24 Thread Sarjeet Singh (JIRA)
Sarjeet Singh created MYRIAD-175:


 Summary: Change Destroy myriad REST api method from 'GET' to 
'POST/PUT'.
 Key: MYRIAD-175
 URL: https://issues.apache.org/jira/browse/MYRIAD-175
 Project: Myriad
  Issue Type: Improvement
Reporter: Sarjeet Singh
Priority: Trivial


This is trivial, but it can be fixed easily. I haven't looked at the code, 
but it should be fairly straightforward, as we don't pass any parameters to 
the shutdown Myriad endpoint; we only call the 
/api/framework/shutdown/framework endpoint.

Currently, the supported REST method for the shutdown Myriad API is 'GET'. We 
should change it to either 'PUT' or 'POST'.

Any suggestions or objections?
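For illustration, a client call against the proposed method could look like the sketch below. It only builds the request and does not send it. The endpoint path is the one named in this issue; the base URL (host and port 8192), the function name `build_shutdown_request`, and the choice of 'PUT' as the default are assumptions, since the issue leaves 'PUT' versus 'POST' open.

```python
import urllib.request


def build_shutdown_request(base_url, method="PUT"):
    """Build (but do not send) a request for the framework shutdown endpoint."""
    url = base_url.rstrip("/") + "/api/framework/shutdown/framework"
    # The endpoint takes no parameters, so the request body is empty.
    return urllib.request.Request(url, data=b"", method=method)


if __name__ == "__main__":
    # Hypothetical host and port; adjust to where the Myriad REST API listens.
    req = build_shutdown_request("http://localhost:8192")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Using a non-GET method keeps crawlers, prefetchers, and accidental browser visits from triggering a shutdown.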





Missing doc/readme on 'destroy myriad' API and a trivial JIRA MYRIAD-175.

2015-11-24 Thread Sarjeet Singh
Hello,

I have filed a JIRA (trivial, enhancement), MYRIAD-175, to change the REST
API method for 'destroy myriad' from 'GET' to either 'PUT' or 'POST'.

I also noticed that the 'destroy myriad' API description is missing from the
README, the wiki, and the Myriad docs. I'll send a PR for the git README/docs
and add a comment to the wiki.

Let me know if anything else should be taken care of for this as well.

Happy holidays, Happy thanksgiving!!

-Sarjeet


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 3)

2015-11-23 Thread Sarjeet Singh
>> noticed if JHS is up in HA mode and the RM is restarted a new JHS is
launched resulting in two JHS running (Minor should fix in next release)

Darin,

I had also tested the same scenario but I had "maxInstances: 1" in
myriad-config-default.yml for JHS configuration and it didn't start new JHS
instance on RM restart.

Here is what my JHS configuration looks like in myriad-config-default.yml:

 taskName: jobhistory
 serviceOptsName: HADOOP_JOB_HISTORYSERVER_OPTS
 command: $YARN_HOME/bin/mapred historyserver
 maxInstances: 1

-Sarjeet

On Mon, Nov 23, 2015 at 7:57 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> +1
> D/L'd, built, and ran on a 4 node vanilla hadoop cluster (remote distro).
> verified md5 and sha hashes
> Ran M/R Job on CGS/FGS
> Flexup and down nodes.
> killed RM with HA enabled verified nodes up
>
> noticed if JHS is up in HA mode and the RM is restarted a new JHS is
> launched resulting in two JHS running (Minor should fix in next release).
> noticed if HA is enabled and the framework time expires you must
> manually delete the statestore (Minor, should add documentation).
>
> On Mon, Nov 23, 2015 at 10:34 PM, Sarjeet Singh <sarjeetsi...@maprtech.com
> >
> wrote:
>
> > +1 (Non-Binding)
> >
> > Verified md5 & sha512 checksums.
> > D/L myriad-0.1.0-incubating-rc3.tar.gz, install gradle, Compiled code &
> > deployed it on a 4 node MapR cluster.
> > Tried FGS/CGS NM flex up/down, and ran hadoop M/R jobs.
> > Tried myriad HA with RM restart/kill.
> > Tried framework shutdown, and start myriad again.
> > Tried JHS configuration flex up/down and its functionality.
> >
> > -Sarjeet
> >
> > On Thu, Nov 19, 2015 at 10:37 PM, Santosh Marella <smare...@maprtech.com
> >
> > wrote:
> >
> > > Hi All,
> > >
> > > Firstly, thanks everyone for the valuable contributions to the project
> > and
> > > for holding on tight as we move along the release process. We're almost
> > > home!
> > >
> > > I have created a source tar ball for Apache Myriad 0.1.0-incubating,
> > > release candidate 3. This includes the feedback from the recent IPMC
> > > voting.
> > > Here’s the release notes:
> > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > >
> > > The commit to be voted upon is tagged with
> "myriad-0.1.0-incubating-rc3"
> > > and is available here:
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.1.0-incubating-rc3
> > >
> > > The artifacts to be voted upon are located below. Please note that this
> > is
> > > a source release:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc3/
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/smarella.asc
> > >
> > > **Please note that the release tar ball does not include the gradlew
> > script
> > > to build. You need to generate one in order to build.**
> > >
> > > Please try out the release candidate and vote. The vote is open for a
> > > minimum of 72 hours or until the necessary number of votes (3 binding
> > +1s)
> > > is reached.
> > >
> > > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > > permission to release RC3 as Apache Myriad 0.1.0 (incubating).
> > >
> > > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > [ ] -1 Do not release this package because...
> > >
> > > Thanks,
> > > Santosh
> > >
> >
>


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 3)

2015-11-23 Thread Sarjeet Singh
+1 (Non-Binding)

Verified md5 & sha512 checksums.
D/L myriad-0.1.0-incubating-rc3.tar.gz, install gradle, Compiled code &
deployed it on a 4 node MapR cluster.
Tried FGS/CGS NM flex up/down, and ran hadoop M/R jobs.
Tried myriad HA with RM restart/kill.
Tried framework shutdown, and start myriad again.
Tried JHS configuration flex up/down and its functionality.
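The checksum verification step listed above can be sketched with the standard library. This is a minimal illustration, not the procedure actually used for the vote: the helper names `file_digest` and `verify` are hypothetical, and the expected value is assumed to be a bare hex string copied from the published .md5 or .sha512 file.

```python
import hashlib


def file_digest(path, algorithm):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For a release candidate, `verify` would be called once with "md5" and once with "sha512" on the downloaded tarball. Checksums only detect corruption; checking the signature against the release manager's key is a separate step.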

-Sarjeet

On Thu, Nov 19, 2015 at 10:37 PM, Santosh Marella 
wrote:

> Hi All,
>
> Firstly, thanks everyone for the valuable contributions to the project and
> for holding on tight as we move along the release process. We're almost
> home!
>
> I have created a source tar ball for Apache Myriad 0.1.0-incubating,
> release candidate 3. This includes the feedback from the recent IPMC
> voting.
> Here’s the release notes:
> https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
>
> The commit to be voted upon is tagged with "myriad-0.1.0-incubating-rc3"
> and is available here:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.1.0-incubating-rc3
>
> The artifacts to be voted upon are located below. Please note that this is
> a source release:
>
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc3/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/smarella.asc
>
> **Please note that the release tar ball does not include the gradlew script
> to build. You need to generate one in order to build.**
>
> Please try out the release candidate and vote. The vote is open for a
> minimum of 72 hours or until the necessary number of votes (3 binding +1s)
> is reached.
>
> If/when this vote succeeds, I will call for a vote with IPMC seeking
> permission to release RC3 as Apache Myriad 0.1.0 (incubating).
>
> [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
> Thanks,
> Santosh
>


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 2)

2015-11-13 Thread Sarjeet Singh
+1 (Non-Binding)

Verified checksums.
D/L myriad-0.1.0-incubating-rc2.tar.gz, Compiled & deployed it on a 4 node
MapR cluster.
Tried FGS/CGS flex up/down, and ran hadoop M/R jobs.
Tried myriad HA with RM restart and kill -9.
Tried framework shutdown, and restart myriad again.
Tried JHS configuration flex up/down and its functionality.

-Sarjeet

On Fri, Nov 13, 2015 at 4:17 PM, Aashreya Shankar 
wrote:

> +1 (non binding)
>
> Successfully built binaries from rc2 tar.gz
> Tried it 5 node MapR cluster
> Flex up/down works accordingly
> Hadoop jobs running fine.
> Build was successful through Vagrant
>
> Thank you
> Aashreya
>
> On Fri, Nov 13, 2015 at 3:13 PM, Swapnil Daingade <
> swapnil.daing...@gmail.com> wrote:
>
> > Downloaded rc2 tar.gz
> > * Verified md5 and sha512 hashes successfully
> > * Built binaries successfully
> > * Deployed on 3 node MapR cluster
> > * Tested NM flexup/flexdown with HA enabled and disabled.
> > * Tried HA
> > * Tried Framework Shutdown.
> > All operations worked as expected.
> >
> > +1
> >
> > Regards
> > Swapnil
> >
> >
> > On Thu, Nov 12, 2015 at 4:55 PM, Santosh Marella 
> > wrote:
> >
> > > Hi all,
> > >
> > > I have created a build for Apache Myriad 0.1.0-incubating, release
> > > candidate 2.
> > >
> > > Thanks to everyone who has contributed to this release.
> > >
> > > Here’s the release notes:
> > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > >
> > > The commit to be voted upon is tagged with
> "myriad-0.1.0-incubating-rc2"
> > > and is available here:
> > >
> > >
> >
> https://git1-us-west.apache.org/repos/asf/incubator-myriad/repo?p=incubator-myriad.git;a=commit;h=fb93291e9377cccf625bed93a9ad1ae1c4b76529
> > >
> > > The artifacts to be voted upon are located here:
> > > *
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc2/
> > > <
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc2/
> > > >*
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/smarella.asc
> > >
> > > Please vote on releasing this package as Apache Myriad
> 0.1.0-incubating.
> > >
> > > The vote is open for the next 72 hours and passes if a majority of
> > > at least three +1 PPMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > [ ] -1 Do not release this package because...
> > >
> > > Here is my vote:
> > > +1 (binding)
> > >
> > > Thanks,
> > > Santosh
> > >
> >
>


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 1)

2015-11-11 Thread Sarjeet Singh
Yes, it is used for vagrant.

https://github.com/apache/incubator-myriad/blob/master/docs/vagrant.md#shutting-down

-Sarjeet

On Wed, Nov 11, 2015 at 10:19 AM, Santosh Marella <smare...@maprtech.com>
wrote:

> Any idea what shutdown.sh script is used for ? I can omit this if it's used
> for Vagrant.
>
> Santosh
>
> On Wed, Nov 11, 2015 at 10:13 AM, Santosh Marella <smare...@maprtech.com>
> wrote:
>
> > Agreed. Will do a RC2 that omits vagrant files.
> >
> > Thanks, Jim.
> >
> > --
> > Sent from mobile
> > On Nov 11, 2015 7:08 AM, "Darin Johnson" <dbjohnson1...@gmail.com>
> wrote:
> >
> >> That's a good idea.
> >>
> >> On Wed, Nov 11, 2015 at 10:03 AM, Jim Klucar <klu...@gmail.com> wrote:
> >>
> >> > 0 (non-binding)
> >> >
> >> > Vagrant environment is broken.
> >> > I did a `vagrant up` and ran the setup-yarn-1.sh and setup-yarn-2.sh
> >> > scripts. The first had a slight problem, the second failed.
> >> > I then tried `./gradlew build` from inside vagrant and the build
> failed
> >> in
> >> > the web-ui. I believe it is due to how vagrant maps things to /vagrant
> >> but
> >> > didn't really dig into it. It builds fine on my local machine.
> >> >
> >> > I recommend removing the Vagrantfile and the setup-yarn-* scripts and
> >> > releasing. We can then decide to revamp or permanently remove the
> >> Vagrant
> >> > setup for a separate release.
> >> >
> >> >
> >> >
> >> > On Tue, Nov 10, 2015 at 10:42 PM, Darin Johnson <
> >> dbjohnson1...@gmail.com>
> >> > wrote:
> >> >
> >> > > +1
> >> > > D/L'd tar ball verified checksums
> >> > > Flexed up/down nodes and JHS
> >> > > Ran MR job with FGS
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Nov 10, 2015 at 9:12 PM, Sarjeet Singh <
> >> > sarjeetsi...@maprtech.com>
> >> > > wrote:
> >> > >
> >> > > > +1 (Non-Binding)
> >> > > >
> >> > > > Verified checksums.
> >> > > > Downloaded myriad-0.1.0-incubating-rc1.tar.gz, Compiled the code
> and
> >> > > > deployed it on a 4 node MapR cluster.
> >> > > > Tried basic functionality tests for FGS/CGS flex up/down and it
> >> worked
> >> > > > fine.
> >> > > > Tried running M/R job and it completed successfully.
> >> > > > Tried framework shutdown, shutdown went smooth.
> >> > > > Tried JHS configuration and service flex-up, and it worked fine.
> >> > > >
> >> > > >
> >> > > > Thanks,
> >> > > > Sarjeet Singh
> >> > > >
> >> > > > On Tue, Nov 10, 2015 at 3:05 PM, Santosh Marella <
> >> > smare...@maprtech.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > I have created a build for Apache Myriad 0.1.0-incubating,
> release
> >> > > > > candidate 1.
> >> > > > >
> >> > > > > Thanks to everyone who has contributed to this release.
> >> > > > >
> >> > > > > Here’s the release notes:
> >> > > > >
> https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> >> > > > >
> >> > > > > The commit to be voted upon is tagged with
> >> > > "myriad-0.1.0-incubating-rc1"
> >> > > > > and is available here:
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://git1-us-west.apache.org/repos/asf/incubator-myriad/repo?p=incubator-myriad.git;a=commit;h=9f0fa15bfaa4fdc309ada27126567a2aa5bf296b
> >> > > > >
> >> > > > > The artifacts to be voted upon are located here:
> >> > > > > *
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc1/
> >> > > > > <
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc1/
> >> > > > > >*
> >> > > > >
> >> > > > > Release artifacts are signed with the following key:
> >> > > > > https://people.apache.org/keys/committer/smarella.asc
> >> > > > >
> >> > > > > Please vote on releasing this package as Apache Myriad
> >> > > 0.1.0-incubating.
> >> > > > >
> >> > > > > The vote is open for the next 72 hours and passes if a majority
> of
> >> > > > > at least three +1 PPMC votes are cast.
> >> > > > >
> >> > > > > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> >> > > > > [ ]  0 I don't feel strongly about it, but I'm okay with the
> >> release
> >> > > > > [ ] -1 Do not release this package because...
> >> > > > >
> >> > > > > Here is my vote:
> >> > > > > +1 (binding)
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Santosh
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>


Re: New committer: Darin J

2015-11-05 Thread Sarjeet Singh
Congrats Darin :)

On Thu, Nov 5, 2015 at 8:14 AM, Santosh Marella 
wrote:

> Congratulations Darin.
>
> --
> Sent from mobile
> On Nov 5, 2015 4:36 AM, "Adam Bordelon"  wrote:
>
> > The Podling Project Management Committee (PPMC) for Apache Myriad has
> asked
> > Darin to become a committer and PPMC member and we are pleased to
> announce
> > that he has accepted.
> >
> > Please join me in welcoming Darin as a Myriad committer, and let's thank
> > him for all his contributions so far. Looking forward to more!
> >
> > Cheers,
> > -Adam-
> >
>


Re: New Committer: Swapnil Daingade

2015-11-05 Thread Sarjeet Singh
Congrats Swapnil, Very nice work on HA :)

On Thu, Nov 5, 2015 at 8:13 AM, Santosh Marella 
wrote:

> Congratulations Swapnil.
>
> --
> Sent from mobile
> On Nov 5, 2015 7:31 AM, "yuliya Feldman" 
> wrote:
>
> > Congratulations Swapnil!!!
> > Well done
> > Yuliya
> >   From: Adam Bordelon 
> >  To: dev@myriad.incubator.apache.org
> >  Sent: Thursday, November 5, 2015 4:38 AM
> >  Subject: New Committer: Swapnil Daingade
> >
> > The Podling Project Management Committee (PPMC) for Apache Myriad has
> asked
> > Swapnil Daingade to become a committer and PPMC member and we are pleased
> > to announce that he has accepted.
> >
> > Please join me in welcoming Swapnil as a Myriad committer, and let's
> thank
> > him for all his contributions so far. Looking forward to more!
> >
> > Cheers,
> > -Adam-
> >
> >
> >
>


[jira] [Reopened] (MYRIAD-140) NullPointerException on NM flex up/down API when no params passed to HTTP PUT request.

2015-10-20 Thread Sarjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarjeet Singh reopened MYRIAD-140:
--

Reopening this as I am still getting a NullPointerException, with NM flex down 
only, when trying with the latest code. Everything else works just fine.

> NullPointerException on NM flex up/down API when no params passed to HTTP PUT 
> request.
> --
>
> Key: MYRIAD-140
> URL: https://issues.apache.org/jira/browse/MYRIAD-140
> Project: Myriad
>  Issue Type: Bug
>Affects Versions: Myriad 0.1.0
>    Reporter: Sarjeet Singh
>Assignee: Yuliya Feldman
>
> Observed the NullPointerException when trying to flex up using curl without 
> any params passed to the HTTP request (didn't use the Myriad UI for flexing).
> 15/09/22 14:54:50 INFO api.ClustersResource: Received Flexup Cluster Request
> 15/09/22 14:54:50 INFO api.ClustersResource: Instances: null
> 15/09/22 14:54:50 INFO api.ClustersResource: Profile: null
> Sep 22, 2015 2:54:50 PM com.sun.jersey.spi.container.ContainerResponse 
> mapMappableContainerException
> SEVERE: The RuntimeException could not be mapped to a response, re-throwing 
> to the HTTP container
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.containsKey(ConcurrentHashMap.java:1016)
>   at 
> com.ebay.myriad.scheduler.NMProfileManager.exists(NMProfileManager.java:44)
>   at com.ebay.myriad.api.ClustersResource.flexUp(ClustersResource.java:84)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(Context

[jira] [Commented] (MYRIAD-140) NullPointerException on NM flex up/down API when no params passed to HTTP PUT request.

2015-10-20 Thread Sarjeet Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965600#comment-14965600
 ] 

Sarjeet Singh commented on MYRIAD-140:
--

15/10/20 12:23:15 INFO api.ClustersResource: Received flex down request. 
Profile: null, Instances: null, Constraints: null
15/10/20 12:23:15 ERROR api.ClustersResource: 'profile' is null or empty
Oct 20, 2015 12:23:15 PM com.sun.jersey.spi.container.ContainerResponse 
mapMappableContainerException
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to 
the HTTP container
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
at 
com.ebay.myriad.scheduler.ServiceProfileManager.get(ServiceProfileManager.java:21)
at 
com.ebay.myriad.api.ClustersResource.getNumFlexedupNMs(ClustersResource.java:262)
at 
com.ebay.myriad.api.ClustersResource.flexDown(ClustersResource.java:166)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
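
The top frames show ConcurrentHashMap.get being reached right after the
"'profile' is null or empty" error was logged. ConcurrentHashMap does not
accept null keys, so get(null) throws a NullPointerException. A minimal sketch
of a guard that stops before the lookup, assuming a hypothetical ProfileLookup
class and a made-up "medium" entry (this is not Myriad's ServiceProfileManager):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a profile registry; names are illustrative only.
class ProfileLookup {
    private static final Map<String, String> PROFILES = new ConcurrentHashMap<>();

    static {
        // Illustrative entry, not a real Myriad profile definition.
        PROFILES.put("medium", "example medium profile");
    }

    // Returns the profile if it exists; empty for a null, empty, or unknown name.
    static Optional<String> find(String profileName) {
        if (profileName == null || profileName.isEmpty()) {
            // ConcurrentHashMap.get(null) would throw NullPointerException,
            // so reject the missing parameter before touching the map.
            return Optional.empty();
        }
        return Optional.ofNullable(PROFILES.get(profileName));
    }

    public static void main(String[] args) {
        System.out.println(find(null).isPresent());     // prints "false"
        System.out.println(find("medium").isPresent()); // prints "true"
    }
}
```

With a guard like this, the REST handler could answer the request with a
client error instead of letting the exception reach the HTTP container.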

> NullPointerException on NM flex up/down API when no params passed to HTTP PUT 
> request.
> --
>
> Key: MYRIAD-140
>  

[jira] [Created] (MYRIAD-152) Myriad UI support for NM constraints for flex up/down API request.

2015-10-16 Thread Sarjeet Singh (JIRA)
Sarjeet Singh created MYRIAD-152:


 Summary: Myriad UI support for NM constraints for flex up/down API 
request.
 Key: MYRIAD-152
 URL: https://issues.apache.org/jira/browse/MYRIAD-152
 Project: Myriad
  Issue Type: Bug
Reporter: Sarjeet Singh


We need UI support for specifying NM constraints for flex up/down requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MYRIAD-156) NullPointerException from "Error in handling event type NODE_RESOURCE_UPDATE to the scheduler"

2015-10-16 Thread Sarjeet Singh (JIRA)
Sarjeet Singh created MYRIAD-156:


 Summary: NullPointerException from "Error in handling event type 
NODE_RESOURCE_UPDATE to the scheduler"
 Key: MYRIAD-156
 URL: https://issues.apache.org/jira/browse/MYRIAD-156
 Project: Myriad
  Issue Type: Bug
Reporter: Sarjeet Singh


The NPE happens when a node in the cluster becomes unhealthy and the scheduler 
removes it from its internal data structure. When that node later heartbeats, 
the scheduler looks it up and tries to operate on it, and gets a 
NullPointerException. Here is the code snippet that causes the NPE: 

SchedulerNode node = getSchedulerNode(nm.getNodeID());

The node object is null, which causes the NullPointerException.

Here is the RM log for the exception:

15/10/06 09:18:09 INFO handlers.ResourceOffersEventHandler: Offer not
sufficient for task with, cpu: 4.4, memory: 5504.0, spindles: 4.0, ports: 996
15/10/06 09:18:11 FATAL resourcemanager.ResourceManager: Error in handling
event type NODE_RESOURCE_UPDATE to the scheduler
java.lang.NullPointerException
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:548)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.updateNodeResource(FairScheduler.java:1712)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1293)
at
com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:64)
at
com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:17)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:693)
at java.lang.Thread.run(Thread.java:745)
15/10/06 09:18:11 INFO resourcemanager.ResourceManager: Exiting, bbye..
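
One way to avoid this kind of failure is to treat a missing node as an
expected case rather than dereferencing the lookup result. A minimal sketch,
assuming a hypothetical NodeTracker class that keeps only a memory figure per
node (this is not the YARN or Myriad scheduler code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the scheduler's node bookkeeping.
class NodeTracker {
    private final Map<String, Integer> memoryMbByNode = new ConcurrentHashMap<>();

    void addNode(String nodeId, int memoryMb) {
        memoryMbByNode.put(nodeId, memoryMb);
    }

    void removeNode(String nodeId) {
        memoryMbByNode.remove(nodeId);
    }

    // Returns true if the node was updated, false if it is no longer tracked.
    boolean updateNodeResource(String nodeId, int newMemoryMb) {
        Integer current = memoryMbByNode.get(nodeId);
        if (current == null) {
            // The node was already removed (for example, marked unhealthy),
            // so skip the update instead of operating on a null reference.
            return false;
        }
        memoryMbByNode.put(nodeId, newMemoryMb);
        return true;
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        tracker.addNode("node-1", 4096);
        tracker.removeNode("node-1");
        // A late heartbeat for the removed node is ignored rather than fatal.
        System.out.println(tracker.updateNodeResource("node-1", 2048)); // prints "false"
    }
}
```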





Re: Marathon version 0.11 dependency on Java 8

2015-10-01 Thread Sarjeet Singh
From the Marathon 0.11.0 release notes:

   - Java 8 or higher is needed to run Marathon, since Java 6 and 7 support
   has reached end of life.

Here is the reference link: https://github.com/mesosphere/marathon/releases

-Sarjeet

On Thu, Oct 1, 2015 at 11:21 AM, yuliya Feldman  wrote:

> Hello guys,
> Is it true that Java 8 is a requirement to run Marathon 0.11?
> It feels a bit strange that it would not support Java 7.
> Thanks,Yuliya


Re: Node Managers connecting to 0.0.0.0:8031

2015-09-25 Thread Sarjeet Singh
Zhongyue,

Glad to know it worked.

As per my understanding, Myriad does pass the RM's IP to the NM executor when
NM tasks get launched through Mesos, but for this we need to configure the
RM's address. Without configuring it, we may not know what address is assigned
to the RM when it gets launched. Correct me if I am wrong or have
misunderstood your question.

The way I tried Myriad was to use the mesos-dns service to take care of RM
discovery. The only thing I needed to do was specify the *.marathon.mesos
address in yarn-site.xml or on the command line as the RM's address (in the
case where the RM is launched through Marathon). This also helps with Myriad
HA: if the RM is relaunched on some other node in the cluster after a failure,
there is no need for manual intervention to keep changing the address.

For the second question, can you check from the Mesos UI whether the NM task
is launched or in an active state? Mesos also provides a sandbox link where
the NM logs can be fetched. The same goes for the RM logs, since the RM also
runs as a Mesos task/executor. Otherwise, the Mesos task logs reside somewhere
under /tmp/mesos/*, where the Mesos master/slave logs are stored.

Any more info/logs on the issue will help triage this, as it can be an issue
with configuration too. Anyway, ping us again if you run into any non-obvious
issue or need help with something else.

-Sarjeet



On Tue, Sep 22, 2015 at 7:41 PM, Zhongyue Luo <zhongyue@gmail.com>
wrote:

> Thanks Sarjeet, it works.
>
> However, this seems very strange. Shouldn't the RM's IP be included in the
> task info so that the executor injects the IP when launching the NM?
>
> Also I can see that the default NM has been registered to the RM through
> the RM web ui but the task status is still "STAGING" from the Mesos web ui.
> Is this normal?
>
> On Tue, Sep 22, 2015 at 11:19 PM, Sarjeet Singh <sarjeetsi...@maprtech.com
> >
> wrote:
>
> > Zhongyue,
> >
> > You can specify RM's IP from commandline when starting RM, or you can set
> > the following property in yarn-site.xml:
> >
> > <property>
> >   <name>yarn.resourcemanager.hostname</name>
> >   <value>RM IP</value>
> > </property>
> >
> > OR
> >
> > From commandline,
> >
> > YARN_RESOURCEMANAGER_OPTS=-Dyarn.resourcemanager.hostname= && yarn
> > resourcemanager
> >
> > ===
> >
> > Try the above and see if it works.
> >
> > -Sarjeet
> >
> > On Tue, Sep 22, 2015 at 1:04 AM, Zhongyue Luo <zhongyue@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I've recently redeployed Myriad in our Mesos cluster.
> > >
> > > > However, the node managers fail because they are trying to connect to an
> > > > invalid Resource Manager IP.
> > >
> > > > Below is a part of the log in one of the Mesos Agents that attempts to
> > > launch a Node manager.
> > >
> > > 15/09/22 15:41:52 INFO webapp.WebApps: Web app /node started at 8042
> > > 15/09/22 15:41:52 INFO webapp.WebApps: Registered webapp guice modules
> > > > 15/09/22 15:41:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
> > > > 15/09/22 15:41:52 INFO nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
> > > > 15/09/22 15:41:52 INFO nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
> > > 15/09/22 15:41:54 INFO ipc.Client: Retrying connect to server:
> > > 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is
> > > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> > > MILLISECONDS)
> > > 15/09/22 15:41:55 INFO ipc.Client: Retrying connect to server:
> > > 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is
> > > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> > > MILLISECONDS)
> > > 15/09/22 15:41:56 INFO ipc.Client: Retrying connect to server:
> > > 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is
> > > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> > > MILLISECONDS)
> > > 15/09/22 15:41:57 INFO ipc.Client: Retrying connect to server:
> > > 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is
> > > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> > > MILLISECONDS)
> > >
> > > > You can see that it attempts to connect to 0.0.0.0:8031 when the active
> > > > resource manager is located in a different location.
> > >
> > > I've followed the instructions here.
> > > https://github.com/mesos/myriad/blob/phase1/docs/myriad-dev.md
> > >
> > > Which configuration do I need to recheck to get this right?
> > >
> > > Thanks in advance.
> > >
> > > -zhongyue
> > >
> > > --
> > > *Intel SSG/STO/BDT*
> > > > 880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
> > > > China
> > > +862161166500
> > >
> >
>
>
>
> --
> *Intel SSG/STO/BDT*
> 880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
> China
> +862161166500
>


[jira] [Created] (MYRIAD-137) Resources offered by mesos are blocked with Myriad FWK on NullPointerException and FlexDown FGS NM.

2015-09-14 Thread Sarjeet Singh (JIRA)
Sarjeet Singh created MYRIAD-137:


 Summary: Resources offered by mesos are blocked with Myriad FWK on 
NullPointerException and FlexDown FGS NM.
 Key: MYRIAD-137
 URL: https://issues.apache.org/jira/browse/MYRIAD-137
 Project: Myriad
  Issue Type: Bug
  Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: Sarjeet Singh


Observed this issue in 2 instances: once when I did a flex down of an FGS NM, 
and once when a NullPointerException occurred (JIRA MYRIAD-135).

From the Mesos UI, observed that no resources were left to offer even though 
there was no utilization happening in the cluster, except for 3 NMs (2 MP, 1 ZP).

On debugging the RM logs, found the NullPointerException, which caused the 
OfferEventHandler thread to exit. Myriad received no more offers from Mesos 
after that.

Then I tried restarting the RM, and the resources were returned to Mesos :)

Then I tried running a few MapReduce jobs and observed the issue with flexing 
down an FGS NM: all of the resources offered to Myriad were blocked completely, 
and Myriad didn't release any resources after that.

So it seems that the flex-down procedure only cleans up the active containers 
and the NM itself, but doesn't clean up outstanding offers in case offers have 
been saved to OfferLifeCycle for future tasks by FGS NMs. 
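
The cleanup described above could look roughly like the following. This is
only a sketch under assumptions: HeldOffers, hold and releaseAll are
hypothetical names, offers are reduced to plain ID strings, and nothing here
reflects how Myriad's OfferLifeCycle is actually implemented. The idea is that
flex down also drains every offer still held for the host, so the caller can
decline them back to Mesos.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping for offers saved per host for future tasks.
class HeldOffers {
    private final Map<String, List<String>> offerIdsByHost = new HashMap<>();

    // Remember an offer that is being held for a host.
    void hold(String host, String offerId) {
        offerIdsByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(offerId);
    }

    // Removes and returns every offer held for the host, so the caller can
    // decline them when the NM on that host is flexed down.
    List<String> releaseAll(String host) {
        List<String> released = offerIdsByHost.remove(host);
        return released == null ? new ArrayList<>() : released;
    }

    // Total number of offers still held across all hosts.
    int heldCount() {
        int count = 0;
        for (List<String> ids : offerIdsByHost.values()) {
            count += ids.size();
        }
        return count;
    }

    public static void main(String[] args) {
        HeldOffers held = new HeldOffers();
        held.hold("host-a", "offer-1");
        held.hold("host-a", "offer-2");
        held.hold("host-b", "offer-3");
        System.out.println(held.releaseAll("host-a")); // prints "[offer-1, offer-2]"
        System.out.println(held.heldCount());          // prints "1"
    }
}
```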

Resources (From mesos-master UI)
=

         CPUs                     Mem
Total    84                       253.9 GB
Used     3.300                    6.1 GB
Offered  80.700                   247.8 GB
Idle     -1.4210854715202004e-14  0 B       <--- No Resources available.

Here are the active offers (*blocked*) shown on the Mesos UI:

Offers
=

ID                Framework    Host         CPUs   Mem
…5050-3270-O4151  MyriadAlpha  node101-116  0.5    64 MB
…5050-3270-O4149  MyriadAlpha  node101-116  0.200  282 MB
…5050-3270-O4147  MyriadAlpha  node101-116  1      1.0 GB
…5050-3270-O4145  MyriadAlpha  node101-116  1      1.0 GB
…5050-3270-O4143  MyriadAlpha  node101-116  1      1.0 GB
…5050-3270-O4141  MyriadAlpha  node101-116  1      1.0 GB
…5050-3270-O4139  MyriadAlpha  node101-117  24.5   87.8 GB
…5050-3270-O4137  MyriadAlpha  node101-116  22.9   87.4 GB
…5050-3270-O4135  MyriadAlpha  node101-117  3      3.0 GB
…5050-3270-O4134  MyriadAlpha  node101-137  25.6   65.2 GB





Re: Google Hangout Link

2015-08-26 Thread Sarjeet Singh
https://plus.google.com/hangouts/_/mesosphere.io/myriad

On Wednesday, August 26, 2015, Brandon Gulla gulla.bran...@gmail.com
wrote:
 Can someone send out the active google hangout link please?

 thanks

 --
 Brandon


-- 
Sent from Gmail Mobile


[jira] [Updated] (MYRIAD-128) Issue with Flex down, Pending NMs stuck in staging and don't get to active task.

2015-08-24 Thread Sarjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarjeet Singh updated MYRIAD-128:
-
Attachment: Screen Shot 2015-08-24 at 5.51.38 PM.png

Myriad UI screenshot

 Issue with Flex down, Pending NMs stuck in staging and don't get to active 
 task.
 

 Key: MYRIAD-128
 URL: https://issues.apache.org/jira/browse/MYRIAD-128
 Project: Myriad
  Issue Type: Bug
  Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: Sarjeet Singh
 Attachments: Screen Shot 2015-08-24 at 5.51.38 PM.png


 Seeing an issue when I tried flexing NMs from the Myriad UI. On flexing down 
 the active NM, the pending NMs don't go to the active state (not showing in 
 'Active Tasks') and there is no active NM showing on the Myriad UI, although 
 there is an NM running on the node (verified from jps). 
 mapr 20528 20526  1 17:23 ?00:00:26 
 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64/bin/java -Dproc_nodemanager 
 -Xmx1000m -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs 
 -Dyarn.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=yarn.log 
 -Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= 
 -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console 
 -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native 
 -Dyarn.policy.file=hadoop-policy.xml -server 
 -Dnodemanager.resource.io-spindles=4.0 
 -Dyarn.resourcemanager.hostname=testrm.marathon.mesos 
 -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
  -Dnodemanager.resource.cpu-vcores=0 -Dnodemanager.resource.memory-mb=0 
 -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 
 -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 
 -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 
 -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003 -Dhadoop.login=maprsasl 
 -Dhttps.protocols=TLSv1.2 
 -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf 
 -Dzookeeper.sasl.clientconfig=Client_simple 
 -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider 
 -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs 
 -Dyarn.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=yarn.log 
 -Dyarn.log.file=yarn.log -Dyarn.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 
 -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 
 -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console 
 -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -classpath 
 /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/nm-config/log4j.properties:/opt/mapr/lib/JPam-1.1.jar
  org.apache.hadoop.yarn.server.nodemanager.NodeManager
 From myriad UI:
 Active Tasks
 Killable Tasks
 Pending Tasks
 Staging Tasks
   nm.large.123badb1-57d8-4bd2-aa2e-de9fc1898c7f
   nm.medium.f2c4126c-4cb2-46af-a1e0-690034b914b8
   nm.medium.a9e9fd84-350a-48bc-bcd2-8712ecdc8c66
   nm.medium.663f9c6e-f28e-4395-8540-70c306eb04c5
   nm.medium.93f7cc91-9263-48a7-821e-3b0ffbe70e66
 This is the state even after waiting for about 30 min or so after flexing 
 down the NM.
 I tried this on a single-node cluster, but it looks like the problem can 
 happen in any case.
 I started the RM from Marathon and was able to get the RM & Myriad up & 
 running. With the RM launched, a CGS (medium profile) NM is launched along 
 with it as well, which is shown as an 'Active Task' on the Myriad UI. Then I 
 launched some large profile & zero profile NMs, which are shown in 'Pending 
 Tasks' since there is a (CGS default) NM already running on the single-node 
 cluster.
 Then I tried flexing down the NM from the Myriad UI, which flexed down the 
 active NM, and all the pending NMs started moving to staging tasks, where 
 they stayed stuck for a long time. After waiting for about 30 min, I don't 
 see any active task for an NM, and all of the pending NM tasks are shown in 
 'Staging Tasks' only. (See the screenshot)





[jira] [Updated] (MYRIAD-125) Myriad /config API issue for YARN_NODEMANAGER_OPTS param when RM's hostname is passed.

2015-08-18 Thread Sarjeet Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarjeet Singh updated MYRIAD-125:
-
Description: 
The issue is that the YARN_NODEMANAGER_OPTS param in yarnEnvironment keeps 
having -Dyarn.resourcemanager.hostname=host-name appended to it on each offer 
processed by Myriad. 

This is the piece of code that has the issue.

In TaskFactory.java:
<code>
String rmHostName = System.getProperty(YARN_RESOURCEMANAGER_HOSTNAME);
if (rmHostName != null && !rmHostName.isEmpty()) {

    String nmOpts = nmTaskConfig.getYarnEnvironment().get(YARN_NODEMANAGER_OPTS_KEY);
    if (nmOpts == null) {
        nmOpts = "";
    }
    nmOpts += " " + "-D" + YARN_RESOURCEMANAGER_HOSTNAME + "=" + rmHostName;

    nmTaskConfig.getYarnEnvironment().put(YARN_NODEMANAGER_OPTS_KEY, nmOpts);
    LOGGER.info(YARN_RESOURCEMANAGER_HOSTNAME + " is set to " + rmHostName +
        " via YARN_RESOURCEMANAGER_OPTS. Passing it into YARN_NODEMANAGER_OPTS.");
}
</code>

Observed this when I started the RM from Marathon, did an HTTP GET on the 
/config API, and checked YARN_NODEMANAGER_OPTS at 10-15 min intervals.
 
Here is what I observed:

"yarnEnvironment": {
"YARN_HOME": "/opt/mapr/hadoop/hadoop-2.7.0/", 
"YARN_NODEMANAGER_OPTS": "-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos"
}, 

After 10-15 min of being idle:

"yarnEnvironment": {
"YARN_HOME": "/opt/mapr/hadoop/hadoop-2.7.0/", 
"YARN_NODEMANAGER_OPTS": "-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos"
}, 

Let me know if you need any additional detail about the issue. Thanks.

FYI, this should be easy to reproduce: just start Myriad and check the 
/config API output at 10-15 min intervals.

  was:
The issue is that the YARN_NODEMANAGER_OPTS param in yarnEnvironment keeps 
having -Dyarn.resourcemanager.hostname=host-name appended to it on each offer 
processed by Myriad. 

This is the piece of code that has the issue.

In TaskFactory.java:
<code>
String rmHostName = System.getProperty(YARN_RESOURCEMANAGER_HOSTNAME);
if (rmHostName != null && !rmHostName.isEmpty()) {

    String nmOpts = nmTaskConfig.getYarnEnvironment().get(YARN_NODEMANAGER_OPTS_KEY);
    if (nmOpts == null) {
        nmOpts = "";
    }
    nmOpts += " " + "-D" + YARN_RESOURCEMANAGER_HOSTNAME + "=" + rmHostName;

    nmTaskConfig.getYarnEnvironment().put(YARN_NODEMANAGER_OPTS_KEY, nmOpts);
    LOGGER.info(YARN_RESOURCEMANAGER_HOSTNAME + " is set to " + rmHostName +
        " via YARN_RESOURCEMANAGER_OPTS. Passing it into YARN_NODEMANAGER_OPTS.");
}

Observed this when I started the RM from Marathon, did an HTTP GET on the 
/config API, and checked YARN_NODEMANAGER_OPTS at 10-15 min intervals.
</code>
 
Here is what I observed:

"yarnEnvironment": {
"YARN_HOME": "/opt/mapr/hadoop/hadoop-2.7.0/", 
"YARN_NODEMANAGER_OPTS": "-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos"
}, 

After 10-15 min of being idle:

"yarnEnvironment": {
"YARN_HOME": "/opt/mapr/hadoop/hadoop-2.7.0/", 
"YARN_NODEMANAGER_OPTS": "-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos

[jira] [Created] (MYRIAD-125) Myriad /config API issue for YARN_NODEMANAGER_OPTS param when RM's hostname is passed.

2015-08-18 Thread Sarjeet Singh (JIRA)
Sarjeet Singh created MYRIAD-125:


 Summary: Myriad /config API issue for YARN_NODEMANAGER_OPTS 
param when RM's hostname is passed.
 Key: MYRIAD-125
 URL: https://issues.apache.org/jira/browse/MYRIAD-125
 Project: Myriad
  Issue Type: Bug
  Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: Sarjeet Singh


The issue is that the YARN_NODEMANAGER_OPTS param in yarnEnvironment keeps 
having -Dyarn.resourcemanager.hostname=host-name appended to it on each offer 
processed by Myriad. 

This is the piece of code that has the issue.

In TaskFactory.java:
<code>
String rmHostName = System.getProperty(YARN_RESOURCEMANAGER_HOSTNAME);
if (rmHostName != null && !rmHostName.isEmpty()) {

    String nmOpts = nmTaskConfig.getYarnEnvironment().get(YARN_NODEMANAGER_OPTS_KEY);
    if (nmOpts == null) {
        nmOpts = "";
    }
    nmOpts += " " + "-D" + YARN_RESOURCEMANAGER_HOSTNAME + "=" + rmHostName;

    nmTaskConfig.getYarnEnvironment().put(YARN_NODEMANAGER_OPTS_KEY, nmOpts);
    LOGGER.info(YARN_RESOURCEMANAGER_HOSTNAME + " is set to " + rmHostName +
        " via YARN_RESOURCEMANAGER_OPTS. Passing it into YARN_NODEMANAGER_OPTS.");
}

Observed this when I started the RM from Marathon, did an HTTP GET on the 
/config API, and checked YARN_NODEMANAGER_OPTS at 10-15 min intervals.
</code>
 
Here is what I observed:

"yarnEnvironment": {
"YARN_HOME": "/opt/mapr/hadoop/hadoop-2.7.0/", 
"YARN_NODEMANAGER_OPTS": "-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos"
}, 

After 10-15 min of being idle:

"yarnEnvironment": {
"YARN_HOME": "/opt/mapr/hadoop/hadoop-2.7.0/", 
"YARN_NODEMANAGER_OPTS": "-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos 
-Dyarn.resourcemanager.hostname=rm.marathon.mesos"
}, 

Let me know if you need any additional detail about the issue. Thanks.

FYI, this should be easy to reproduce: just start Myriad and check the 
/config API output at 10-15 min intervals.
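
A possible direction for a fix is to make the append idempotent, so the flag
is added only when it is not already present. The following is a minimal
sketch, not the actual TaskFactory change: NmOptsAppender and addRmHostname
are hypothetical names, and a plain Map stands in for the yarnEnvironment
returned by nmTaskConfig.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the constants mirror the names used in the report.
class NmOptsAppender {
    static final String YARN_RESOURCEMANAGER_HOSTNAME = "yarn.resourcemanager.hostname";
    static final String YARN_NODEMANAGER_OPTS_KEY = "YARN_NODEMANAGER_OPTS";

    // Adds -Dyarn.resourcemanager.hostname=<rmHostName> to the NM opts at most once.
    static void addRmHostname(Map<String, String> yarnEnvironment, String rmHostName) {
        if (rmHostName == null || rmHostName.isEmpty()) {
            return;
        }
        String flag = "-D" + YARN_RESOURCEMANAGER_HOSTNAME + "=" + rmHostName;
        String nmOpts = yarnEnvironment.get(YARN_NODEMANAGER_OPTS_KEY);
        if (nmOpts == null || nmOpts.isEmpty()) {
            yarnEnvironment.put(YARN_NODEMANAGER_OPTS_KEY, flag);
        } else if (!Arrays.asList(nmOpts.trim().split("\\s+")).contains(flag)) {
            // Append only when the exact flag is not already one of the options.
            yarnEnvironment.put(YARN_NODEMANAGER_OPTS_KEY, nmOpts + " " + flag);
        }
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put(YARN_NODEMANAGER_OPTS_KEY, "-Dnodemanager.resource.io-spindles=4.0");
        // Simulate several passes; the flag should appear only once.
        for (int i = 0; i < 3; i++) {
            addRmHostname(env, "rm.marathon.mesos");
        }
        System.out.println(env.get(YARN_NODEMANAGER_OPTS_KEY));
    }
}
```

An alternative with the same effect would be to build the task's environment
from a copy of the configured yarnEnvironment instead of mutating the shared
map.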


